The RELOAD routing is fundamentally flawed and for that reason I am
tossing the Chord implementation that it defines and replacing
"periodic stabilization" with "correction on use and correct on
change" also I am ditching finger tables all together as they aren't
needed whatsoever with these changes. Also this brings global
broadcast and multicast that can be registered via the hash table
which are very useful for ICE and any other signaling with little
overhead.
I guess I will have to write my own RFC since RELOAD is IMHO using
outdated overlay technology and since having said that shouldn't it be
a flawless design and not broken right now? I think so.
@Narayan You could not be more wrong about "the number of fingers a
node has should provide a rough estimate (with accuracy in the order
of log N) of the size of the overlay". People use this formula in
Kadmelia, Chord, Chimera, etc and it does not work whatsoever and your
accuracy is even wrong too. I use the algorithm described here: http://dks.sics.se/pub/netsize.pdf
-g
On May 11, 2009, at 12:47 PM, Narayanan, Vidya wrote:
I've described a number of comments on section 9 below. But, if the
responses can be separated out under each topic, it will help the
discussion quite a bit, given the length of this email.
Routing (Section 9.3):
----------------------
The routing rule as specified does not work in all cases. If a node
with ID N and immediate successor S receives a message for resource
id k such that N < k < S, since k is neither a node directly
attached to N and nor is there a node whose ID is between N and k,
the message cannot be routed. The routing rule should instead read
as follows:
"If a peer is not responsible for a Resource-ID k, but is directly
connected to a node with Node-ID k, then it routes the message to
that node. Otherwise, it routes the request to the peer in the
routing table that has the largest Node-ID that is in the interval
between the peer and k. **If no such node is found, it finds the
smallest node id that is greater than k and routes the message to
that node.** The routing table is the union of the neighbor table
and the finger table."
The new text is as enclosed in "** **" above.
Redundancy (Section 9.4):
-------------------------
On the open issue, there should be no need to wait for successful
redundant copy storage to return a STORE response. It doesn't make
sense to send a failure if one of the redundant storage attempts
fails anyway - so, the wait becomes a pointless exercise.
"Note that a malicious node can return a success response but not
store the data locally or in the replica set. Requesting peers that
wish to ensure that the replication actually occurred SHOULD [[Open
Issue: SHOULD or MAY?]] contact each peer listed in the replicas
field of the Store response and retrieve a copy of the data."
The above text is not just relevant for redundant copies of data
stored - it is true for even the actual resource-id owner. Also, in
light of not waiting for the replica responses, the STORE response
will not contain a list of replicas - so, this needs to be corrected.
Joining (Section 9.5):
----------------------
It is not clear why the join procedure requires all neighbor and
finger connections to be established prior to sending the JOIN
message itself. To be functional in the overlay, only the immediate
successor relationships must be correctly in place. The other
successors, predecessors and fingers can be established
subsequently. This also doesn't cause nodes to establish
unnecessary connections unless the node has successfully joined.
There is also the question of whether a JP is supposed to establish
a TLS connection to the BP to send messages through it, which is
presently not addressed in the draft. It seems to me that a more
streamlined join process can be defined as follows:
1. JP connects to its chosen bootstrap node. JP MUST form a TLS
session with the bootstrap node, using the certificate or PSK
required for overlay membership.
2. JP sends a Join request message to its own Node-ID n, through
the BP. This is routed to the admitting peer (AP). The AP sends a
response to the Join.
3. JP also sends an Attach request message to n through the BP.
This may be sent in parallel with the Join message. The AP sends an
attach response, resulting in connection establishment between the
JP and AP.
4. AP does a series of Store requests to JP to store the data that
JP will be responsible for.
5. AP sends JP an Update explicitly labeling JP as its
predecessor. AP also sends an Update to all its other neighbors
with the new value of its neighbor set (including JP). At this
point, JP is part of the ring and responsible for a section of the
overlay. AP SHOULD now store any data that JP is responsible for as
part of replicas it will store (since AP is still JP's successor, it
will need to store the data as well anyway).
6. JP sends Attach requests to the neighbors and establishes
connections with the required successors and predecessors.
7. JP establishes finger connections by sending Attach requests to
peers (n+2^i). (Comments on finger selection below).
Updates (Section 9.7):
----------------------
Section 9.7.1. states that every time a connection is lost, the peer
should remove it from its neighbor table and find a different peer.
Section 9.7.3. talks about attempting to re-establish connections
when a neighbor is detected to be invalid. These are at odds. It
seems to make sense that the peer would always try to re-establish
connections and only remove that neighbor if the connection cannot
be re-established.
Further, having the enrollment server involved in specifying the
frequency of updates and probes seems strange. How is the
enrollment server supposed to know the magic values here? It seems
like it should be enough to specify a recommended value and leave it
at that. Since the finger stabilization is for optimization and the
neighbor stabilization is reactive anyway in the present draft,
having these parameters in the config document seem to be of little
use.
Finger entries (Section 9.7.3 and elsewhere):
---------------------------------------------
The draft requires 16 finger table entries - at first, I took those
to mean 16 different resource ids that will be probed per formula.
After reading 9.7.3, it appears that it is talking about 16
different fingers that a node should maintain. In a small overlay,
that's ridiculously large. The text is also unclear on what happens
when the overlay shrinks. Further, the text states that a peer
SHOULD consider the finger table entry valid if it is in the range
[n + 2^(numBitsInNodeId-i), n + 1.5 x 2^(numBitsInNodeId-i)]. For
small overlays, there may be no entries in this range for every i in
[0, 127].
A node should only need log N fingers to route in OlogN hops. What
we need to specify is an algorithm that uses the same pattern in
resource ids to probe - the fingers corresponding to multiple of
those will collapse to the same node in a small overlay. In our
implementation, the algorithm works as follows:
A node looks at the list of resource-ids corresponding to f=(n+2^i),
for all values of i=[0,127]. Let's say that it first probes n+2^0
and obtains a finger of f0. It checks f0 against the next set of
resource-ids to probe and determines the next larger resource-id to
probe from the list. It populates the finger table entries for all
other fingers f for which n < f < f0, with f0 as the corresponding
finger. Depending on the number of nodes in the overlay, this
should result in approximately logN fingers in the finger table at
any time. Periodic stabilization should adjust the number of
fingers automatically. In fact, the number of fingers a node has
should provide a rough estimate (with accuracy in the order of log
N) of the size of the overlay. The current text suggests that the
number of fingers should be decided based on the overlay size, which
does not seem intuitive or necessary.
Leaving (Section 9.9):
----------------------
In the event of a graceful leave, a peer should be able to send
updates to all members of the neighbor set. It seems redundant to
have the peer send a Leave message, followed by every peer receiving
the Leave to send updates. Is there a reason for this?
Thanks,
Vidya
_______________________________________________
P2PSIP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/p2psip
_______________________________________________
P2PSIP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/p2psip