(Just to be clear, everything here is being sent by me as an individual. I’ve asked Brian to make any decisions about discussions of these concerns that might arise on list or in meetings.)
So, with a new baby and lots of work outside the IETF, I’ve been very busy and haven’t had a chance to make a complete formal review of the newest (-06) revision. I have, however, identified some ongoing major (design/architectural) concerns, some minor concerns (questions, layout, etc.), and some nits in my quick read of -06 that I’d like to raise now. I may have more if/when I get time to do a full review.

Overall, I think the document is moving along and getting much better. I feel it still needs some more editorial work, some clarifications, and more group discussion on a few critical design decisions, and I’d like to see a few more full reviews such as the one Jouni did before I could personally support moving it through WGLC, but I think it is getting closer.

Major concerns:

Concern 1: Mandatory TLS/DTLS Inappropriate in some Contexts

I’ve raised this issue before, but I’m hoping that now that people have had a bit more time to think about all the use cases, see what it means in the real world, etc., there might be a bit more support for modifying the requirement for TLS/DTLS. TLS/DTLS makes sense in some cases, but if we are expecting RELOAD to be reusable, it is clear to me that it does not make sense in all cases. It was familiar to the editors, and well understood, so it made sense as a proposal, but I disagree with it being the mandatory/only solution.

I am not advocating we allow an unsecured system. There needs to be a way to do that for development/debugging, but you can do that with a null cipher, so that concern is addressed. Instead my concerns are two-fold:

1) TLS/DTLS is too heavy for query-response systems using direct-response routing.
2) There are applications that may want to secure at a different level or using a different technology.

First, looking at DTLS for query-response systems using direct routing.
A bit of background: Clearly, RELOAD should work for query-response systems, where the only use of the DHT is to obtain a small item of information (e.g. obtaining a registration or a certificate). Similarly, while there has been discussion about whether the current individual draft is the appropriate mechanism, the group hummed in Minneapolis to include direct routing in the protocol, and in SF the hum was repeated, with strong consensus to include it as an extension and ensure the base draft doesn’t preclude it.

Establishing a TLS/DTLS connection is a very heavy way to establish a connection between two peers for a simple response, particularly if using direct response routing, where the connection is used only to convey that response (7 messages for DTLS [1]). With recursive response, it makes sense, since the connection is persistent, but a lighter mechanism (for example, including the requestor’s public cert in the request, and having the responder encrypt the reply) is far more appropriate for a direct routed response. The current draft mandates TLS/DTLS and only TLS/DTLS, leading to an artificially high cost for direct response routing. Based on my work with potential real-world users of P2PSIP going back almost 4 years, request-response is a very likely scenario. While the TLS-PSK/TLS-SRP stuff helps here, other mechanisms would still be more efficient.

Similarly, we should allow alternate mechanisms. Even if one is using the recursive response approach for routing, there may be other architectures. A system of persistent peers (for example, provisioned set-top boxes using RELOAD to share information) may have ISP-assigned IDs and persistent peers connected to one another, and may use hardware-based encryption, VPNs, etc. between the peers. There are many ways to encrypt a connection, and mandating one and only one seems inappropriate.
A better design is to provide a mechanism in the draft for an alternate encryption technique, recommend DTLS, provide information, and require that some encryption be used. One could argue that this lets someone build without encryption, but that argument seems weak: they could just as easily ignore the requirement in the first place. In my opinion, we are more likely to have problems later (extensions that will have to modify the original draft to use a different security mechanism, deployments that are “almost” RELOAD but with different security, etc.) if we proceed with the current draft’s approach.

[1] Modadugu, N., Rescorla, E., "The Design and Implementation of Datagram TLS", Proceedings of ISOC NDSS 2004, February 2004.

Concern 2: Alternate Mechanism Needed for Attach-Lite

I indicated above why TLS/DTLS was overly expensive, particularly for direct response. Direct response is particularly likely to be used in closed environments (server farms using P2PSIP to distribute information, managed networks, ad-hoc, etc.), and these same deployments are among the most likely to use Attach-Lite, as they may want to avoid full ICE. However, in section 5.5.2.3, TLS connectivity checks are required as the mechanism to check reachability. This is a great shortcut when TLS/DTLS is used, but an alternate mechanism needs to exist for systems opting not to use TLS/DTLS.

Concern 3: TURN server usage/location not proved “not harmful”

Section 8 of the draft defines a TURN server usage, which relies on having an accurate turnDensity defined. As the draft itself notes, if the density is too high, the process of finding a server becomes very expensive. In the Hiroshima meeting, the editors asserted that they had run experiments that showed it was “ok and worked fine”, but declined to provide or share any results.
While I understand why (it's a pain to document this kind of thing), this is a critical part of the deployment, and the suggested algorithm and approach is not, as far as I know, deployed or documented/analyzed anywhere in the P2P literature. In the absence of some reference or an analysis or simulation shared with the group, I’m concerned about the potential impact this will have on deployments, and don't think we should bless something this unproven.

I would recommend that either the editors present some study or analysis supporting that this works and isn't too bad when you get it wrong, or that we move this to an extension (or merge it in with the ongoing work on service discovery). Haibin Song suggested something similar on list:

http://www.ietf.org/mail-archive/web/p2psip/current/msg05278.html

Concern 4: Fixed Length (128 bit) Peer-ids Limit Reuse and are Unnecessary

For some reason, the draft mandates 128 bit peer-ids. As far as I can see, there is no good reason for this (it doesn’t even appear resource IDs are limited to the same space), and I don't recall this ever being discussed by the WG. In fact, this seems like a bad idea from a reuse and extensibility perspective. Many other DHTs existing today use other spaces (for example, original Chord's 160 bit), and reuse of existing code should be a priority (for example, being able to take existing, debugged DHT code and plug it into RELOAD).

I understand there is a routing efficiency argument to be made here for fixed length fields, so this is a good discussion for the list, but my opinion is that the small performance improvement isn't worth the cost in terms of flexibility and protocol reuse, and that peer-ids should be variable length.

Concern 5: The Private ID concept should not be in the base draft (breaks direct response routing and relay peer routing)

The Private ID concept (5.1.3) could break routing other than recursive response, particularly where there are concerns about state being maintained.
The current proposal is very minimal (a few lines), and doesn’t specify what happens if a message uses a different response routing type. If the response will be routed directly or via a relay, what happens to that state? It would be better to place this optimization into an extension. That way, an overlay using a different routing technique can prohibit the optimization, or at least the extension can think through the proper way to handle expiring the state, etc. Including it in the base draft causes problems with direct response support, which the group has mandated the draft support.

Concern 6: Compressed destinations in via-list should not be in the base draft

In section 5.3.2.2 there is discussion of via-list compression and adding an opaque field to represent the list, valid only for that peer. The same concerns I have in Concern 5 apply here. This should also be an extension.

Concern 7: Document shouldn't specify a particular bootstrap configuration mechanism

3.6.1 describes (in a very cursory way) a specific mechanism for obtaining configuration. There are many ways to handle provisioning, and I don't recall the group ever discussing and blessing this particular mechanism. In fact, I recall there was discussion that this should be out of the scope of this document.

Minor Concerns / Questions:

Minor Concern 1: In 5.1.1, first bullet, why is this silently dropped? Might make sense, but not clear from the text. Can you explain?

Minor Concern 2: In 5.2.1, where did 3 seconds and 4 times come from, and why? Can we cite something to provide a reason for this?

Minor Concern 3: In 5.3.2.1 there is very little explanation about why a Config_Update needs to be accepted. Have we thought through the possible attack vectors of this? Can this text be clarified?

Nits:

Nit 1: The document needs to be spell-checked. While most from the last version are fixed, I saw a few just on a casual read that should be fixed before we move forward.
(perfom at the end of 3.5, theAIMD in Appendix B, protcol in the IANA port registrations, imporvements in the acknowledgements, etc.)

Nit 2: In 5.3.2.2 the structure field is destination_data, but the descriptive text is destination_value. They should agree. Similarly for node/peer as a destination type.

Nit 3: In 1.2, the description of storage component says “It talks directly to the Topology Plugin to manage data replication” and topology plugin component description says “It uses the Message Transport component to send and receive overlay management messages, to the Storage component to manage data replication”. This seems a bit unclear and circular for the overall architecture description.

Nit 4: In 1.2.5, second paragraph, it is described as “fairly generic”. That seems like kind of a meaningless term.

Nit 5: In terminology, "host" is used to describe client and peer. Does that make sense? There is not a one-to-one host-peer relationship.

Thanks,
David (as individual)
