Re: Generic helper I/O format
tor 2012-07-05 klockan 16:00 +1200 skrev Amos Jeffries: The blob only exists in this discussion for two reasons; the old helpers backward compatibility requires it, and you wanted to discuss a body field for the responses. Even not understanding properly the specifics of why you wanted to discuss it I dont think its a good idea since we can key-pair and encode anything new. Agreed. And please use URL-encoding requiring encoding of at minimum % and any whitespace. It's much less error prone than the quoting mechanism. If we can agree on making it key-pair across the board for all the details which might be passed back we can move on. Yes. With blob only for legacy helpers. Regards Henrik
Re: Generic helper I/O format
On 5/07/2012 4:16 p.m., Robert Collins wrote: On Thu, Jul 5, 2012 at 4:00 PM, Amos Jeffries wrote: Why do we need backwards compat in the new protocol? As an alternative, consider setting a protocol= option on the helpers, making the default our latest-and-greatest,a nd folk running third-party helpers can set protocl=v1 or whatever to get backwards compat. This lets us also warn when we start deprecating the old protocol, that its going away. -Rob I'm trying to avoid adding more options and complexity with this. To be done properly that would mean adding whole new config design bits to set protocol= on several of the more hidden helper lookups. But with one shared parser and a fallback per-helper parser handling the blob cases for backward compatibility we can migrate people over without any config change. Amos
Re: Generic helper I/O format
On 4/07/2012 4:54 p.m., Alex Rousskov wrote: On 07/03/2012 08:10 PM, Amos Jeffries wrote: [channel-ID] (OK/ERR/BH) [key-pairs] blob terminator How will the generic code be able to tell where key-pairs end and blob begins? The generic code will have a format for key-pair of ( 1#token = 1#(token / quoted-string) ) [to be finalized still] with a set of known key names. If the bytes being parsed do not match that format and keys its not part of the key-pairs. There are two problems with this approach, IMO: * It cannot handle a body that just happened to start with supported key-value pair characters. This is shared with any body which just happens to start with the BS token as well. This is a good reason to use key-pair across the board and not document any blob support. * It cannot detect a case of helper sending unsupported key-value pairs (if that helper may send a body too). Yes. It also requires extra code/work to pre-register all known key names, but that is minor. This is not an issue. We have to code operations for handling any accepted key-pair in the handler callbacks anyway. The plan for patch-#2 is to discuss key-pair syntax (which we seem to have started), and move some particular key-pair parsing from helper handlers into the shared one. Opening up user=, tag=, password=, log=, message= keys to non-ACL helper responses will be VERY useful for RADIUS and systems needing to maintain cross-helper request state. I think we should keep it as simple as possible without causing problems with existing parsers and leaving some room for future extensions: * key/name: alphanumeric char followed by anything except (whitespace or '=' or terminator) * value: either quoted-string or anything except (whitespace or terminator); could be empty The parser strips whitespace and strips quotes around quoted strings. Quoted strings use backslash to escape any octet. Okay. So modelling on the ssl_crtd parser instead of external_acl_type parser. If you want one format for all, you probably need something like [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator where BS is some special marker that starts all bodies and cannot be found in key-pairs. Since any body-carrying message is likely to have new lines (and, hence, would need a non-newline terminator), this BS sequence could be something like body:\n, I guess. We REQUIRE a format that does not rely on new special markers like that in order to cope with the existing helpers responses in a backward-compatible way without additional user configuration. When your parser sees n=v after ERR and n is a known key name, how would it distinguish these two possibilities: * This is a start of a blob * This is a key=value pair The second. *IF* it was in the [key-pair] area directly after ERR. Once blob starts everything is blob. Or would you advocate moving the external_acl_type protocol=N.N config option into the helper child object for use by all helpers on determining what key-pairs and format is accepted? I am probably missing some important use cases, but my generic parser would just call a virtual method for each key=value pair found (with no pre-registration of supported keys) and another virtual method for the body blob, if any. It will parse everything using the key, value, and BS formats discussed above. You mean function pointers right? not virtual methods in the HelperReply object? and um, this is a high-use code path - both in parsing and in retrieving the details out of HelperReply. That style of generic parser is good for new formats but not so much for backward-compatibility from old formats. I am going to have to manually receive things like the TT/LD/AF/NA NTLM result codes and map the existing format into record of the HelperReply object. Amos
Re: Generic helper I/O format
On 07/04/2012 12:02 AM, Amos Jeffries wrote: On 4/07/2012 4:54 p.m., Alex Rousskov wrote: On 07/03/2012 08:10 PM, Amos Jeffries wrote: [channel-ID] (OK/ERR/BH) [key-pairs] blob terminator How will the generic code be able to tell where key-pairs end and blob begins? The generic code will have a format for key-pair of ( 1#token = 1#(token / quoted-string) ) [to be finalized still] with a set of known key names. If the bytes being parsed do not match that format and keys its not part of the key-pairs. There are two problems with this approach, IMO: * It cannot handle a body that just happened to start with supported key-value pair characters. This is shared with any body which just happens to start with the BS token as well. No, it is not. BS is required if the body is present and BS is not a valid key name. Thus, BS cannot be confused with a start of a key-value pair _and_ if a body starts with BS as well, there is no problem because we already know to expect a body there (after BS). [ but I am glad you acknowledge this issue with the BS-less approach ] I think we should keep it as simple as possible without causing problems with existing parsers and leaving some room for future extensions: * key/name: alphanumeric char followed by anything except (whitespace or '=' or terminator) * value: either quoted-string or anything except (whitespace or terminator); could be empty The parser strips whitespace and strips quotes around quoted strings. Quoted strings use backslash to escape any octet. Okay. So modelling on the ssl_crtd parser instead of external_acl_type parser. I am trying to model this on the above as simple as ... criteria. I do not really remember the specifics of any particular parser we have (so adjustments may be necessary). If you want one format for all, you probably need something like [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator where BS is some special marker that starts all bodies and cannot be found in key-pairs. Since any body-carrying message is likely to have new lines (and, hence, would need a non-newline terminator), this BS sequence could be something like body:\n, I guess. We REQUIRE a format that does not rely on new special markers like that in order to cope with the existing helpers responses in a backward-compatible way without additional user configuration. When your parser sees n=v after ERR and n is a known key name, how would it distinguish these two possibilities: * This is a start of a blob * This is a key=value pair The second. *IF* it was in the [key-pair] area directly after ERR. And how do you decide that IF? It is exactly the same question I asked, just phrased differently. The BS-less approach does not work (as you have already acknowledged in the beginning of this email): The end of [key-pair] area is indistinguishable from the beginning of a blob in this case. In other words, you do not know whether the blob has started or not. That is exactly the problem BS is solving (because it is not a valid key). Or would you advocate moving the external_acl_type protocol=N.N config option into the helper child object for use by all helpers on determining what key-pairs and format is accepted? I am probably missing some important use cases, but my generic parser would just call a virtual method for each key=value pair found (with no pre-registration of supported keys) and another virtual method for the body blob, if any. It will parse everything using the key, value, and BS formats discussed above. You mean function pointers right? not virtual methods in the HelperReply object? I did mean virtual methods. The child parsers will form HelperReply's kid objects by converting key-value pairs and other parsed strings into helper-specific replies. Those parsers may be integrated with HelperReply hierarchy or be separate from it. If they are integrated, it would look like this: class HelperReply { ... virtual void setResult(string); virtual void setKeyValue(string, string); virtual void setBody(MemBuf); }; ... // a sketch of a child implementation of setKeyValue() Ssl::CrtdReply::setKeyValue(string k, string v) { if (k == foo) ... this-foo_ = StringToInt(v); else HelperReply::parseKeyValue(k,v); } and um, this is a high-use code path - both in parsing and in retrieving the details out of HelperReply. Agreed. I cannot think of a significantly more efficient but still general approach that results in helper-specific ready-to-use replies. That style of generic parser is good for new formats but not so much for backward-compatibility from old formats. I am going to have to manually receive things like the TT/LD/AF/NA NTLM result codes and map the existing format into record of the HelperReply object. Why not have NtmlHelperReply::setResult() do the mapping? Is there
Re: Generic helper I/O format
ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov: No, it is not. BS is required if the body is present and BS is not a valid key name. Thus, BS cannot be confused with a start of a key-value pair _and_ if a body starts with BS as well, there is no problem because we already know to expect a body there (after BS). Why do we need a blob marker? Why not simply use key=value pairs across the board? Why not have NtmlHelperReply::setResult() do the mapping? Is there something format-incompatible with those NTLM result codes that the generic parser cannot parse them using the following format? [channel-ID] result [key-pairs] [BS blob] terminator NTLM do not need a blob. It's just a value. Moving to key=value is fine. Additionally worth noting is that = is not a valid character in NTLM blobs so same parser can be used if tokens without = is supported (keyless values). Regards Henrik
Re: Generic helper I/O format
On 07/04/2012 02:34 PM, Henrik Nordström wrote: ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov: Why not simply use key=value pairs across the board? We need blobs because some values in use today are not expressed as tokens or quoted strings (the two value formats Amos is considering). We could, of course, _encode_ those inconvenient values so that things like SSL certificates can be represented as huge quoted strings. Is that what you are getting at? I am not against that, even though I am a little worried about the added encoding overhead. This will require changing ssl_crtd helper code, but I do not think that would be a big problem. Why not have NtmlHelperReply::setResult() do the mapping? Is there something format-incompatible with those NTLM result codes that the generic parser cannot parse them using the following format? [channel-ID] result [key-pairs] [BS blob] terminator NTLM do not need a blob. We are talking about a general format, not NTLM-specific format. Blob is optional. If NTLM helper does not need it, it will not use it. It's just a value. Moving to key=value is fine. Additionally worth noting is that = is not a valid character in NTLM blobs so same parser can be used if tokens without = is supported (keyless values). Yes, anonymous values or valueless names can be allowed if it helps existing helpers. Can somebody paste a typical NTLM helper response or two? Thank you, Alex.
Re: Generic helper I/O format
On 05.07.2012 08:34, Henrik Nordström wrote: ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov: No, it is not. BS is required if the body is present and BS is not a valid key name. Thus, BS cannot be confused with a start of a key-value pair _and_ if a body starts with BS as well, there is no problem because we already know to expect a body there (after BS). Why do we need a blob marker? Why not simply use key=value pairs across the board? We need it as a temporary measure to handle old helpers responses. I agree on the desire to move to key-pair across the board and am trying to reach it out of this forest of individual fixed-syntax parsers. I think its possible (easy even) to do it using a blob instead of new config settings. Why not have NtmlHelperReply::setResult() do the mapping? Is there something format-incompatible with those NTLM result codes that the generic parser cannot parse them using the following format? [channel-ID] result [key-pairs] [BS blob] terminator NTLM do not need a blob. It's just a value. Moving to key=value is fine. Additionally worth noting is that = is not a valid character in NTLM blobs so same parser can be used if tokens without = is supported (keyless values). Good point here. Same stands for all the other helpers. Alex, before starting I went through each of the helper response styles we have and documented exactly what comes and goes (http://wiki.squid-cache.org/Features/AddonHelpers) to determined that none of the response formats started with a token containing [a-z0-9] then =. We are *not* starting from a random-input state where blob can overlap with key. Anything which does not start with a registered result status code, is blob automatically. The ones which do use result are safely mapped to the key-pair format based on the presence of a valid key= token, or its absence indicating blob to be handled by the old parser. Have a look through that wiki page at the set of Result line sent back to Squid and its clear. The proposed shared format is *exactly* what external ACL already processes. It also if you look at the code has the proposed background blob for backward compatibility with some squid-2.5 era helpers which only sent a reason field in plain-text (random input! yuck, as an established PoC it seems not to have had any complaints). Amos
Re: Generic helper I/O format
On 05.07.2012 10:15, Alex Rousskov wrote: On 07/04/2012 02:34 PM, Henrik Nordström wrote: ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov: Why not simply use key=value pairs across the board? We need blobs because some values in use today are not expressed as tokens or quoted strings (the two value formats Amos is considering). We could, of course, _encode_ those inconvenient values so that things like SSL certificates can be represented as huge quoted strings. Is that what you are getting at? I am not against that, even though I am a little worried about the added encoding overhead. This will require changing ssl_crtd helper code, but I do not think that would be a big problem. Why not have NtmlHelperReply::setResult() do the mapping? Is there something format-incompatible with those NTLM result codes that the generic parser cannot parse them using the following format? [channel-ID] result [key-pairs] [BS blob] terminator NTLM do not need a blob. We are talking about a general format, not NTLM-specific format. Blob is optional. If NTLM helper does not need it, it will not use it. It's just a value. Moving to key=value is fine. Additionally worth noting is that = is not a valid character in NTLM blobs so same parser can be used if tokens without = is supported (keyless values). Yes, anonymous values or valueless names can be allowed if it helps existing helpers. Can somebody paste a typical NTLM helper response or two? http://wiki.squid-cache.org/Features/AddonHelpers#Negotiate_and_NTLM_Scheme If you note, the reason field is usually something other than OK/ERR, so we can parse the old syntax differently to the new OK/ERR responses. Where there is an overlap (BH) the reason field is an error message, usually the program name and a ':' colon (not valid in the proposed key-name) or number in brackets (again not valid in key name) or plain text message (no '=' expected on human-readable words). Amos
Re: Generic helper I/O format
On 07/04/2012 04:52 PM, Amos Jeffries wrote: On 05.07.2012 08:34, Henrik Nordström wrote: ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov: Alex, before starting I went through each of the helper response styles we have and documented exactly what comes and goes (http://wiki.squid-cache.org/Features/AddonHelpers) to determined that none of the response formats started with a token containing [a-z0-9] then =. We are *not* starting from a random-input state where blob can overlap with key. I thought you wanted a general format. I agree that current bodies will not clash with current keys. Alex. Anything which does not start with a registered result status code, is blob automatically. The ones which do use result are safely mapped to the key-pair format based on the presence of a valid key= token, or its absence indicating blob to be handled by the old parser. Have a look through that wiki page at the set of Result line sent back to Squid and its clear. The proposed shared format is *exactly* what external ACL already processes. It also if you look at the code has the proposed background blob for backward compatibility with some squid-2.5 era helpers which only sent a reason field in plain-text (random input! yuck, as an established PoC it seems not to have had any complaints). Amos
Re: Generic helper I/O format
On 07/04/2012 05:07 PM, Amos Jeffries wrote: http://wiki.squid-cache.org/Features/AddonHelpers#Negotiate_and_NTLM_Scheme If you note, the reason field is usually something other than OK/ERR, so we can parse the old syntax differently to the new OK/ERR responses. Where there is an overlap (BH) the reason field is an error message, usually the program name and a ':' colon (not valid in the proposed key-name) or number in brackets (again not valid in key name) or plain text message (no '=' expected on human-readable words). You can let setResult() and setKeyValue() methods in HelperReply children to indicate that what follows is a blob: NtlmHelperReply::setResult(string result) { if (result == BH) { ... remember that the result was BH ... // no more key=value pairs; read the entire reason expectBody(true); } ... } This is not something you can easily express in a static format spec, but it will work nicely to avoid custom code when handling reason phrases in BH responses or ssl_crtd certificates. After expectBody() is called, the general parser will get everything up to the terminator and call setBody(). HTH, Alex.
Re: Generic helper I/O format
On 5/07/2012 11:10 a.m., Alex Rousskov wrote: On 07/04/2012 04:52 PM, Amos Jeffries wrote: On 05.07.2012 08:34, Henrik Nordström wrote: ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov: Alex, before starting I went through each of the helper response styles we have and documented exactly what comes and goes (http://wiki.squid-cache.org/Features/AddonHelpers) to determined that none of the response formats started with a token containing [a-z0-9] then =. We are *not* starting from a random-input state where blob can overlap with key. I thought you wanted a general format. I agree that current bodies will not clash with current keys. General as in shared by all helpers yes. Preferably where we can add fields at will in future without having to code around positioning of specific details and special cases. The blob only exists in this discussion for two reasons; the old helpers backward compatibility requires it, and you wanted to discuss a body field for the responses. Even not understanding properly the specifics of why you wanted to discuss it I dont think its a good idea since we can key-pair and encode anything new. If we can agree on making it key-pair across the board for all the details which might be passed back we can move on. Amos
Re: Generic helper I/O format
On Thu, Jul 5, 2012 at 4:00 PM, Amos Jeffries squ...@treenet.co.nz wrote: Why do we need backwards compat in the new protocol? As an alternative, consider setting a protocol= option on the helpers, making the default our latest-and-greatest,a nd folk running third-party helpers can set protocl=v1 or whatever to get backwards compat. This lets us also warn when we start deprecating the old protocol, that its going away. -Rob
Re: Generic helper I/O format
On 07/04/2012 10:00 PM, Amos Jeffries wrote: The blob only exists in this discussion for two reasons; the old helpers backward compatibility requires it, and you wanted to discuss a body field for the responses. Even not understanding properly the specifics of why you wanted to discuss it I dont think its a good idea since we can key-pair and encode anything new. But if blob is required for backward compatibility, why not use it for new stuff as well, especially since it will reduce encoding overheads? Seems like a good solution to me. And since specific helper replies can tell the generic parser when to expect the blob, we do not have to argue about the BS part; that part is not needed. If we can agree on making it key-pair across the board for all the details which might be passed back we can move on. but we cannot use key-pair across the board because, as you said, the old helpers backward compatibility requires [the blob]. I am OK with going key-pair across the board if no other helper (old or new) is going to use blobs. Otherwise, the [new] helpers that can benefit from blobs should be able to use them. Why waste a good feature if it is required for other reasons anyway? Thank you, Alex.
Generic helper I/O format
On 07/03/2012 08:10 PM, Amos Jeffries wrote: [channel-ID] (OK/ERR/BH) [key-pairs] blob terminator How will the generic code be able to tell where key-pairs end and blob begins? The generic code will have a format for key-pair of ( 1#token = 1#(token / quoted-string) ) [to be finalized still] with a set of known key names. If the bytes being parsed do not match that format and keys its not part of the key-pairs. There are two problems with this approach, IMO: * It cannot handle a body that just happened to start with supported key-value pair characters. * It cannot detect a case of helper sending unsupported key-value pairs (if that helper may send a body too). It also requires extra code/work to pre-register all known key names, but that is minor. Some issues being resolved around char* vs String vs MemBuf (vs Sbuf?) for blob. Sort of settling on MemBuf for the ssl_crtd \0 handling requirements. I agree that we should use MemBuf for this. The plan for patch-#2 is to discuss key-pair syntax (which we seem to have started), and move some particular key-pair parsing from helper handlers into the shared one. Opening up user=, tag=, password=, log=, message= keys to non-ACL helper responses will be VERY useful for RADIUS and systems needing to maintain cross-helper request state. I think we should keep it as simple as possible without causing problems with existing parsers and leaving some room for future extensions: * key/name: alphanumeric char followed by anything except (whitespace or '=' or terminator) * value: either quoted-string or anything except (whitespace or terminator); could be empty The parser strips whitespace and strips quotes around quoted strings. Quoted strings use backslash to escape any octet. If you want one format for all, you probably need something like [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator where BS is some special marker that starts all bodies and cannot be found in key-pairs. Since any body-carrying message is likely to have new lines (and, hence, would need a non-newline terminator), this BS sequence could be something like body:\n, I guess. We REQUIRE a format that does not rely on new special markers like that in order to cope with the existing helpers responses in a backward-compatible way without additional user configuration. When your parser sees n=v after ERR and n is a known key name, how would it distinguish these two possibilities: * This is a start of a blob * This is a key=value pair Or would you advocate moving the external_acl_type protocol=N.N config option into the helper child object for use by all helpers on determining what key-pairs and format is accepted? I am probably missing some important use cases, but my generic parser would just call a virtual method for each key=value pair found (with no pre-registration of supported keys) and another virtual method for the body blob, if any. It will parse everything using the key, value, and BS formats discussed above. HTH, Alex.