On 07/04/2012 12:02 AM, Amos Jeffries wrote: > On 4/07/2012 4:54 p.m., Alex Rousskov wrote: >> On 07/03/2012 08:10 PM, Amos Jeffries wrote:
>>>>> [channel-ID] (OK/ERR/BH) [key-pairs] <blob> <terminator> >>>> How will the generic code be able to tell where key-pairs end and blob >>>> begins? >>> The generic code will have a format for key-pair of ( 1#token "=" >>> 1#(token / quoted-string) ) [to be finalized still] with a set of known >>> key names. If the bytes being parsed do not match that format and keys >>> its not part of the key-pairs. >> There are two problems with this approach, IMO: >> >> * It cannot handle a body that just happened to start with supported >> key-value pair characters. > > This is shared with any body which just happens to start with the "BS" > token as well. No, it is not. BS is required if the body is present and BS is not a valid key name. Thus, BS cannot be confused with a start of a key-value pair _and_ if a body starts with BS as well, there is no problem because we already know to expect a body there (after BS). [ but I am glad you acknowledge this issue with the BS-less approach ] >> I think we should keep it as simple as possible without causing problems >> with existing parsers and leaving some room for future extensions: >> >> * key/name: alphanumeric char followed by anything except (whitespace or >> '=' or terminator) >> >> * value: either quoted-string or anything except (whitespace or >> terminator); could be empty >> >> The parser strips whitespace and strips quotes around quoted strings. >> Quoted strings use backslash to escape any octet. > Okay. So modelling on the ssl_crtd parser instead of external_acl_type > parser. I am trying to model this on the above "as simple as ..." criteria. I do not really remember the specifics of any particular parser we have (so adjustments may be necessary). >>>> If you want one format for all, you probably need something like >>>> >>>> [channel-ID] (OK/ERR/BH) [key-pairs] [BS <body>] <terminator> >>>> >>>> where "BS" is some special marker that starts all bodies and cannot be >>>> found in key-pairs. Since any body-carrying message is likely to have >>>> new lines (and, hence, would need a non-newline terminator), this BS >>>> sequence could be something like "body:\n", I guess. >>> We REQUIRE a format that does not rely on new special markers like that >>> in order to cope with the existing helpers responses in a >>> backward-compatible way without additional user configuration. >> When your parser sees "n=v " after ERR and "n" is a known key name, how >> would it distinguish these two possibilities: >> >> * This is a start of a <blob> >> * This is a key=value pair > > The second. *IF* it was in the [key-pair] area directly after ERR. And how do you decide that IF? It is exactly the same question I asked, just phrased differently. The BS-less approach does not work (as you have already acknowledged in the beginning of this email): The end of "[key-pair] area" is indistinguishable from the beginning of a <blob> in this case. In other words, you do not know whether the <blob> has started or not. That is exactly the problem BS is solving (because it is not a valid key). >>> Or would you advocate moving the external_acl_type "protocol=N.N" config >>> option into the helper child object for use by all helpers on >>> determining what key-pairs and format is accepted? >> I am probably missing some important use cases, but my generic parser >> would just call a virtual method for each key=value pair found (with no >> pre-registration of supported keys) and another virtual method for the >> <body> blob, if any. It will parse everything using the key, value, and >> BS formats discussed above. > > You mean function pointers right? not virtual methods in the HelperReply > object? I did mean virtual methods. The child parsers will form HelperReply's kid objects by converting key-value pairs and other parsed strings into helper-specific replies. Those parsers may be integrated with HelperReply hierarchy or be separate from it. If they are integrated, it would look like this: class HelperReply { ... virtual void setResult(string); virtual void setKeyValue(string, string); virtual void setBody(MemBuf); }; ... // a sketch of a child implementation of setKeyValue() Ssl::CrtdReply::setKeyValue(string k, string v) { if (k == "foo") ... this->foo_ = StringToInt(v); else HelperReply::parseKeyValue(k,v); } > and um, this is a high-use code path - both in parsing and in > retrieving the details out of HelperReply. Agreed. I cannot think of a significantly more efficient but still general approach that results in helper-specific ready-to-use replies. > That style of generic parser is good for new formats but not so much for > backward-compatibility from old formats. I am going to have to manually > receive things like the TT/LD/AF/NA NTLM result codes and map the > existing format into record of the HelperReply object. Why not have NtmlHelperReply::setResult() do the mapping? Is there something format-incompatible with those NTLM result codes that the generic parser cannot parse them using the following format? [channel-ID] <result> [key-pairs] [<BS> <blob>] <terminator> where result is a anything except whitespace, BS, or terminator. Cheers, Alex.