Re: Generic helper I/O format

2012-07-06 Thread Henrik Nordström
tor 2012-07-05 klockan 16:00 +1200 skrev Amos Jeffries:

 The blob only exists in this discussion for two reasons; the old helpers 
 backward compatibility requires it, and  you wanted to discuss a body 
 field for the responses. Even not understanding properly the specifics 
 of why you wanted to discuss it I dont think its a good idea since we 
 can key-pair and encode anything new.

Agreed. And please use URL-encoding requiring encoding of at minimum %
and any whitespace. It's much less error prone than the quoting
mechanism.

 If we can agree on making it key-pair across the board for all the 
 details which might be passed back we can move on.

Yes.

With blob only for legacy helpers.

Regards
Henrik



Re: Generic helper I/O format

2012-07-06 Thread Amos Jeffries

On 5/07/2012 4:16 p.m., Robert Collins wrote:

On Thu, Jul 5, 2012 at 4:00 PM, Amos Jeffries wrote:

Why do we need backwards compat in the new protocol?

As an alternative, consider setting a protocol= option on the helpers,
making the default our latest-and-greatest,a nd folk running
third-party helpers can set protocl=v1 or whatever to get backwards
compat.

This lets us also warn when we start deprecating the old protocol,
that its going away.

-Rob


I'm trying to avoid adding more options and complexity with this. To be 
done properly that would mean adding whole new config design bits to set 
protocol= on several of the more hidden helper lookups. But with one 
shared parser and a fallback per-helper parser handling the blob cases 
for backward compatibility we can migrate people over without any config 
change.


Amos



Re: Generic helper I/O format

2012-07-04 Thread Amos Jeffries

On 4/07/2012 4:54 p.m., Alex Rousskov wrote:

On 07/03/2012 08:10 PM, Amos Jeffries wrote:


[channel-ID] (OK/ERR/BH) [key-pairs] blob terminator

How will the generic code be able to tell where key-pairs end and blob
begins?

The generic code will have a format for key-pair of ( 1#token =
1#(token / quoted-string) )  [to be finalized still] with a set of known
key names. If the bytes being parsed do not match that format and keys
its not part of the key-pairs.

There are two problems with this approach, IMO:

   * It cannot handle a body that just happened to start with supported
key-value pair characters.


This is shared with any body which just happens to start with the BS 
token as well. This is a good reason to use key-pair across the board 
and not document any blob support.




   * It cannot detect a case of helper sending unsupported key-value
pairs (if that helper may send a body too).

Yes.


It also requires extra code/work to pre-register all known key names,
but that is minor.


This is not an issue. We have to code operations for handling any 
accepted key-pair in the handler callbacks anyway.






The plan for patch-#2 is to discuss key-pair syntax (which we seem to
have started), and move some particular key-pair parsing from helper
handlers into the shared one. Opening up user=, tag=, password=, log=,
message= keys to non-ACL helper responses will be VERY useful for RADIUS
and systems needing to maintain cross-helper request state.

I think we should keep it as simple as possible without causing problems
with existing parsers and leaving some room for future extensions:

* key/name: alphanumeric char followed by anything except (whitespace or
'=' or terminator)

* value: either quoted-string or anything except (whitespace or
terminator); could be empty

The parser strips whitespace and strips quotes around quoted strings.
Quoted strings use backslash to escape any octet.


Okay. So modelling on the ssl_crtd parser instead of external_acl_type 
parser.






If you want one format for all, you probably need something like

  [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator

where BS is some special marker that starts all bodies and cannot be
found in key-pairs. Since any body-carrying message is likely to have
new lines (and, hence, would need a non-newline terminator), this BS
sequence could be something like body:\n, I guess.

We REQUIRE a format that does not rely on new special markers like that
in order to cope with the existing helpers responses in a
backward-compatible way without additional user configuration.

When your parser sees n=v  after ERR and n is a known key name, how
would it distinguish these two possibilities:

   * This is a start of a blob
   * This is a key=value pair


The second. *IF* it was in the [key-pair] area directly after ERR. Once 
blob starts everything is blob.



Or would you advocate moving the external_acl_type protocol=N.N config
option into the helper child object for use by all helpers on
determining what key-pairs and format is accepted?

I am probably missing some important use cases, but my generic parser
would just call a virtual method for each key=value pair found (with no
pre-registration of supported keys) and another virtual method for the
body blob, if any. It will parse everything using the key, value, and
BS formats discussed above.


You mean function pointers right? not virtual methods in the HelperReply 
object?
 and um, this is a high-use code path - both in parsing and in 
retrieving the details out of HelperReply.


That style of generic parser is good for new formats but not so much for 
backward-compatibility from old formats. I am going to have to manually 
receive things like the TT/LD/AF/NA NTLM result codes and map the 
existing format into record of the HelperReply object.


Amos


Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 12:02 AM, Amos Jeffries wrote:
 On 4/07/2012 4:54 p.m., Alex Rousskov wrote:
 On 07/03/2012 08:10 PM, Amos Jeffries wrote:

 [channel-ID] (OK/ERR/BH) [key-pairs] blob terminator
 How will the generic code be able to tell where key-pairs end and blob
 begins?
 The generic code will have a format for key-pair of ( 1#token =
 1#(token / quoted-string) )  [to be finalized still] with a set of known
 key names. If the bytes being parsed do not match that format and keys
 its not part of the key-pairs.
 There are two problems with this approach, IMO:

* It cannot handle a body that just happened to start with supported
 key-value pair characters.
 
 This is shared with any body which just happens to start with the BS
 token as well. 

No, it is not. BS is required if the body is present and BS is not a
valid key name. Thus, BS cannot be confused with a start of a key-value
pair _and_ if a body starts with BS as well, there is no problem because
we already know to expect a body there (after BS).

[ but I am glad you acknowledge this issue with the BS-less approach ]


 I think we should keep it as simple as possible without causing problems
 with existing parsers and leaving some room for future extensions:

 * key/name: alphanumeric char followed by anything except (whitespace or
 '=' or terminator)

 * value: either quoted-string or anything except (whitespace or
 terminator); could be empty

 The parser strips whitespace and strips quotes around quoted strings.
 Quoted strings use backslash to escape any octet.

 Okay. So modelling on the ssl_crtd parser instead of external_acl_type
 parser.

I am trying to model this on the above as simple as ... criteria. I do
not really remember the specifics of any particular parser we have (so
adjustments may be necessary).



 If you want one format for all, you probably need something like

   [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator

 where BS is some special marker that starts all bodies and cannot be
 found in key-pairs. Since any body-carrying message is likely to have
 new lines (and, hence, would need a non-newline terminator), this BS
 sequence could be something like body:\n, I guess.
 We REQUIRE a format that does not rely on new special markers like that
 in order to cope with the existing helpers responses in a
 backward-compatible way without additional user configuration.
 When your parser sees n=v  after ERR and n is a known key name, how
 would it distinguish these two possibilities:

* This is a start of a blob
* This is a key=value pair
 
 The second. *IF* it was in the [key-pair] area directly after ERR.

And how do you decide that IF? It is exactly the same question I asked,
just phrased differently. The BS-less approach does not work (as you
have already acknowledged in the beginning of this email): The end of
[key-pair] area is indistinguishable from the beginning of a blob in
this case. In other words, you do not know whether the blob has
started or not. That is exactly the problem BS is solving (because it is
not a valid key).


 Or would you advocate moving the external_acl_type protocol=N.N config
 option into the helper child object for use by all helpers on
 determining what key-pairs and format is accepted?
 I am probably missing some important use cases, but my generic parser
 would just call a virtual method for each key=value pair found (with no
 pre-registration of supported keys) and another virtual method for the
 body blob, if any. It will parse everything using the key, value, and
 BS formats discussed above.
 
 You mean function pointers right? not virtual methods in the HelperReply
 object?

I did mean virtual methods. The child parsers will form HelperReply's
kid objects by converting key-value pairs and other parsed strings into
helper-specific replies. Those parsers may be integrated with
HelperReply hierarchy or be separate from it. If they are integrated, it
would look like this:

class HelperReply {
...
 virtual void setResult(string);
 virtual void setKeyValue(string, string);
 virtual void setBody(MemBuf);
};
...

// a sketch of a child implementation of setKeyValue()
Ssl::CrtdReply::setKeyValue(string k, string v)
{
 if (k == foo)
... this-foo_ = StringToInt(v);
 else
HelperReply::parseKeyValue(k,v);
}


  and um, this is a high-use code path - both in parsing and in
 retrieving the details out of HelperReply.

Agreed. I cannot think of a significantly more efficient but still
general approach that results in helper-specific ready-to-use replies.


 That style of generic parser is good for new formats but not so much for
 backward-compatibility from old formats. I am going to have to manually
 receive things like the TT/LD/AF/NA NTLM result codes and map the
 existing format into record of the HelperReply object.

Why not have NtmlHelperReply::setResult() do the mapping? Is there

Re: Generic helper I/O format

2012-07-04 Thread Henrik Nordström
ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov:

 No, it is not. BS is required if the body is present and BS is not a
 valid key name. Thus, BS cannot be confused with a start of a key-value
 pair _and_ if a body starts with BS as well, there is no problem because
 we already know to expect a body there (after BS).

Why do we need a blob marker? Why not simply use key=value pairs across
the board?


 Why not have NtmlHelperReply::setResult() do the mapping? Is there
 something format-incompatible with those NTLM result codes that the
 generic parser cannot parse them using the following format?
 
   [channel-ID] result [key-pairs] [BS blob] terminator

NTLM do not need a blob. It's just a value. Moving to key=value is fine.
Additionally worth noting is that = is not a valid character in NTLM
blobs so same parser can be used if tokens without = is supported
(keyless values).

Regards
Henrik



Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 02:34 PM, Henrik Nordström wrote:
 ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov: 

 Why not simply use key=value pairs across the board?

We need blobs because some values in use today are not expressed as
tokens or quoted strings (the two value formats Amos is considering).

We could, of course, _encode_ those inconvenient values so that things
like SSL certificates can be represented as huge quoted strings. Is that
what you are getting at? I am not against that, even though I am a
little worried about the added encoding overhead. This will require
changing ssl_crtd helper code, but I do not think that would be a big
problem.



 Why not have NtmlHelperReply::setResult() do the mapping? Is there
 something format-incompatible with those NTLM result codes that the
 generic parser cannot parse them using the following format?

   [channel-ID] result [key-pairs] [BS blob] terminator
 
 NTLM do not need a blob.

We are talking about a general format, not NTLM-specific format. Blob is
optional. If NTLM helper does not need it, it will not use it.


 It's just a value. Moving to key=value is fine.
 Additionally worth noting is that = is not a valid character in NTLM
 blobs so same parser can be used if tokens without = is supported
 (keyless values).

Yes, anonymous values or valueless names can be allowed if it helps
existing helpers. Can somebody paste a typical NTLM helper response or two?


Thank you,

Alex.


Re: Generic helper I/O format

2012-07-04 Thread Amos Jeffries

On 05.07.2012 08:34, Henrik Nordström wrote:

ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov:


No, it is not. BS is required if the body is present and BS is not a
valid key name. Thus, BS cannot be confused with a start of a 
key-value
pair _and_ if a body starts with BS as well, there is no problem 
because

we already know to expect a body there (after BS).


Why do we need a blob marker? Why not simply use key=value pairs 
across

the board?


We need it as a temporary measure to handle old helpers responses.

I agree on the desire to move to key-pair across the board and am 
trying to reach it out of this forest of individual fixed-syntax 
parsers. I think its possible (easy even) to do it using a blob instead 
of new config settings.







Why not have NtmlHelperReply::setResult() do the mapping? Is there
something format-incompatible with those NTLM result codes that the
generic parser cannot parse them using the following format?

  [channel-ID] result [key-pairs] [BS blob] terminator


NTLM do not need a blob. It's just a value. Moving to key=value is 
fine.

Additionally worth noting is that = is not a valid character in NTLM
blobs so same parser can be used if tokens without = is supported
(keyless values).



Good point here. Same stands for all the other helpers.

Alex,
 before starting I went through each of the helper response styles we 
have and documented exactly what comes and goes 
(http://wiki.squid-cache.org/Features/AddonHelpers) to determined that 
none of the response formats started with a token containing [a-z0-9] 
then =. We are *not* starting from a random-input state where blob 
can overlap with key.


Anything which does not start with a registered result status code, 
is blob automatically. The ones which do use result are safely 
mapped to the key-pair format based on the presence of a valid key= 
token, or its absence indicating blob to be handled by the old parser.


Have a look through that wiki page at the set of Result line sent back 
to Squid and its clear.


The proposed shared format is *exactly* what external ACL already 
processes. It also if you look at the code has the proposed background 
blob for backward compatibility with some squid-2.5 era helpers which 
only sent a reason field in plain-text (random input! yuck, as an 
established PoC it seems not to have had any complaints).


Amos



Re: Generic helper I/O format

2012-07-04 Thread Amos Jeffries

On 05.07.2012 10:15, Alex Rousskov wrote:

On 07/04/2012 02:34 PM, Henrik Nordström wrote:

ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov:



Why not simply use key=value pairs across the board?


We need blobs because some values in use today are not expressed as
tokens or quoted strings (the two value formats Amos is considering).

We could, of course, _encode_ those inconvenient values so that 
things
like SSL certificates can be represented as huge quoted strings. Is 
that

what you are getting at? I am not against that, even though I am a
little worried about the added encoding overhead. This will require
changing ssl_crtd helper code, but I do not think that would be a big
problem.




Why not have NtmlHelperReply::setResult() do the mapping? Is there
something format-incompatible with those NTLM result codes that the
generic parser cannot parse them using the following format?

  [channel-ID] result [key-pairs] [BS blob] terminator


NTLM do not need a blob.


We are talking about a general format, not NTLM-specific format. Blob 
is

optional. If NTLM helper does not need it, it will not use it.



It's just a value. Moving to key=value is fine.
Additionally worth noting is that = is not a valid character in NTLM
blobs so same parser can be used if tokens without = is supported
(keyless values).


Yes, anonymous values or valueless names can be allowed if it helps
existing helpers. Can somebody paste a typical NTLM helper response 
or two?




http://wiki.squid-cache.org/Features/AddonHelpers#Negotiate_and_NTLM_Scheme

If you note, the reason field is usually something other than OK/ERR, 
so we can parse the old syntax differently to the new OK/ERR responses.


Where there is an overlap (BH) the reason field is an error message, 
usually the program name and a ':' colon (not valid in the proposed 
key-name) or number in brackets (again not valid in key name) or plain 
text message (no '=' expected on human-readable words).



Amos



Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 04:52 PM, Amos Jeffries wrote:
 On 05.07.2012 08:34, Henrik Nordström wrote:
 ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov:
 
 Alex,
  before starting I went through each of the helper response styles we
 have and documented exactly what comes and goes
 (http://wiki.squid-cache.org/Features/AddonHelpers) to determined that
 none of the response formats started with a token containing [a-z0-9]
 then =. We are *not* starting from a random-input state where blob
 can overlap with key.

I thought you wanted a general format. I agree that current bodies will
not clash with current keys.

Alex.


 Anything which does not start with a registered result status code, is
 blob automatically. The ones which do use result are safely mapped
 to the key-pair format based on the presence of a valid key= token, or
 its absence indicating blob to be handled by the old parser.
 
 Have a look through that wiki page at the set of Result line sent back
 to Squid and its clear.



 The proposed shared format is *exactly* what external ACL already
 processes. It also if you look at the code has the proposed background
 blob for backward compatibility with some squid-2.5 era helpers which
 only sent a reason field in plain-text (random input! yuck, as an
 established PoC it seems not to have had any complaints).
 
 Amos




Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 05:07 PM, Amos Jeffries wrote:

 http://wiki.squid-cache.org/Features/AddonHelpers#Negotiate_and_NTLM_Scheme
 
 If you note, the reason field is usually something other than OK/ERR,
 so we can parse the old syntax differently to the new OK/ERR responses.
 
 Where there is an overlap (BH) the reason field is an error message,
 usually the program name and a ':' colon (not valid in the proposed
 key-name) or number in brackets (again not valid in key name) or plain
 text message (no '=' expected on human-readable words).

You can let setResult() and setKeyValue() methods in HelperReply
children to indicate that what follows is a blob:

NtlmHelperReply::setResult(string result)
{
if (result == BH) {
... remember that the result was BH ...
// no more key=value pairs; read the entire reason
expectBody(true);
}
...
}


This is not something you can easily express in a static format spec,
but it will work nicely to avoid custom code when handling reason
phrases in BH responses or ssl_crtd certificates. After expectBody() is
called, the general parser will get everything up to the terminator
and call setBody().


HTH,

Alex.


Re: Generic helper I/O format

2012-07-04 Thread Amos Jeffries

On 5/07/2012 11:10 a.m., Alex Rousskov wrote:

On 07/04/2012 04:52 PM, Amos Jeffries wrote:

On 05.07.2012 08:34, Henrik Nordström wrote:

ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov:

Alex,
  before starting I went through each of the helper response styles we
have and documented exactly what comes and goes
(http://wiki.squid-cache.org/Features/AddonHelpers) to determined that
none of the response formats started with a token containing [a-z0-9]
then =. We are *not* starting from a random-input state where blob
can overlap with key.

I thought you wanted a general format. I agree that current bodies will
not clash with current keys.


General as in shared by all helpers yes. Preferably where we can add 
fields at will in future without having to code around positioning of 
specific details and special cases.


The blob only exists in this discussion for two reasons; the old helpers 
backward compatibility requires it, and  you wanted to discuss a body 
field for the responses. Even not understanding properly the specifics 
of why you wanted to discuss it I dont think its a good idea since we 
can key-pair and encode anything new.


If we can agree on making it key-pair across the board for all the 
details which might be passed back we can move on.


Amos



Re: Generic helper I/O format

2012-07-04 Thread Robert Collins
On Thu, Jul 5, 2012 at 4:00 PM, Amos Jeffries squ...@treenet.co.nz wrote:

Why do we need backwards compat in the new protocol?

As an alternative, consider setting a protocol= option on the helpers,
making the default our latest-and-greatest,a nd folk running
third-party helpers can set protocl=v1 or whatever to get backwards
compat.

This lets us also warn when we start deprecating the old protocol,
that its going away.

-Rob


Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 10:00 PM, Amos Jeffries wrote:

 The blob only exists in this discussion for two reasons; the old helpers
 backward compatibility requires it, and  you wanted to discuss a body
 field for the responses. Even not understanding properly the specifics
 of why you wanted to discuss it I dont think its a good idea since we
 can key-pair and encode anything new.

But if blob is required for backward compatibility, why not use it for
new stuff as well, especially since it will reduce encoding overheads?
Seems like a good solution to me. And since specific helper replies can
tell the generic parser when to expect the blob, we do not have to argue
about the BS part; that part is not needed.


 If we can agree on making it key-pair across the board for all the
 details which might be passed back we can move on.

but we cannot use key-pair across the board because, as you said, the
old helpers backward compatibility requires [the blob].

I am OK with going key-pair across the board if no other helper (old or
new) is going to use blobs. Otherwise, the [new] helpers that can
benefit from blobs should be able to use them. Why waste a good feature
if it is required for other reasons anyway?


Thank you,

Alex.


Generic helper I/O format

2012-07-03 Thread Alex Rousskov
On 07/03/2012 08:10 PM, Amos Jeffries wrote:

 [channel-ID] (OK/ERR/BH) [key-pairs] blob terminator

 How will the generic code be able to tell where key-pairs end and blob
 begins?
 
 The generic code will have a format for key-pair of ( 1#token =
 1#(token / quoted-string) )  [to be finalized still] with a set of known
 key names. If the bytes being parsed do not match that format and keys
 its not part of the key-pairs.

There are two problems with this approach, IMO:

  * It cannot handle a body that just happened to start with supported
key-value pair characters.

  * It cannot detect a case of helper sending unsupported key-value
pairs (if that helper may send a body too).

It also requires extra code/work to pre-register all known key names,
but that is minor.


 Some issues being resolved around char* vs
 String vs MemBuf (vs Sbuf?) for blob. Sort of settling on MemBuf for the
 ssl_crtd \0 handling requirements.

I agree that we should use MemBuf for this.


 The plan for patch-#2 is to discuss key-pair syntax (which we seem to
 have started), and move some particular key-pair parsing from helper
 handlers into the shared one. Opening up user=, tag=, password=, log=,
 message= keys to non-ACL helper responses will be VERY useful for RADIUS
 and systems needing to maintain cross-helper request state.

I think we should keep it as simple as possible without causing problems
with existing parsers and leaving some room for future extensions:

* key/name: alphanumeric char followed by anything except (whitespace or
'=' or terminator)

* value: either quoted-string or anything except (whitespace or
terminator); could be empty

The parser strips whitespace and strips quotes around quoted strings.
Quoted strings use backslash to escape any octet.


 If you want one format for all, you probably need something like

  [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator

 where BS is some special marker that starts all bodies and cannot be
 found in key-pairs. Since any body-carrying message is likely to have
 new lines (and, hence, would need a non-newline terminator), this BS
 sequence could be something like body:\n, I guess.
 
 We REQUIRE a format that does not rely on new special markers like that
 in order to cope with the existing helpers responses in a
 backward-compatible way without additional user configuration.

When your parser sees n=v  after ERR and n is a known key name, how
would it distinguish these two possibilities:

  * This is a start of a blob
  * This is a key=value pair


 Or would you advocate moving the external_acl_type protocol=N.N config
 option into the helper child object for use by all helpers on
 determining what key-pairs and format is accepted?

I am probably missing some important use cases, but my generic parser
would just call a virtual method for each key=value pair found (with no
pre-registration of supported keys) and another virtual method for the
body blob, if any. It will parse everything using the key, value, and
BS formats discussed above.


HTH,

Alex.