Is this patch OK?
Should I commit to trunk?

On 08/21/2013 06:34 PM, Tsantilas Christos wrote:
> On 08/13/2013 11:57 PM, Alex Rousskov wrote:
>> On 08/07/2013 03:11 AM, Tsantilas Christos wrote:
>>> On 07/31/2013 07:59 PM, Alex Rousskov wrote:
>>>> 2. When configuration_includes_quoted_values is on, new "strict syntax"
>>>> rules are enforced:
>>
>>>> 2c. By default, token delimiter is whitespace. Bare (i.e., unquoted)
>>>> tokens containing any character other than alphanumeric, underscore,
>>>> period, '(', ')', '=', and '-' terminate Squid. For example, foo{bar},
>>>> foo@bar, and foo"bar" terminate Squid in strict syntax mode.
>>
>>> OK.
>>> In this patch allow the following ".,()-=_%/:"
>>> We can easily add remove characters from this list.
>>> If we remove from this list the ':' then we should require quotes to
>>> define http, icap or ecap urls.
>>> The % used to define percent values and the '/' to define filenames.
>>
>> I am OK with adding ':' and '/' to the allowed set.
>>
>> I think we should allow % at the end of a bare token (e.g., 100%) but
>> not elsewhere (e.g., http://www.%host.com/). I am worried that by
>> allowing %macro-like strings in bare tokens we would not be able to tell
>> the admin when they forgot to quote a string that contains a macro.
> 
> 
> OK. This patch allow only [0-9]+% tokens (eg 100%, 80% etc).
> 
>>
>> Another potential problem here is with the '=' character. We have to
>> allow it because existing code expects it in directive options with
>> values (e.g., bypass=on). However, making '=' a part of the token makes
>> it difficult to support quoted option values:
>>
>>   bypass="on" // illegal because a token cannot have quotes
>>
>>   "bypass=on" // legal but looks kind of weird
>>               // because there are actually several tokens here
>>
>> For simple values such as "on", this is not a problem, but not all
>> option values are that simple, and some of them will require quoting:
>>
>>     uri="icap://example.com/?mode=reqmod" // illegal?!
>>
>>     "uri=icap://example.com/?mode=reqmod" // weird
>>
>> Changing all callers to treat name=value sequence as three tokens is
>> probably too much work, right?
> 
> This is requires changes in many places...
> 
> 
>>
>>
>> A similar problem exists for function() calls but we hack around it by
>> parsing parameters() specially and requiring that file names are quoted,
>> I guess. More on that below.
>>
>>
>>
>>
>>> +    bool saveRecognizeQuotedValues = ConfigParser::RecognizeQuotedValues;
>> ...
>>> +            bool savePreview = ConfigParser::PreviewMode_;
>>
>>
>> Please make those variables const.
> 
> ok
> 
>>
>>
>>> +bool ConfigParser::ParseQuotedOrToEOL_ = false;
>>
>> Please s/EOL/Eol/.
> 
> ok
> 
>>
>>
>>> +static const char *SQUID_ERROR_TOKEN = "SQUID_ERROR_TOKEN";
>>
>> The value should probably contain spaces and other special characters to
>> minimize a chance of collision with the actual token. For example,
>> "[invalid token]". Note that you never test for SQUID_ERROR_TOKEN being
>> returned -- the code relies on known values to be different from the
>> SQUID_ERROR_TOKEN value.
> 
> ok
> 
> 
>>
>>
>>> +        (void)ConfigParser::NextToken(); //Get token fron cfg line
>>
>> typo: s/fron/from/
> 
> fixed
> 
>>
>>
>>>      /* scan until the end of the quoted string, unescaping " and \  */
>>> -    while (*s && *s != quoteChar) {
>>> -        if (*s == '\\' && isalnum(*( s + 1))) {
>>> -            debugs(3, DBG_CRITICAL, "Unsupported escape sequence: " << s);
>>> -            self_destruct();
>>> +    while (*s && *s != quoteChar && !errorStr && (d - UnQuoted) < 
>>> sizeof(UnQuoted)) {
>>
>> The old comment is now out of sync. You can say something like "scan
>> until the end of the quoted string, handling escape sequences".
> 
> ok
> 
>>
>>
>>>          } else if (*s == '$' && quoteChar == '"') {
>>> -            debugs(3, DBG_CRITICAL, "Unsupported cfg macro: " << s);
>>> -            self_destruct();
>>> +            errorStr = "Unsupported cfg macro";
>>> +            errorPos = s;
>>
>> If there is consensus that we do not need to support $macros now or in
>> the future (except for pre-processor SMP macros that are already handled
>> by the time the above code runs), then the above special $macor case can
>> be removed.
> 
> I put the related code inside an "#if 0 ... #endif"
> I will remove it if required...
> 
>>
>>
>>> +        char *token = xstrdup(UnQuote(nextToken, &nextToken));
>>> +        CfgLineTokens_.push(token);
>> ...
>>> +    while (!CfgLineTokens_.empty()) {
>>> +        char *token = CfgLineTokens_.front();
>>> +        CfgLineTokens_.pop();
>>> +        free(token);
>>> +    }
>>
>> Can we use String or std::string for these stored tokens so that we do
>> not have to be so careful about allocating and deleting them?
> 
> Not easy.
> This is requires to modify NextToken and related methods to return
> "const char *" instead of "char *".
> This is requires changes in many places. I remember I had problems when
> I try it...
> 
> 
> 
>>
>>
>>>  4) Add the ConfigParser::RegexPattern() to get the next regex token
>>>
>>>  5) Add the ConfigParser::RegexStrtokFile() to get the next regex token
>>> which is compatible with the old strtokFile
>>
>>> +char *
>>> +ConfigParser::RegexStrtokFile()
>>> +{
>>> +    ParseRegex_ = true;
>>> +    char * token = strtokFile();
>>> +    ParseRegex_ = false;
>>> +    return token;
>>> +}
>>>  
>>> +char *
>>> +ConfigParser::RegexPattern()
>>> +{
>>> +    ParseRegex_ = true;
>>> +    char * token = NextToken();
>>> +    ParseRegex_ = false;
>>> +    return token;
>>> +}
>>
>>
>> This feels wrong. It is good to have these methods so that we know which
>> callers need REs, but their behavior should be different IMO:
>>
>> * In legacy mode, they should call NextToken() and strtokFile() which
>> will parse legacy REs as expected. No need for ParseRegex_.
>>
>> * In strict mode, they should fail immediately so that we can introduce
>> proper RE support later, without screwing up already working
>> configurations that use strict mode. Supporting REs in strict mode is a
>> TODO, just like supporting single quoted strings is. For now, we just
>> need to make sure that we can add such support without breaking
>> configurations that start using strict mode.
> 
> OK. I implement this behaviour for now.
> The parser return the message:
>  "Can not read regex expresion while
> configuration_includes_quoted_values is enabled"
> 
> When we are trying to read regex expresions (eg parse regex acl) when
> the configuration_includes_quoted_values is set to on.
> 
> The user should use something like the following:
> # Enable quoted values on squid cfg file parsing
> configuration_includes_quoted_values on
> ....
> #Temporary disable quoted values to parse regex expresions
> configuration_includes_quoted_values off
> refresh_pattern ^ftp:           1440    20%     10080
> refresh_pattern ^gopher:        1440    0%      1440
> refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
> refresh_pattern .               0       20%     4320
> configuration_includes_quoted_values on
> ...
> 
>>
>> For example, if we decide that REs in strict mode should start with '/'
>> or m{} as they do in Perl, then we have to make sure _now_ that no
>> legacy RE that starts with those characters sneaks into the strict mode
>> while we are working on RE support. If we fail to do that, we will
>> create a new set of fresh backward compatibility problems to deal with.
> 
> I see....
> 
>>
>>
>>>      /// configuration_includes_quoted_values in squid.conf
>>> -    static int RecognizeQuotedValues;
>>> +    static bool RecognizeQuotedValues;
>>> +
>>> +    /// Strict syntax mode. Does not allow not alphanumeric characters in 
>>> unquoted tokens
>>> +    static bool StrictMode;
>>
>> You may want to add a comment saying that StrictMode may remain false
>> when the legacy ConfigParser::NextQuotedToken() call forces
>> RecognizeQuotedValues to be temporary true. Otherwise, it is not clear
>> why we need both members.
> 
> Ok
> 
>>
>>
>>> +    if (!PreviewMode_ || type == FunctionParameters)
>>> +        parsePos = pos;
>>> +    // else next call will read the same token;
>>
>> Please add a comment explaining this logic. It is not clear why parsePos
>> is updated if we are not in preview mode or if we are parsing
>> FunctionParameters [in preview mode].
> 
> I add a comment.
> The true is that the related code is a little complex. But for function
> "parameters()" we need to update the current parsing position, even if
> we are in preview mode, because we need to read the next quoted string
> which is  the "parameters()" argument, the name of the file we want to
> open to read the next valid token.
> For preview the caller needs the next valid token not the "parameters()"
> function and its argument.
> 
> 
>>
>>
>>> -     * Return the last TokenUndo() or TokenPutBack() queued element, or 
>>> NULL
>>> +     * Return the last TokenPutBack() queued element, or NULL
>>
>> No comma is needed after this change.
> 
> ok.
> 
>>
>>
>> Thank you,
>>
>> Alex.
>>
>>
> 

Reply via email to