Re: [MERGE] Rework urlAbsolute to be a little more streamlined.

Alex Rousskov Mon, 01 Sep 2008 14:51:42 -0700

On Mon, 2008-09-01 at 12:42 +1000, Benno Rice wrote:
> Resubmit this patch, including changes based on comments by various people.
> 
> - Mention RFC text in relation to changing the default behaviour in relation
>   to unknown HTTP methods.
> - Use safe_free instead of xfree.
> - Rework urlAbsolute to use snprintf in a slightly better way.  Snprintf is 
> now
>   used to construct the initial portion of the url and the rest is added on
>   using POSIX string routines.



> +    const char *url, *absUrl;
> +    
> +    if ((url = rep->header.getStr(hdr)) != NULL) {

The above is better written as

    if (const char *url = rep->header.getStr(hdr)) {

It would also be better to name this URL "hdrUrl". You already have
"reqUrl" and "absUrl" and it is difficult to keep track.

> +    const char *url, *absUrl;
...
> +    if (absUrl != NULL) {
> +       url = absUrl;
> +    }
> +    if (absUrl != NULL) { // if the URL was relative, it is by nature the 
> same host
> +       purgeEntriesByUrl(url);
> +    } else if (sameUrlHosts(reqUrl, url)) { // prevent purging DoS, per RFC 
> 2616 13.10, second last paragraph
> +       purgeEntriesByUrl(url);
> +    }
> +    if (absUrl != NULL) {
> +       safe_free(absUrl);
> +    }

The above looks strange but since urlAbsolute is still not documented, I
may be misinterpreting this code. I would expect something like the code
below, which uses cleanup names:

  if (const char *absUrl = urlRelativeToAbsolute(req, hdrUrl)) {
        // hdrUrl was relative so absUrl points to the request host
        purgeEntriesByUrl(absUrl);
        safe_free(absUrl);
  } else
  if (sameUrlHosts(reqUrl, url)) { // prevent purging DoS; RFC 2616 13.10
        purgeEntriesByUrl(url);
  }

If you want to make it even more clear, split urlAbsolute in two:
urlIsRelative and urlMakeAbsolute, arriving at something like 

  if (isRelative(hdrUrl)) {
        const char *absUrl = urlMakeAbsolute(req, hdrUrl);
        // hdrUrl was relative so absUrl points to the request host
        purgeEntriesByUrl(absUrl);
        safe_free(absUrl);
  } else
  if (sameUrlHosts(reqUrl, url)) { // prevent purging DoS: RFC 2616 13.10
        purgeEntriesByUrl(url);
  }

The above are polishing comments.

I would still insist that urlAbsolute() is documented as its
complicated, multi-purpose interface is impossible to guess by its name.
I think it returns NULL when the URL we feed is already absolute, but I
am not sure. If my understanding if the intent is correct, then
splitting urlAbsolute in two functions as shown above would be
preferred.


> +    if (strchr(relUrl, ':') != NULL) {
> +        return (NULL);
> +    }

According to RFC 2396, Section 3.3 "Path Component", a URL path may
include a colon (':') character. This would make the above check wrong.
Did I misread the RFC? Are colons banned from URL paths?


I found this call in urlCanonical():

> if (request->protocol == PROTO_URN) {
>         snprintf(urlbuf, MAX_URL, "urn:%s", request->urlpath.buf());
>     } else {
> /// \todo AYJ: this could use "if..else and method == METHOD_CONNECT" easier.
>         switch (request->method.id()) {
> 
>         case METHOD_CONNECT:
>             snprintf(urlbuf, MAX_URL, "%s:%d", request->GetHost(), 
> request->port);
>             break;
> 
>         default:
>             portbuf[0] = '\0';
> 
>             if (request->port != urlDefaultPort(request->protocol))
>                 snprintf(portbuf, 32, ":%d", request->port);
> 
>             snprintf(urlbuf, MAX_URL, "%s://%s%s%s%s%s",
>                      ProtocolStr[request->protocol],
>                      request->login,
>                      *request->login ? "@" : null_string,
>                      request->GetHost(),
>                      portbuf,
>                      request->urlpath.buf());
> 

this code in urlCanonicalClean():

> if (request->protocol == PROTO_URN) {
>         snprintf(buf, MAX_URL, "urn:%s", request->urlpath.buf());
>     } else {
> /// \todo AYJ: this could use "if..else and method == METHOD_CONNECT" easier.
>         switch (request->method.id()) {
> 
>         case METHOD_CONNECT:
>             snprintf(buf, MAX_URL, "%s:%d", 
>                      request->GetHost(),
>                      request->port);
>             break;
> 
>         default:
>             portbuf[0] = '\0';
> 
>             if (request->port != urlDefaultPort(request->protocol))
>                 snprintf(portbuf, 32, ":%d", request->port);
> 
>             loginbuf[0] = '\0';
> 
>             if ((int) strlen(request->login) > 0) {
>                 strcpy(loginbuf, request->login);
> 
>                 if ((t = strchr(loginbuf, ':')))
>                     *t = '\0';
> 
>                 strcat(loginbuf, "@");
>             }
> 
>             snprintf(buf, MAX_URL, "%s://%s%s%s%s",
>                      ProtocolStr[request->protocol],
>                      loginbuf,
>                      request->GetHost(),
>                      portbuf,
>                      request->urlpath.buf());

and now there is this code in urlAbsolute (which handles the CONNECT
case earlier so the code below is guaranteed not to get a CONNECT
method):

> +    if (req->protocol == PROTO_URN) {
> +        snprintf(urlbuf, MAX_URL, "urn:%s", req->urlpath.buf());
> +    } else {
> +       if (req->port != urlDefaultPort(req->protocol)) {
> +           urllen = snprintf(urlbuf, MAX_URL, "%s://%s%s%s:%d",
> +                       ProtocolStr[req->protocol],
> +                       req->login,
> +                       *req->login ? "@" : null_string,
> +                       req->GetHost(),
> +                       req->port
> +           );
> +       } else {
> +           urllen = snprintf(urlbuf, MAX_URL, "%s://%s%s%s",
> +                       ProtocolStr[req->protocol],
> +                       req->login,
> +                       *req->login ? "@" : null_string,
> +                       req->GetHost()
> +           );      
> +       }


There got to be some way to remove this code duplication (and related
inconsistencies)!


I hesitate voting against because I do not know whether that would
affect future re-submissions. I do not want to block them. Does anybody
know whether a negative vote is sticky and needs to be "cleared" by the
voter?

Thank you,

Alex.

Re: [MERGE] Rework urlAbsolute to be a little more streamlined.

Reply via email to