Re: Handling of Urlencoded string in URL in Varnish

Jason Woods Mon, 07 Apr 2014 07:05:33 -0700

Hi

On 7 Apr 2014, at 10.25, Per Buer <[email protected]> wrote:


> Hi Jason.
> 
> Docwilco from Fastly has written a URL encoder/decoder VMOD that you can 
> use. You could run the URL through it twice or patch it to uppercase/lowercase 
> the encoding.
> 
> https://www.varnish-cache.org/vmod/url-code
> 
> Varnish itself doesn't try to interpret the URL much.
> 
> Per.

Thanks Per, that looks great!

Would you agree this would be better resolved in Varnish itself?
It looks as though the default VCL uses hash_data(req.url), but I question 
the intention. If the intention is to cache distinct URLs, then it needs to use 
something like hash_data(urldecode.from.core(req.url)) or 
hash_data(req.urldecoded).
Using hash_data(req.url) appears to say that it wants to cache distinct 
binary representations of a URL, which to me is not the intention.
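
To illustrate what I mean, here is a rough Python sketch (not Varnish code).
It assumes the cache key is simply a hash of the raw URL string; cache_key()
and the sample URLs are made up for the example, and SHA-256 is only a
stand-in for whatever the cache does internally.

```python
import hashlib
from urllib.parse import unquote


def cache_key(url: str) -> str:
    # Hypothetical stand-in for hashing the raw request URL as received.
    return hashlib.sha256(url.encode("ascii")).hexdigest()


# The same resource, percent-encoded with lowercase and uppercase hex digits.
lower = "/caf%c3%a9"
upper = "/caf%C3%A9"

# The raw strings differ, so they produce two separate cache keys.
print(cache_key(lower) == cache_key(upper))

# After decoding, both refer to the same path.
print(unquote(lower) == unquote(upper))
```

The first comparison is false and the second is true, which is the mismatch
I am describing: one URL, two cache objects.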

For reference, RFC 3986 (I don't know if this carries much weight, though) says in section 2.1:

> The uppercase hexadecimal digits 'A' through 'F' are equivalent to
>    the lowercase digits 'a' through 'f', respectively.  If two URIs
>    differ only in the case of hexadecimal digits used in percent-encoded
>    octets, they are equivalent.  For consistency, URI producers and
>    normalizers should use uppercase hexadecimal digits for all percent-
>    encodings.
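
As a sketch of the normalization that paragraph recommends, the following
Python snippet uppercases the hex digits of each percent-encoded octet
without decoding anything. The function name normalize_pct_case() is my own
invention for illustration; it is not part of Varnish or the VMOD.

```python
import re

# Matches one percent-encoded octet: '%' followed by exactly two hex digits.
_PCT_OCTET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_pct_case(url: str) -> str:
    # Uppercase only the percent-encoded octets; every other character,
    # including a stray '%' that is not followed by two hex digits,
    # is left untouched.
    return _PCT_OCTET.sub(lambda m: m.group(0).upper(), url)


print(normalize_pct_case("/caf%c3%a9?q=a%2fb"))
```

Because nothing is decoded, an encoded reserved character such as %2f stays
encoded (as %2F) rather than turning into a literal "/", so the URL keeps its
meaning while the two spellings collapse to one.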


So I wonder whether the Varnish core should really decode before it hashes?
I guess this is very much an edge case, though, so it is not likely to be 
touched: it only affects characters outside the Latin alphabet, and most sites 
use friendlier URLs or Latin characters, where the issue doesn't arise.
What are your thoughts? Do you think it is worth raising, or is it best to 
leave it be and work around it elsewhere?

Regards,

Jason

_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
