Just to add to the discussion, I have Varnish running in front of a couple of Thai language sites.
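For illustration only, here is a minimal Python sketch of the conversion the browser appears to apply to such paths: every byte of the UTF-8 encoded path that is not plain ASCII becomes a %XX sequence. Python is used purely as a neutral way to show the byte-level transformation; it is not part of Varnish or VCL, and the helper names `escape_path` and `unescape_path` are hypothetical.

```python
from urllib.parse import quote, unquote


def escape_path(path: str) -> str:
    """Percent-encode a URL path as UTF-8 bytes, leaving '/' untouched.

    Hypothetical helper: it mirrors what the browser seems to send on the
    wire, so the result can be compared with the RxURL entry in varnishlog.
    """
    return quote(path, safe="/")


def unescape_path(escaped: str) -> str:
    """Decode %XX sequences back into the original UTF-8 text."""
    return unquote(escaped)


if __name__ == "__main__":
    # The Thai path from this message and the Arabic path from the
    # quoted question further down the thread.
    for path in ("/กลางประเทศไทย", "/تقارير"):
        escaped = escape_path(path)
        print(escaped)
        # Decoding the escaped form gives back the original path.
        assert unescape_path(escaped) == path
```

The escaped form is what shows up in the RxURL field, so it is presumably the string a rule on req.url has to be compared against. How a VCL 2.1 string literal treats a literal '%' is not covered by this sketch.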
The URL '/กลางประเทศไทย' corresponds to the following entry in varnishlog
in Varnish 2.1.1:

   13 RxURL        c /%E0%B8%81%E0%B8%A5%E0%B8%B2%E0%B8%87%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2

which is just every high-bit byte of the UTF-8 encoded path escaped as a
%nn sequence. The conversion is actually done by the browser (in this
case Chrome); I confirmed this with a netcat session. I am not sure
whether all browsers do the same conversion. Some more details might be
gleaned from here:

http://code.google.com/p/browsersec/wiki/Part1#Unicode_in_URLs

But obviously Varnish needs to be able to cope with these conversions.

-felix

On Tue, May 25, 2010 at 01:08:26PM +0300, Angie T. Muhammad wrote:
> Thank you Sam for your response. I already logged requests to cached Arabic
> URLs and here is the result of one request:
> ===========================================================================================
> Cookie: SESScfc90a62c81b7bfc6f292320b1d0b8ca=t7t650vu5qu02916unbtil9o66;
> SESS50745c6a3729e7f46278f7d281511580=qjc658f7cthp6dvj65rt6a8c64;
> SESS8348e9a0e0f6133hash*%ntrol: max-age=0%c9c2n9td5uuvj0hp73;
> SESSb323fb39997d18c5bde4c32f7bc0ffe1=0r5ve4k3i2ubmqu
> ▒±␊: 0 ┼: ┐␊␊⎻-▒┌␋┴␊ 806
> ===========================================================================================
>
> I tried opening the log file with less, vim, and tail but all I am
> getting is either binary (less) or stuff like above (tail).
> I even tried limiting the accepted charset header sent by the browser to
> UTF-8 but failed.
> Here is my config for limiting the charset under sub vcl_rcv { }:
> ======================================
> if (req.http.Accept-Charset) {
>     remove req.http.Accept-Charset;
>     set req.http.Accept-Charset = "utf-8";
> }
> ======================================
>
> I also tried including C header files as follows:
> ===================================
> C{
>     #include <string.h>
>     #include <locale.h>
>     #include <wctype.h>
>     #include <wchar.h>
>     #include <curses.h>
> }C
> ===================================
> but it did not give me any result.
>
> I am thinking of recompiling with ncurses wchar enabled. Any ideas?
>
>
> 2010/5/24 Sam Crawford <[email protected]>
>
> > It's not one that I'm familiar with, but if it were me, I'd try
> > running varnishlog whilst putting a request for one of these URLs
> > through. See how varnish prints it out in the RxURL field. This might
> > give you some clues as to how to specify it in the rules.
> >
> > Thanks,
> >
> > Sam
> >
> >
> > 2010/5/23 Angie T. Muhammad <[email protected]>:
> > > Hello Varnish team
> > >
> > > I have varnish v. 2.1.2 on production and test servers. We are running a
> > > bilingual news website.
> > > On my test server I am trying to parse non-English URLs as follows:
> > >
> > > .......................
> > > else if (req.url == "/تقارير") {
> > >     set beresp.http.X-Cacheable = "Yes";
> > >     set beresp.ttl = 60m;
> > >     return(deliver);
> > > }
> > > .......................
> > >
> > > The word in bold red is in Arabic and it is a right-to-left language. The
> > > link can not be made in English and has no English equivalent. In case you
> > > are wondering, the word means "reports". My sole problem now is that varnish
> > > applies all other if-statements with full English URLs but not this one with
> > > Arabic. Even if I try a regex, say req.url ~ "^/تقارير" instead of the ==
> > > sign, it starts with no errors but does not apply the rule.
> > >
> > > I tried the following:
> > > 1- Reversing the letters of the Arabic word, so تقارير would be ريراقت but
> > > it did not work
> > > 2- Copying the link directly into /etc/varnish/default.vcl, it produces
> > > something like: %D9%88%D8%B3%D9%88%D9%85%D8%A7%D8%AA
> > > Such html address handling prevents varnish from starting
> > >
> > > Any ideas? Your help is really appreciated.
> > >
> > >
> > > --
> > > All the best,
> > > Angie
> > >
> > > _______________________________________________
> > > varnish-misc mailing list
> > > [email protected]
> > > http://lists.varnish-cache.org/mailman/listinfo/varnish-misc
> > >
> >
>
> --
> All the best,
> Angie

--
email: [email protected] web: http://seconddrawer.com.au/
gpg: E6FC 5BC6 268D B874 E546 8F6F A2BB 220B D5F6 92E3

Please don't send me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
