On 30/05/11 00:22, Ghassan Gharabli wrote:
Hello,

I was trying to cache this website:

http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3

How do you cache it, or rewrite its URL to a static domain like:
down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com

Does that URL match the regex example below, or can someone help me match
this Nogomi.com CDN?

         #generic http://variable.domain.com/path/filename."ex", "ext" or "exte"

The line above describes what the 'm/' pattern produces for the $y array.
Well, kind of...

$1 is anything at all, including utter garbage. It could be a full URL's worth of loose bits:
      "http://evil.example.com/cache-poison?url=http://"

$2 appears to be a two-part domain name (i.e. "example.com" as opposed to a three-part "www.example.com")

$3 is the file or script name.
$4 is the file extension type.


         #http://cdn1-28.projectplaylist.com
         #http://s1sdlod041.bcst.cdn.s1s.yimg.com
} elsif (m/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/) {
        @y = ($1, $2, $3, $4);
        $y[0] =~ s/([a-z][0-9][a-z]dlod[\d]{3})|((cache|cdn)[-\d]*)|([a-zA-Z]+-?[0-9]+(-[a-zA-Z]*)?)/cdn/;
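For anyone following along, the captures can be checked outside Squid with a quick Python translation of that pattern (Python here is purely for illustration; the helper itself is Perl):

```python
import re

# Python rendering of the Perl m// pattern quoted above, so $1..$4 can
# be inspected without running the helper under Squid.
PATTERN = re.compile(
    r'^http://(.*?)(\.[^.\-]*?\..*?)/([^?&=]*)\.(\w{2,4})\??.*$'
)

url = ('http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp'
       '.nogomi.com/M15/Alaa_Zalzaly/Atrak/'
       'Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3')

m = PATTERN.match(url)
if m:
    # Print each capture group the way the Perl code sees them in @y.
    for i, part in enumerate(m.groups(), start=1):
        print(f'${i} = {part}')
```

Run against the Nogomi URL above, $1 comes out as just "down2" while $2 swallows the entire rest of the hostname, junk labels included; so the s///cdn substitution on $y[0] never touches the garbage portion.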

I assume you are trying to compress "down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp" down to "cdn" without allowing any non-FQDN garbage to compress?

I would use:  s/[a-z0-9A-Z\.\-]+/cdn/
and add a fixed portion to ensure that $y[1] is one of the base domains in the CDN, just in case some other site uses the same host naming scheme.
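A sketch of that suggestion (again in Python only for illustration; BASE_DOMAINS is an assumed list, fill in the real domains your CDN uses):

```python
import re

# Assumed list of base domains known to belong to this CDN; adjust to taste.
BASE_DOMAINS = ('.nogomi.com',)

def collapse_host_prefix(prefix, base_domain):
    # Collapse the variable host prefix to "cdn" only when the host tail
    # is a known CDN base domain, so an unrelated site that happens to
    # use the same naming scheme is left untouched.
    if base_domain.endswith(BASE_DOMAINS):
        return re.sub(r'[a-z0-9A-Z.\-]+', 'cdn', prefix)
    return prefix

print(collapse_host_prefix('down2', '.nogomi.com'))        # cdn
print(collapse_host_prefix('s1sdlod041', '.example.org'))  # s1sdlod041
```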

        print $x . "storeurl://" . $y[0] . $y[1] . "/" . $y[2] . "." . $y[3] . "\n";
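Reassembling that print by hand (assuming $x is the request-ID prefix Squid passes to concurrent helpers, taken as empty here, and that the host has already been collapsed as discussed) gives the single canonical store URL:

```python
# Illustrative reassembly of the Perl print above; all values assumed.
x = ''  # helper request-ID prefix, empty for a non-concurrent helper
y = ['cdn',
     '.nogomi.com',
     'M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar',
     'mp3']

line = x + 'storeurl://' + y[0] + y[1] + '/' + y[2] + '.' + y[3]
print(line)
# storeurl://cdn.nogomi.com/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3
```

Every mirror host of the CDN then maps to this one URL, so Squid stores a single copy of the object.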

I have also tried to study more about regular expressions, but the
examples I find cover only simple URLs. I really need to learn more
about complex URLs.


Relax. You do not have to combine them all into one regex.

You can make it simple and efficient to start with, and improve it as your knowledge grows. If in doubt, play it safe: storeurl rewriting carries at its core the risk of an XSS attack on your own clients (in the example above, $y[0] comes very close).

The hardest part is knowing for certain what all the parts of the URL mean to the designers of that website, so that you erase only the useless trackers and routing tags while keeping everything important.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.7 and 3.1.12.1
