I added some debug code to the cacheurl plugin, as follows. It shows that the
video URL for Google is a very short version: there are no parameters in the
URL any more.
static int regex_substitute(char **buf, char *str, regex_info *info) {
    ...
    if (matchcount < 0) {
        switch (matchcount) {
            case PCRE_ERROR_NOMATCH:
                // TODO: add mismatch URL output here.
                if (log) {
                    TSTextLogObjectWrite(log,
                                         "Mismatch pattern:'%s' -> URL:'%s'\n",
                                         info->pattern, str);
                }
                TSDebug(PLUGIN_NAME, "Mismatch pattern:'%s' -> URL:'%s'\n",
                        info->pattern, str);
                break;
            default:
The log is as follows:
20150218.15h00m40s Mismatch pattern:'http://.*\..*\.com/images/tuiguang/([[:digit:]]{6,6})/(.*\.mp4)' -> URL:'r10---sn-nwj7knek.googlevideo.com:443/'
20150218.15h00m51s Mismatch pattern:'http://.*\..*\.com/images/tuiguang/([[:digit:]]{6,6})/(.*\.mp4)' -> URL:'clients4.google.com:443/'
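
The mismatch can be reproduced outside the plugin. The following is a minimal
Python sketch, not the plugin's code: it uses Python's re module as a stand-in
for PCRE (an assumption; this particular pattern uses no PCRE-only syntax) and
tests the YouTube pattern from cacheurl.config against the two strings logged
above. Both strings contain only a host and port, with no scheme, path, or
query string, so a pattern that starts with "https://" and requires "&id="
cannot match them. The names YOUTUBE_PATTERN, LOGGED_URLS, and matches are
illustrative only.

```python
import re

# YouTube pattern copied from cacheurl.config. Python's re module is used here
# as a stand-in for PCRE (assumption: both treat this pattern the same way).
YOUTUBE_PATTERN = re.compile(
    r"https:\/\/(.*\.googlevideo\.com)\/(get_video|videoplayback|videodownload)"
    r"\?.*?\&(id=[a-zA-Z0-9.\-\_]*).*"
)

# Strings written by the debug code: host and port only.
LOGGED_URLS = [
    "r10---sn-nwj7knek.googlevideo.com:443/",
    "clients4.google.com:443/",
]


def matches(url: str) -> bool:
    """Return True if the pattern matches anywhere in url."""
    return YOUTUBE_PATTERN.search(url) is not None


results = {url: matches(url) for url in LOGGED_URLS}
for url, ok in results.items():
    print(f"{url!r}: {'match' if ok else 'no match'}")
```
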
Is there any way to cache YouTube video with TS? Please advise.
Thanks,
Cong
From: Yue, Cong [mailto:[email protected]]
Sent: Wednesday, February 18, 2015 10:42 AM
To: [email protected]
Subject: cacheurl plugin does not work for youtube
Hi
I am trying to cache YouTube through a forward proxy, but the YouTube URL is
not being rewritten.
I configured /usr/local/libexec/trafficserver/cacheurl.config as follows:
---
http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}[^&]*/f4v/.*id=tudou.itemid\=([0-9]*).*
http://www.tudou.com/$1
http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}[^&]*/flv/.*id=tudou.itemid\=([0-9]*).*
http://www.tudou.com/$1
http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/youku/.*/(.*-.*-.*-.*-[^?]*)(.*)
http://www.youku.com/$1
http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/sohu/[0-9]*/[0-9]*/[0-9]*/(.*).mp4?key=.*
http://tv.sohu.com/$1.mp4
http://.*\..*\..*\..*/.*\.com/flvdownload/[[:digit:]]{1,3}/[[:digit:]]{1,3}/([^?]*)(.*)
http://www.56.com/$1
http://[[:digit:]]{1,3}/mp4files/.*/.*\.com/images/tuiguang/[[:digit:]]{6,6}/(.*\.mp4)
http://www.56.com/$1
http://.*\..*\.com/images/tuiguang/([[:digit:]]{6,6})/(.*\.mp4)
http://www.56.com/tuiguang/$1/$2
http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/mov.bn.netease.com/.*/.*/.*/.*/.*/([^?]*)(.*)
http://v.163.com/$1
http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/.*-.*-.*/.*/cemov.bn.netease.com/.*/.*/.*/.*/.*/([^?]*)(.*)
http://v.163.com/$1
#YOUTUBE
https:\/\/(.*\.googlevideo\.com)\/(get_video|videoplayback|videodownload)\?.*?\&(id=[a-zA-Z0-9.\-\_]*).*
http://video-srv.youtube.comi.atsinternal/$3.mp4
---
/usr/local/var/log/trafficserver/cacheurl.log shows that the URL match happens
for youku.com but not for youtube.com.
I checked my URL against the pattern with an online regular expression tool,
and it shows that the URL " " matches.
This is the log of /usr/local/var/log/trafficserver/cacheurl.log
----
20150218.10h24m54s Adding pattern/replacement pair:
'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}[^&]*/f4v/.*id=tudou.itemid\=([0-9]*).*'
-> 'http://www.tudou.com/$1'
20150218.10h24m54s Adding pattern/replacement pair:
'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}[^&]*/flv/.*id=tudou.itemid\=([0-9]*).*'
-> 'http://www.tudou.com/$1'
20150218.10h24m54s Adding pattern/replacement pair:
'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/youku/.*/(.*-.*-.*-.*-[^?]*)(.*)'
-> 'http://www.youku.com/$1'
20150218.10h24m54s Adding pattern/replacement pair:
'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/sohu/[0-9]*/[0-9]*/[0-9]*/(.*).mp4?key=.*'
-> 'http://tv.sohu.com/$1.mp4'
20150218.10h24m54s Adding pattern/replacement pair:
'http://.*\..*\..*\..*/.*\.com/flvdownload/[[:digit:]]{1,3}/[[:digit:]]{1,3}/([^?]*)(.*)'
-> 'http://www.56.com/$1'
20150218.10h24m54s Adding pattern/replacement pair:
'http://[[:digit:]]{1,3}/mp4files/.*/.*\.com/images/tuiguang/[[:digit:]]{6,6}/(.*\.mp4)'
-> 'http://www.56.com/$1'
20150218.10h24m54s Adding pattern/replacement pair:
'http://.*\..*\.com/images/tuiguang/([[:digit:]]{6,6})/(.*\.mp4)' ->
'http://www.56.com/tuiguang/$1/$2'
20150218.10h24m54s Adding pattern/replacement pair:
'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/mov.bn.netease.com/.*/.*/.*/.*/.*/([^?]*)(.*)'
-> 'http://v.163.com/$1'
20150218.10h24m54s Adding pattern/replacement pair:
'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/.*-.*-.*/.*/cemov.bn.netease.com/.*/.*/.*/.*/.*/([^?]*)(.*)'
-> 'http://v.163.com/$1'
20150218.10h24m54s Adding pattern/replacement pair:
'https:\/\/(.*\.googlevideo\.com)\/(get_video|videoplayback|videodownload)\?.*?\&(id=[a-zA-Z0-9.\-\_]*).*'
-> 'http://video-srv.youtube.comi.atsinternal/$3.mp4'
20150218.10h26m22s Rewriting cache URL for
http://63.243.196.157/youku/6976A4404493A8379EE16C6BCF/03000811005447043729FB19A339D634175695-FCC1-8DAC-C94F-72DE72FA6302.mp4?nk=58632024139_23738066383&ns=16465435_23569878&special=true to
http://www.youku.com/03000811005447043729FB19A339D634175695-FCC1-8DAC-C94F-72DE72FA6302.mp4
---------
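
To illustrate what the working youku rewrite does, here is a small Python
sketch under stated assumptions. It is not the plugin's implementation: the
POSIX class [[:digit:]] is written as [0-9] because Python's re module does not
support POSIX classes, and cache_key is a hypothetical helper that expands $N
tokens the way the log output suggests. It shows two requests for the same file
with different query strings (one from cacheurl.log, one from the squid log)
collapsing to the same cache URL.

```python
import re

# Youku pattern from cacheurl.config, with [[:digit:]] rewritten as [0-9]
# (assumption: equivalent for these ASCII URLs).
YOUKU_PATTERN = re.compile(
    r"http://[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
    r"/youku/.*/(.*-.*-.*-.*-[^?]*)(.*)"
)
YOUKU_REPLACEMENT = "http://www.youku.com/$1"


def cache_key(url, pattern, replacement):
    """Hypothetical helper: expand $N tokens with capture groups, or None."""
    m = pattern.search(url)
    if m is None:
        return None
    return re.sub(r"\$(\d)", lambda t: m.group(int(t.group(1))), replacement)


FILE = "03000811005447043729FB19A339D634175695-FCC1-8DAC-C94F-72DE72FA6302.mp4"
BASE = "http://63.243.196.157/youku/6976A4404493A8379EE16C6BCF/" + FILE

# Same file, different query strings, both taken from the logs in this message.
url_a = BASE + "?nk=58632024139_23738066383&ns=16465435_23569878&special=true"
url_b = BASE + "?nk=76165731532_23738066400&ns=16576127_23459186&special=true"

key_a = cache_key(url_a, YOUKU_PATTERN, YOUKU_REPLACEMENT)
key_b = cache_key(url_b, YOUKU_PATTERN, YOUKU_REPLACEMENT)
print(key_a)
print("same cache key:", key_a == key_b)
```
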
The squid log is as follows.
[root@ats1 trafficserver]# traffic_logcat squid.blog
-----
1424283922.032 880 10.0.0.45 TCP_MISS/200 105743 CONNECT
r6---sn-vgqsenel.googlevideo.com:443/ - DIRECT/r6---sn-vgqsenel.googlevideo.com
-
1424283927.747 17076 10.0.0.45 TCP_MISS/200 588 CONNECT s.ytimg.com:443/ -
DIRECT/s.ytimg.com -
1424283927.747 17076 10.0.0.45 TCP_MISS/200 588 CONNECT s.ytimg.com:443/ -
DIRECT/s.ytimg.com -
1424283927.748 17046 10.0.0.45 TCP_MISS/200 588 CONNECT yt3.ggpht.com:443/ -
DIRECT/yt3.ggpht.com -
1424283927.748 17079 10.0.0.45 TCP_MISS/200 588 CONNECT s.ytimg.com:443/ -
DIRECT/s.ytimg.com -
1424283927.749 17079 10.0.0.45 TCP_MISS/200 588 CONNECT s.ytimg.com:443/ -
DIRECT/s.ytimg.com -
1424283927.749 17080 10.0.0.45 TCP_MISS/200 588 CONNECT s.ytimg.com:443/ -
DIRECT/s.ytimg.com -
1424283927.826 17155 10.0.0.45 TCP_MISS/200 588 CONNECT yt3.ggpht.com:443/ -
DIRECT/yt3.ggpht.com -
1424283930.226 175 10.0.0.210 TCP_MISS/200 43699 CONNECT
www.youtube.com:443/ - DIRECT/www.youtube.com -
1424283930.386 99 10.0.0.210 TCP_MISS/200 7901 CONNECT
manifest.googlevideo.com:443/ - DIRECT/manifest.googlevideo.com -
1424283941.575 30178 10.0.0.45 TCP_MISS/200 897 CONNECT gg.google.com:443/ -
DIRECT/gg.google.com -
1424283941.576 30899 10.0.0.45 TCP_MISS/200 1107 CONNECT yt3.ggpht.com:443/ -
DIRECT/yt3.ggpht.com -
1424283941.577 30906 10.0.0.45 TCP_MISS/200 2410 CONNECT s.ytimg.com:443/ -
DIRECT/s.ytimg.com -
1424283942.571 30347 10.0.0.45 TCP_MISS/200 947 CONNECT ssl.gstatic.com:443/ -
DIRECT/ssl.gstatic.com -
1424283970.120 39690 10.0.0.210 TCP_MISS/200 437526696 CONNECT
r10---sn-a5m7lnel.googlevideo.com:443/ -
DIRECT/r10---sn-a5m7lnel.googlevideo.com -
1424283980.655 671 10.0.0.45 TCP_MISS/200 252 GET
1424283983.193 12 10.0.0.45 TCP_MEM_HIT/200 128564 GET
http://63.243.196.157/youku/6976A4404493A8379EE16C6BCF/03000811005447043729FB19A339D634175695-FCC1-8DAC-C94F-72DE72FA6302.mp4?nk=76165731532_23738066400&ns=16576127_23459186&special=true
- NONE/- video/mp4
1424283984.199 0 10.0.0.45 TCP_MEM_HIT/200 128564 GET
http://63.243.196.157/youku/6976A4404493A8379EE16C6BCF/03000811005447043729FB19A339D634175695-FCC1-8DAC-C94F-72DE72FA6302.mp4?nk=314386706919_23738066416&ns=16686819_23348494&special=true
- NONE/- video/mp4
1424283985.253 4 10.0.0.45 TCP_MEM_HIT/200 128564 GET
http://63.243.196.157/youku/6976A4404493A8379EE16C6BCF/03000811005447043729FB19A339D634175695-FCC1-8DAC-C94F-72DE72FA6302.mp4?nk=410790030197_23738066434&ns=16797511_23237802&special=true
- NONE/- video/mp4
1424283987.279 7 10.0.0.45 TCP_MEM_HIT/200 128564 GET
http://63.243.196.157/youku/6976A4404493A8379EE16C6BCF/03000811005447043729FB19A339D634175695-FCC1-8DAC-C94F-72DE72FA6302.mp4?nk=314386706950_23738066468&ns=17018895_23016418&special=true
- NONE/- video/mp4
1424283997.579 9 10.0.0.45 TCP_MEM_HIT/200 128564 GET
http://63.243.196.157/youku/6976A4404493A8379EE16C6BCF/03000811005447043729FB19A339D634175695-FCC1-8DAC-C94F-72DE72FA6302.mp4?nk=410790030290_23738066639&ns=18125815_21909498&special=true
- NONE/- video/mp4
-------
From the squid log, it seems I cannot get the full YouTube URL. With
youtube-dl, however, I checked that the URL should work with my regular
expression.
-----
[root@test-client1 webpages]# youtube-dl -v --proxy
http://10.0.0.204:80 https://www.youtube.com/watch?v=q1mndAYZlio
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', '--proxy', 'http://10.0.0.204:80',
'https://www.youtube.com/watch?v=q1mndAYZlio']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.02.18.1
[debug] Python version 2.6.6 - Linux-2.6.32-504.3.3.el6.x86_64-x86_64-with-centos-6.6-Final
[debug] exe versions: none
[debug] Proxy map: {u'http': 'http://10.0.0.204:80', u'https':
'http://10.0.0.204:80'}
[youtube] q1mndAYZlio: Downloading webpage
[youtube] q1mndAYZlio: Extracting video information
[youtube] q1mndAYZlio: Downloading DASH manifest
[debug] Invoking downloader on
u'https://r10---sn-a5m7lnel.googlevideo.com/videoplayback?signature=78D8EB7D568039D930E67820E5D8751C67AD3273.68A236DF130AC81356A4970AFEFD7168E277111A&upn=ypVWJMpcfaA&mime=video%2Fmp4&initcwndbps=5183750&source=youtube&pl=18&sver=3&expire=1424305562&mm=31&dur=2857.052&id=o-AITnvpDOi0FfUY-UijIKeWK61KsdMQZSUd0E_NnKO8Od&itag=22&key=yt5&ip=208.184.212.172&fexp=902039%2C905657%2C927622%2C936109%2C9405708%2C9406015%2C9407010%2C943917%2C947225%2C948124%2C948807%2C952302%2C952605%2C952612%2C952901%2C955100%2C955301%2C957201%2C959701%2C960610&mt=1424283746&mv=m&ms=au&ratebypass=yes&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Cmime%2Cmm%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cupn%2Cexpire&ipbits=0&requiressl=yes'
[download] Destination: Building a large scale CDN with Apache Traffic Server - Jan van Doorn-q1mndAYZlio.mp4
[download] 100% of 416.52MiB in 00:39
------
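
The claim that the full URL works with the regular expression can be checked
with a short Python sketch. This is an illustration under assumptions, not the
plugin's code: Python's re module stands in for PCRE, the URL is the one
youtube-dl printed but trimmed to a subset of its query parameters for
readability, and the $3 expansion is done by hand. It shows that the pattern
matches when the scheme, path, and query string are present, and what the
resulting cache URL would look like.

```python
import re

# YouTube pattern and replacement copied from cacheurl.config.
YOUTUBE_PATTERN = re.compile(
    r"https:\/\/(.*\.googlevideo\.com)\/(get_video|videoplayback|videodownload)"
    r"\?.*?\&(id=[a-zA-Z0-9.\-\_]*).*"
)
YOUTUBE_REPLACEMENT = "http://video-srv.youtube.comi.atsinternal/$3.mp4"

# URL from the youtube-dl output, trimmed to a few of its parameters
# (assumption: the dropped parameters do not affect the match).
FULL_URL = (
    "https://r10---sn-a5m7lnel.googlevideo.com/videoplayback"
    "?upn=ypVWJMpcfaA&mime=video%2Fmp4&dur=2857.052"
    "&id=o-AITnvpDOi0FfUY-UijIKeWK61KsdMQZSUd0E_NnKO8Od&itag=22&key=yt5"
)

match = YOUTUBE_PATTERN.search(FULL_URL)
host, endpoint, video_id = match.groups()

# Expand $3 by hand; the lazy ".*?" stops at the first "&id=" in the query.
cache_url = YOUTUBE_REPLACEMENT.replace("$3", video_id)

print("host:     ", host)
print("endpoint: ", endpoint)
print("id group: ", video_id)
print("cache URL:", cache_url)
```
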
Can somebody kindly advise?
Thanks,
Cong
________________________________
This e-mail message is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply e-mail and destroy all copies of
the original message. If you are the intended recipient, please be advised that
the content of this message is subject to access, review and disclosure by the
sender's e-mail System Administrator.