Re: [squid-users] New StoreID helper: squid_dedup
Hi Eliezer,

Thanks for your feedback, much appreciated, /especially/ from you.

The most important part is in dedup.py. I've kept an eye on efficiency without sacrificing readability (much) and extensibility:
https://github.com/frispete/squid_dedup/blob/master/squid_dedup/dedup.py

A big part of the rest is related to configuration management, which tries to maximize convenience (as many config files as wanted, automatic reload option on changes, etc.).

Depending on public interest, it would be cool to create a public CDN collection that is shared among users, or even distributed automatically.

Pete

On Monday, 16 May 2016 03:44:29 Eliezer Croitoru wrote:
> Thanks for sharing!
>
> I didn't have enough time to understand the tool structure since I am not
> a python expert but,
> This is the first squid helper I have seen which is based on python and
> implements concurrency.
>
> Thanks!!
> Eliezer Croitoru
>
> On 10/05/2016 00:56, Hans-Peter Jansen wrote:
> > Hi,
> >
> > I'm pleased to announce the availability of squid_dedup, a helper for
> > deduplicating CDN accesses, implementing the squid 3 StoreID protocol.
> >
> > It is a multi-threaded tool, written in python3, with no further
> > dependencies, hosted at: https://github.com/frispete/squid_dedup
> > available at: https://pypi.python.org/pypi/squid-dedup
> >
> > For openSUSE users, a ready made rpm package is available here:
> > https://build.opensuse.org/package/show/home:frispete:python3/squid_dedup
> >
> > Any feedback is greatly appreciated.
> >
> > Cheers,
> > Pete
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] Getting the full file content on a range request, but not on EVERY get ...
On Friday, 13 May 2016 01:09:39 Yuri Voinov wrote:
> I suggest it is very bad idea to transform caching proxy to linux
> distro's or something else archive.

Yuri, if I wanted an archive, I would mirror everything and use local repos. I went that route for a long time - it's a lot of work to keep up everywhere, and it generates an awful amount of traffic (and I did it the sanest way possible - with a custom script that used rsync).

> As Amos said, "Squid is a cache, not an archive".

Yes. Updating 20 similar machines makes a significant difference with squid as a deduplicated cache - with no recurring work at all.

Pete
Re: [squid-users] Getting the full file content on a range request, but not on EVERY get ...
Hi Heiler,

On Thursday, 12 May 2016 13:28:00 Heiler Bemerguy wrote:
> Hi Pete, thanks for replying... let me see if I got it right..
>
> Will I need to specify every url/domain I want it to act on ? I want
> squid to do it for every range-request downloads that should/would be
> cached (based on other rules, pattern_refreshs etc)

Yup, that's right. At least, that's the common approach to dealing with CDNs.

I think disallowing range requests is too drastic to work well in the long run, but let us know if you reach a satisfactory solution this way.

> It doesn't need to delay any downloads as long as it isn't a dupe of
> what's already being downloaded.

You can set the delay to zero, of course.

This is only one side of the issues with CDNs. The other, more problematic side is that many servers with different URLs provide the same files. Every new address results in a new download of otherwise identical content.

Here's an example for openSUSE:

#
# this file was generated by gen_openSUSE_dedups
# from http://mirrors.opensuse.org/list/all.html
# with timestamp Thu, 12 May 2016 05:30:18 +0200
#
[openSUSE]
match:
    # openSUSE Headquarter
    http\:\/\/[a-z0-9]+\.opensuse\.org\/(.*)
    # South Africa (za)
    http\:\/\/ftp\.up\.ac\.za\/mirrors\/opensuse\/opensuse\/(.*)
    # Bangladesh (bd)
    http\:\/\/mirror\.dhakacom\.com\/opensuse\/(.*)
    http\:\/\/mirrors\.ispros\.com\.bd\/opensuse\/(.*)
    # China (cn)
    http\:\/\/mirror\.bjtu\.edu\.cn\/opensuse\/(.*)
    http\:\/\/fundawang\.lcuc\.org\.cn\/opensuse\/(.*)
    http\:\/\/mirrors\.tuna\.tsinghua\.edu\.cn\/opensuse\/(.*)
    http\:\/\/mirrors\.skyshe\.cn\/opensuse\/(.*)
    http\:\/\/mirrors\.hust\.edu\.cn\/opensuse\/(.*)
    http\:\/\/c\.mirrors\.lanunion\.org\/opensuse\/(.*)
    http\:\/\/mirrors\.hustunique\.com\/opensuse\/(.*)
    http\:\/\/mirrors\.sohu\.com\/opensuse\/(.*)
    http\:\/\/mirrors\.ustc\.edu\.cn\/opensuse\/(.*)
    # Hong Kong (hk)
    http\:\/\/mirror\.rackspace\.hk\/openSUSE\/(.*)
    # Indonesia (id)
    http\:\/\/mirror\.linux\.or\.id\/linux\/opensuse\/(.*)
    http\:\/\/buaya\.klas\.or\.id\/opensuse\/(.*)
    http\:\/\/kartolo\.sby\.datautama\.net\.id\/openSUSE\/(.*)
    http\:\/\/opensuse\.idrepo\.or\.id\/opensuse\/(.*)
    http\:\/\/mirror\.unej\.ac\.id\/opensuse\/(.*)
    http\:\/\/download\.opensuse\.or\.id\/(.*)
    http\:\/\/repo\.ugm\.ac\.id\/opensuse\/(.*)
    http\:\/\/dl2\.foss\-id\.web\.id\/opensuse\/(.*)
    # Israel (il)
    http\:\/\/mirror\.isoc\.org\.il\/pub\/opensuse\/(.*)
    [...] -> this list contains about 180 entries
replace: http://download.opensuse.org.%(intdomain)s/\1
# fetch all redirected objects explicitly
fetch: true

This is how CDNs work, but it's a nightmare for caching proxies. In such scenarios, squid_dedup comes to the rescue.

Cheers,
Pete
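
To illustrate the match/replace rules in the message above, here is a minimal Python sketch of how several mirror URLs could collapse into one store ID, and how the result could be reported back to Squid. It is not taken from squid_dedup's source: the function names `store_id` and `answer`, the `squid.internal` value for `intdomain`, and the sample file paths are assumptions made for illustration. The reply format (`OK store-id=...` or `ERR`, prefixed with the channel ID when helper concurrency is enabled) follows Squid's StoreID helper interface.

```python
import re

# Assumed value for the %(intdomain)s placeholder; the real value comes
# from the helper's own configuration.
INTDOMAIN = "squid.internal"

# Three of the match patterns quoted above (the full list has about 180).
MATCH = [
    r"http\:\/\/[a-z0-9]+\.opensuse\.org\/(.*)",
    r"http\:\/\/ftp\.up\.ac\.za\/mirrors\/opensuse\/opensuse\/(.*)",
    r"http\:\/\/mirror\.rackspace\.hk\/openSUSE\/(.*)",
]
REPLACE = r"http://download.opensuse.org.%(intdomain)s/\1"

PATTERNS = [re.compile(p) for p in MATCH]
# Fill in the internal domain once; \1 is left for the regex engine.
TEMPLATE = REPLACE % {"intdomain": INTDOMAIN}


def store_id(url):
    """Return the canonical store ID for url, or None if nothing matches."""
    for pattern in PATTERNS:
        m = pattern.match(url)
        if m:
            # Substitute \1 with the path captured from the mirror URL.
            return m.expand(TEMPLATE)
    return None


def answer(line):
    """Build the reply for one request line: 'channel-ID URL [extras]'."""
    channel, _, rest = line.strip().partition(" ")
    url = rest.split(" ", 1)[0]
    sid = store_id(url)
    if sid is None:
        return f"{channel} ERR"
    return f"{channel} OK store-id={sid}"
```

With these rules, a package requested from the South African mirror and the same package requested from download.opensuse.org map to the same store ID, so Squid stores the object once and serves either request from that copy.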
Re: [squid-users] Getting the full file content on a range request, but not on EVERY get ...
On Wednesday, 11 May 2016 21:37:17 Heiler Bemerguy wrote:
> Hey guys,
>
> First take a look at the log:
>
> root@proxy:/var/log/squid# tail -f access.log |grep http://download.cdn.mozilla.net/pub/firefox/releases/45.0.1/update/win32/pt-BR/firefox-45.0.1.complete.mar
> 1463011781.572 8776 10.1.3.236 TCP_MISS/206 300520 GET [...]
> Now think: An user is just doing a segmented/ranged download, right?
> Squid won't cache the file because it is a range-download, not a full
> file download.
> But I WANT squid to cache it. So I decide to use "range_offset_limit
> -1", but then on every GET squid will re-download the file from the
> beginning, opening LOTs of simultaneous connections and using too much
> bandwidth, doing just the OPPOSITE it's meant to!
>
> Is there a smart way to allow squid to download it from the beginning to
> the end (to actually cache it), but only on the FIRST request/get? Even
> if it makes the user wait for the full download, or cancel it
> temporarily, or.. whatever!! Anything!!

Well, this is exactly what my squid_dedup helper was created for!

See my announcement:

Subject: [squid-users] New StoreID helper: squid_dedup
Date: Mon, 09 May 2016 23:56:45 +0200

My openSUSE environment fetches _all_ updates with byte ranges from many servers. Therefore, I created squid_dedup.

Your specific config could look like this:

/etc/squid/dedup/mozilla.conf:

[mozilla]
match: http\:\/\/download\.cdn\.mozilla\.net/(.*)
replace: http://download.cdn.mozilla.net.%(intdomain)s/\1
fetch: true

The fetch parameter is unique among StoreID helpers (AFAIK): it fetches the object after a certain delay with a pool of fetcher threads.

The idea is: after the first access to an object, wait a bit (global setting, default: 15 secs), and then fetch the whole thing once. It won't help the first client, but it will help all subsequent accesses.

The fetcher avoids fetching anything more than once by checking the HTTP headers.

This is a pretty new project, but be assured that the basic functions are working fine, and I will do my best to solve any upcoming issues. It is implemented in Python 3 and prepared to support additional features easily, while keeping an eye on efficiency.

Let me know if you're going to try it.

Pete
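
The "wait a bit, then fetch the whole thing once" idea from the message above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not squid_dedup's implementation: the class name `DelayedFetcher` and its `schedule` method are hypothetical, an in-memory set stands in for the HTTP header check the message mentions, and one `threading.Timer` per URL stands in for the pool of fetcher threads.

```python
import threading


class DelayedFetcher:
    """Schedule at most one full fetch per URL, after a delay."""

    def __init__(self, fetch, delay=15.0):
        self._fetch = fetch      # callable(url) that downloads the whole object
        self._delay = delay      # seconds to wait after the first access
        self._seen = set()       # URLs already scheduled (assumed stand-in
                                 # for the real helper's HTTP header check)
        self._lock = threading.Lock()

    def schedule(self, url):
        """Return True if a fetch was scheduled, False for a repeat URL."""
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
        # Run the fetch in a background thread once the delay has passed.
        timer = threading.Timer(self._delay, self._fetch, args=(url,))
        timer.daemon = True
        timer.start()
        return True
```

A caller would invoke `schedule(url)` each time a rewritten URL is seen. Only the first call per URL starts a timer, so ranged requests from many clients trigger a single full download through the proxy.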
[squid-users] New StoreID helper: squid_dedup
Hi,

I'm pleased to announce the availability of squid_dedup, a helper for deduplicating CDN accesses, implementing the squid 3 StoreID protocol.

It is a multi-threaded tool, written in python3, with no further dependencies,
hosted at: https://github.com/frispete/squid_dedup
available at: https://pypi.python.org/pypi/squid-dedup

For openSUSE users, a ready-made rpm package is available here:
https://build.opensuse.org/package/show/home:frispete:python3/squid_dedup

Any feedback is greatly appreciated.

Cheers,
Pete