Re: [squid-users] Ideas for better caching these popular urls

2018-04-11 Thread Eliezer Croitoru
Hey Omid,

I found the service I wrote and packaged it as an RPM at:
http://ngtech.co.il/repo/centos/7/x86_64/response-dumper-icap-1.0.0-1.el7.centos.x86_64.rpm

If you are using another OS, let me know and I will try to package it for your OS.
Currently, on Debian/Ubuntu, alien converts the RPM smoothly.
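
For example, installation is roughly this (a sketch; on CentOS 7 yum can
install straight from the URL above, and on Debian/Ubuntu the alien conversion
looks something like the following):

$ yum install http://ngtech.co.il/repo/centos/7/x86_64/response-dumper-icap-1.0.0-1.el7.centos.x86_64.rpm

$ alien --to-deb response-dumper-icap-1.0.0-1.el7.centos.x86_64.rpm
$ dpkg -i response-dumper-icap_*.deb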

The dumps directory is at:
/var/response-dumper

The cleanup and the filtering ACLs are your job, though: you can define which
GET requests the service dumps/logs into files (see the squid.conf sketch
below).
Each individual file in this directory is named in the following format:
-<8 bytes uuid>-<md5(GET:full url)>

This format allows multiple requests to happen at the same time and still get
different file names, while the URL hash stays the same, so you can filter the
files by it.
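
To actually point Squid at the service you need the usual ICAP RESPMOD wiring
in squid.conf; a minimal sketch, assuming the service listens on
127.0.0.1:1344 with the /dumper path and that .example.com stands in for the
domains you actually want to dump:

icap_enable on
icap_service dumper respmod_precache icap://127.0.0.1:1344/dumper bypass=on
acl dump_get method GET
acl dump_domains dstdomain .example.com
adaptation_access dumper allow dump_get dump_domains
adaptation_access dumper deny all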
To calculate the hash of a URL, use:
$ echo -n "GET:http://url-to-hash.com/path?query=terms" | md5sum
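
So, for example, a sketch of finding all dumps for one URL (the URL here is
just a placeholder):

$ HASH=$(echo -n "GET:http://url-to-hash.com/path?query=terms" | md5sum | cut -d' ' -f1)
$ ls /var/response-dumper/ | grep "$HASH"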

Each file contains the full ICAP RESPMOD details, i.e.:
ICAP Request\r\n
HTTP Request \r\n
HTTP Response\r\n

By default, the Cookie and Authorization headers are censored from both the
request and the response in the dump, to avoid privacy-law issues.

The only missing piece now is a way to feed a single request and a single
response into RedBot to get a full analysis.
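
Until then, a rough way to pull, say, the HTTP response headers out of a dump
file (a sketch, assuming the three sections are separated by blank lines as
described above and that the file uses CRLF line endings):

$ sed 's/\r$//' /var/response-dumper/<dump-file> | awk 'BEGIN{RS=""} NR==3'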

Let me know if it works OK for you (it has been working fine here for a while now).

Eliezer


Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: elie...@ngtech.co.il


-Original Message-
From: squid-users <squid-users-boun...@lists.squid-cache.org> On Behalf Of Omid 
Kosari
Sent: Wednesday, April 11, 2018 12:32
To: squid-users@lists.squid-cache.org
Subject: Re: [squid-users] Ideas for better caching these popular urls

Eliezer Croitoru wrote
> You will need more than just the URLs; you will also need the response
> headers for these.
> I might be able to write an ICAP service that logs request and response
> headers; it could assist cache admins in improving their efficiency, but
> this can take a while.

Hi Eliezer,

Nice idea. I am ready to test/help/share whatever you need in a real
production environment. Please also make it general enough to cover the other
domains in my first post's attachment; they are worth a try.

Thanks




--
Sent from: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Ideas for better caching these popular urls

2018-04-11 Thread Eliezer Croitoru
Hey Omid,

I will try to use a file format similar to this:
## FILENAME = unixtime-sha256
RESPMOD icap://127.0.0.1:1344/dumper ICAP/1.0
date: Wed, 11 Apr 2018 16:52:13 GMT
encapsulated: req-hdr=0, res-hdr=105, res-body=413
preview: 0
allow: 204
host: 127.0.0.1:1344
Socket-Remote-Addr: 127.0.0.1:55178

GET http://ngtech.co.il/index.html HTTP/1.1
Accept: */*
User-Agent: curl/7.29.0

HTTP/1.1 200 OK
Content-Length: 17230
Accept-Ranges: bytes
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Origin: *
Content-Type: text/html
Date: Wed, 11 Apr 2018 16:52:13 GMT
Last-Modified: Tue, 03 Apr 2018 20:19:05 GMT
Server: nginx/1.10.3 (Ubuntu)
Vary: Accept-Encoding
## EOF

I have a prototype that I wrote three years ago, but it needs to be polished
for general use.
I will post an update when I have some progress.

Eliezer


Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: elie...@ngtech.co.il



-Original Message-
From: squid-users <squid-users-boun...@lists.squid-cache.org> On Behalf Of Omid 
Kosari
Sent: Wednesday, April 11, 2018 12:32
To: squid-users@lists.squid-cache.org
Subject: Re: [squid-users] Ideas for better caching these popular urls

Eliezer Croitoru wrote
> You will need more than just the URLs; you will also need the response
> headers for these.
> I might be able to write an ICAP service that logs request and response
> headers; it could assist cache admins in improving their efficiency, but
> this can take a while.

Hi Eliezer,

Nice idea. I am ready to test/help/share whatever you need in a real
production environment. Please also make it general enough to cover the other
domains in my first post's attachment; they are worth a try.

Thanks




--
Sent from: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Ideas for better caching these popular urls

2018-04-11 Thread Omid Kosari
Eliezer Croitoru wrote
> You will need more than just the URLs; you will also need the response
> headers for these.
> I might be able to write an ICAP service that logs request and response
> headers; it could assist cache admins in improving their efficiency, but
> this can take a while.

Hi Eliezer,

Nice idea. I am ready to test/help/share whatever you need in a real
production environment. Please also make it general enough to cover the other
domains in my first post's attachment; they are worth a try.

Thanks




--
Sent from: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Ideas for better caching these popular urls

2018-04-10 Thread Eliezer Croitoru
Hey Omid,

From what I remember, the basic mathematical rule for verifying that a specific
set of numbers follows some kind of pattern is to have at least 3 items.
But in the cryptography world it's another story.
I have not researched PlayStation downloads and probably won't.
Others might offer some help, but you must understand what you are trying to
predict in these URLs and downloads.
From what I have seen, this CDN "llnwd.net" seems very cache friendly, but you
need to know how to handle its traffic.
It doesn't use any form of ETag header, but it does provide some pieces of
information in the URLs that can identify something about the object.
If it uses a ticketing system, as a couple of other CDN providers do, you would
need to know the "ID" of the URL before it is downloaded.
You will need more than just the URLs; you will also need the response headers
for these.
I might be able to write an ICAP service that logs request and response
headers; it could assist cache admins in improving their efficiency, but this
can take a while.

All The Bests,
Eliezer


Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: elie...@ngtech.co.il



-Original Message-
From: squid-users <squid-users-boun...@lists.squid-cache.org> On Behalf Of Omid 
Kosari
Sent: Tuesday, April 10, 2018 14:20
To: squid-users@lists.squid-cache.org
Subject: Re: [squid-users] Ideas for better caching these popular urls

Thanks for the reply.

I assumed that the community, at different scales from small ISPs to large
ISPs, may have common domains like the ones I highlighted, and therefore the
same issue as mine, so I ignored the common parts.

One of the problems with redbot is that it times out on big files like

http://gs2.ww.prod.dl.playstation.net/gs2/appkgo/prod/CUSA00900_00/2/f_2df8e321f37e2f5ea3930f6af4e9571144916013ee38893d881890b454b5fed6/f/UP9000-CUSA00900_00-BLOODBORNE00_4.pkg?downloadId=0187=018700e2291bda0f868f=us=ob=aa2cd9c8d1f359feb843ae4a6c99cfcdb6569ca9cc60ad6d28b6f8de3b5fac23=0=23.57.69.81=0027

http://gs2.ww.prod.dl.playstation.net/gs2/ppkgo/prod/CUSA07557_00/25/f_053bab8c9dec6fbc68a0bd9fc58793285ae350ccf7dadacb35b5840228a9d802/f/EP4001-CUSA07557_00-F12017EMASTER000-A0113-V0100_0.pkg?downloadId=0059=005900e22977e62f91a2=ob=0183=8.248.5.254=0032


I assumed anyone with a few thousand users may have the same problem and might
like to share, for example, their refresh_pattern or Store-ID rules to solve my
problem. You know better than anyone that PlayStation is everywhere ;)

Here is part of the storeid_db file:
^http:\/\/.*\.sonycoment\.loris-e\.llnwd\.net\/(.*?\.pkg)
http://playstation.net.squidinternal/$1
^http:\/\/.*\.playstation\.net\/(.*?\.pkg)
http://playstation.net.squidinternal/$1

Almost all of the huge PlayStation downloads come back with a 206 status code,
but the file is downloaded from start to end; if I remember correctly, in this
situation Squid will still cache the file correctly.



--
Sent from: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Ideas for better caching these popular urls

2018-04-10 Thread Omid Kosari
Thanks for the reply.

I assumed that the community, at different scales from small ISPs to large
ISPs, may have common domains like the ones I highlighted, and therefore the
same issue as mine, so I ignored the common parts.

One of the problems with redbot is that it times out on big files like

http://gs2.ww.prod.dl.playstation.net/gs2/appkgo/prod/CUSA00900_00/2/f_2df8e321f37e2f5ea3930f6af4e9571144916013ee38893d881890b454b5fed6/f/UP9000-CUSA00900_00-BLOODBORNE00_4.pkg?downloadId=0187=018700e2291bda0f868f=us=ob=aa2cd9c8d1f359feb843ae4a6c99cfcdb6569ca9cc60ad6d28b6f8de3b5fac23=0=23.57.69.81=0027

http://gs2.ww.prod.dl.playstation.net/gs2/ppkgo/prod/CUSA07557_00/25/f_053bab8c9dec6fbc68a0bd9fc58793285ae350ccf7dadacb35b5840228a9d802/f/EP4001-CUSA07557_00-F12017EMASTER000-A0113-V0100_0.pkg?downloadId=0059=005900e22977e62f91a2=ob=0183=8.248.5.254=0032


I assumed anyone with a few thousand users may have the same problem and might
like to share, for example, their refresh_pattern or Store-ID rules to solve my
problem. You know better than anyone that PlayStation is everywhere ;)

Here is part of the storeid_db file:
^http:\/\/.*\.sonycoment\.loris-e\.llnwd\.net\/(.*?\.pkg)
http://playstation.net.squidinternal/$1
^http:\/\/.*\.playstation\.net\/(.*?\.pkg)
http://playstation.net.squidinternal/$1
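
For anyone trying this, the database is wired into Squid roughly like this
(a sketch only; the helper path varies by distribution, /etc/squid/storeid_db
is just an example path, and the db file needs tab-separated
regex/replacement pairs):

store_id_program /usr/lib/squid/storeid_file_rewrite /etc/squid/storeid_db
store_id_children 5 startup=1
acl pkg_urls urlpath_regex -i \.pkg$
store_id_access allow pkg_urls
store_id_access deny all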

Almost all of the huge PlayStation downloads come back with a 206 status code,
but the file is downloaded from start to end; if I remember correctly, in this
situation Squid will still cache the file correctly.
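
On the 206 point, whether Squid fetches and caches the whole object when a
client asks for a range is governed by range_offset_limit and quick_abort_min;
a rough sketch of what I mean (example values only, not a recommendation):

# fetch the whole object even for ranged requests to these domains
acl ps_cdn dstdomain .playstation.net .llnwd.net
range_offset_limit none ps_cdn
# never abort a partially fetched object
quick_abort_min -1 KB
# example refresh_pattern for the big .pkg files
refresh_pattern -i \.pkg$ 129600 100% 129600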



--
Sent from: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Ideas for better caching these popular urls

2018-04-10 Thread Amos Jeffries
On 10/04/18 22:32, Omid Kosari wrote:
> Hello,
> 
> [attachment: squid-top-domains.JPG]
> 
> This image shows stats from one of my Squid boxes. I have a question about
> the highlighted ones. I think they should have a better hit ratio because
> they are popular among clients.

There are no URLs in that image. There are only wildcards for top-level
domains and a HIT % over the *entire* domain.

To figure out whether any of them should actually have better HIT ratios, you
have to look at the actual URLs and see how much uniqueness exists there.
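
For example, something like this against access.log gives a quick feel for
that uniqueness (a sketch, assuming the default native log format where the
URL is the 7th field; adjust the path and the pattern for your setup):

$ grep 'playstation\.net' /var/log/squid/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head
$ grep 'playstation\.net' /var/log/squid/access.log | awk '{print $7}' | sort -u | wc -l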

Then, for the _full_ URLs (scheme, domain, path, *and* ?query portions) which
are not very unique, look at the response headers to see why they are not
caching well. The tool at redbot.org can help with that last part.

Amos
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


[squid-users] Ideas for better caching these popular urls

2018-04-10 Thread Omid Kosari
Hello,

[attachment: squid-top-domains.JPG]

This image shows stats from one of my Squid boxes. I have a question about
the highlighted ones. I think they should have a better hit ratio because
they are popular among clients.
I have checked a lot of things like calamaris and the logs, and played with
refresh_pattern, Store-ID rules, etc.

I ask the gurus and the community to please help me get better HITs.

I am also ready to share specific parts of access.log and other data if
requested.

Thanks



--
Sent from: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users