On 27/09/2012 4:09 p.m., Eliezer Croitoru wrote:
Well I was reading here and there the store code and other things then
started testing some theories about how store_url can be and should be
implemented.
If you do remember or not I refactored the source in couple places to
use originalUrl and original_url to check all the places where
original url is used and then decide where the store_url should be
placed if needed.
(memobject->original_url and has memobject->store_url)
the main place of all I have seen that should be used with the
store_url is the
http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/store_key_md5.cc#L144
which gets requests from couple places and mainly the setPublic for
store entry
(Does bazaar has search option in it ?)
Doxygen provides all the references/referenced-by information.
http://www.squid-cache.org/Doc/code/store__key__md5_8cc.html#a5461ad8f7a969c8bec3008a70a74c42a
so I changed the code to use the store_url in a case it exists in the
memobject instead of the "urlCanonical(request)".
The result is good since it makes the predicted md5 hash for the url
based on the store_url and not the original_url.
At this point I started looking up for more places that original_url
might should be replaced since maybe there is a place I missed like
when storing or fetching the file for any reason there is that I cant
see and think of.
I placed a test file on my server called v.flv which is a static text
file.
I used a store_url helper that will use the arguments id itag and
range (something I know and have script for) to rewrite the url.
from
'http://www1.ngtech.co.il/v.flv?id=1000&itag=34&range=0-100&icu=0&tag1' into
'http://ngtech.squid.internal/id=1000&itag=34&range=0-100'
which I later will change the id itag and range on the request and
check what happens in my extended logs.
so as expected:
2012/09/27 04:34:17 kid1| A candidate recived from Store URL rewriter
accepted: http://ngtech.squid.internal/id=1000&itag=34&range=0-100
2012/09/27 04:34:17.157 kid1| storeKeypublicreuest could be with URL
and method: GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100
2012/09/27 04:34:17.157 kid1| storeKeypublicreuest with URL and
method: GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100
2012/09/27 04:34:17.157 kid1| storeKeypublicreuest was created using
method and url from request
2012/09/27 04:34:17.157 kid1| storeGet: looking up
4F2F79AE5171037CBE02EA95ECD5B86C
the ^^^ hexdigest is what expected to be looked for.
instead of looking up for: 7F77FC394B370386D2E6AD9F8B748846
its looking for: 4F2F79AE5171037CBE02EA95ECD5B86C
which is a very good progress and the file was cached based on the
store_url hash.
I started seeing somethings with large files so I reduced the file
size to 702 bytes and then I will be able to look over things without
too much stuff running because of the size.
a new session new file and other url which result a good 304 validation
and the web server logs shows 200 for the first and 304 for the second.
##access.log
1348715219.641 205 127.0.0.1 TCP_MISS/200 1088 GET
http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 -
HIER_DIRECT/79.181.248.214 video/x-flv
1348715278.783 11 127.0.0.1 TCP_REFRESH_UNMODIFIED/200 1096 GET
http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 -
HIER_DIRECT/79.181.248.214 video/x-flv
(you can skip this log or jump over what you know)
##cache.log
A candidate recived from Store URL rewriter accepted:
http://ngtech.squid.internal/id=100&itag=34
storeKeypublicreuest could be with URL and method: GET
http://ngtech.squid.internal/id=100&itag=34
storeKeypublicreuest with URL and method: GET
http://ngtech.squid.internal/id=100&itag=34
storeKeypublicreuest was created using method and url from request
storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
src/store_dir.cc(748) get: got in-transit entry: -1@-1=1/2/0/0
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
src/store.cc(395) StoreEntry: new StoreEntry 0x247c080
src/MemObject.cc(109) MemObject: new MemObject 0x2788050
storeKeyPrivate: GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
storeKeyPrivate was created using method and url is:
StoreEntry::hashInsert: Inserting Entry 0x247c080 key
'56E7D8D797D8F666B16138EF1686E5B7'
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used
here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be
used with URL:http://ngtech.squid.internal/id=100&itag=34
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used
here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be
used with URL:http://ngtech.squid.internal/id=100&itag=34
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used
here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be
used with URL:http://ngtech.squid.internal/id=100&itag=34
StoreEntry::replaceHttpReply:
http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
ctx: enter level 0: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
StoreEntry::expireNow: '56E7D8D797D8F666B16138EF1686E5B7'
StoreEntry::setReleaseFlag: '56E7D8D797D8F666B16138EF1686E5B7'
ctx: exit level 0
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
storeSwapOutFileClose: 56E7D8D797D8F666B16138EF1686E5B7 how=1
storeSwapOutFileClose: sio = 0
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=2
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=1
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=0
storeRelease: Releasing: '56E7D8D797D8F666B16138EF1686E5B7'
src/store.cc(460) destroyStoreEntry: destroyStoreEntry: destroying
0x247c088
src/store.cc(438) destroyMemObject: destroyMemObject 0x2788050
src/MemObject.cc(132) ~MemObject: del MemObject 0x2788050
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0
now comes the weird part:
I have tried using another url that should result in the same
rewritten url: http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11
which will result in: http://ngtech.squid.internal/id=100&itag=34
But while in the access.log it shows:
1348716354.472 115 127.0.0.1 TCP_HIT/200 1096 GET
http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11 -
HIER_DIRECT/79.181.248.214 video/x-flv
the actual server log shows:
IP - - [27/Sep/2012:05:24:38 +0200] "GET /v.flv?id=100&itag=34&ii=11
HTTP/1.1" 200 702
So I looked at the cache.log to understand what happens that it shows
HIT in one place but does another thing.
the full log is here: http://www1.ngtech.co.il/store_url/test3.txt
but I noticed one thing:
storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
src/store_dir.cc(748) get: got in-transit entry: -1@-1=1/2/0/0
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
clientProcessHit: URL mismatch,
'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1' !=
'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0
storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'
So there is a case in squid which the log shows HIT but the request is
being fetched from source since there is a mismatch between the url in
the stored entry (regardless the hash match) and the actual request.
related code at:
http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/client_side_reply.cc#L504
This is good for regular requests but not good for rewritten store_url
since it's nature.
The options are:
1. store in meta data the store_url and using it while checking for a
HIT if exists.
2. do not test for url mismatch in a case of store_url.
The above is the only blockage I until now saw in my testings.
I want to dig into the code more and see the exact reason and
mechanism of this clientProcessHit and the surrounding related code
(that actually fetch the file from the server).
This is the HTTP/1.1 variant handling code. Each request URL may have
multiple response variants depending on the things indicated by Vary:
header and possibly ETag header as a unique ID for each variant. How it
is done in Squid is that the store contains a vary meta object as well
as the normal file meta object.
The storeURL should point at the meta object, and then when loaded meta
object points at other bits to combine in addition to the storeURL to
make a new key for a second lookup which is expected to find the actual
file object for that variant.
You should make these secondary keys use the storeURL instead of the
original URL (only when the original URL was used, some variant may be
looked up based on ETag or digest values alone).
Some other places which need it are:
PURGE requets handling lookups
ICP request lookups
HTCP request lookups
Cache Digest is a little more tricky. Ignore it for now. The remote
proxy will never be guaranteed to do the right lookups even if you add
both original and store URL to the digest.
I Also noticed something disturbing that is not related to any of
store_url code but more to debug_options(maybe one of my debug
additions but it is not happening for almost anything else)
when and only when i'm fetching crossdomain.xml from most of the sites
I have tried like youtube domains while debug_options are at ALL,1 20,3
I'm getting some segmentation problem and squid dies and restarts.
Hmm. Worth checking separately and fixing outside the storeURL project
if you can identify what it is.
It's something that I have never saw until couple weeks ago.
the cache.log is here:
http://www1.ngtech.co.il/store_url/segmentation.txt
Eliezer