On 27/09/2012 4:09 p.m., Eliezer Croitoru wrote:
Well I was reading here and there the store code and other things then started testing some theories about how store_url can be and should be implemented.

If you do remember or not I refactored the source in couple places to use originalUrl and original_url to check all the places where original url is used and then decide where the store_url should be placed if needed.
(memobject->original_url and has memobject->store_url)

the main place of all I have seen that should be used with the store_url is the http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/store_key_md5.cc#L144

which gets requests from couple places and mainly the setPublic for store entry
(Does bazaar has search option in it ?)

Doxygen provides all the references/referenced-by information.
http://www.squid-cache.org/Doc/code/store__key__md5_8cc.html#a5461ad8f7a969c8bec3008a70a74c42a


so I changed the code to use the store_url in a case it exists in the memobject instead of the "urlCanonical(request)". The result is good since it makes the predicted md5 hash for the url based on the store_url and not the original_url.

At this point I started looking up for more places that original_url might should be replaced since maybe there is a place I missed like when storing or fetching the file for any reason there is that I cant see and think of.

I placed a test file on my server called v.flv which is a static text file. I used a store_url helper that will use the arguments id itag and range (something I know and have script for) to rewrite the url. from 'http://www1.ngtech.co.il/v.flv?id=1000&itag=34&range=0-100&icu=0&tag1' into 'http://ngtech.squid.internal/id=1000&itag=34&range=0-100'

which I later will change the id itag and range on the request and check what happens in my extended logs.

so as expected:

2012/09/27 04:34:17 kid1| A candidate recived from Store URL rewriter accepted: http://ngtech.squid.internal/id=1000&itag=34&range=0-100 2012/09/27 04:34:17.157 kid1| storeKeypublicreuest could be with URL and method: GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100 2012/09/27 04:34:17.157 kid1| storeKeypublicreuest with URL and method: GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100 2012/09/27 04:34:17.157 kid1| storeKeypublicreuest was created using method and url from request 2012/09/27 04:34:17.157 kid1| storeGet: looking up 4F2F79AE5171037CBE02EA95ECD5B86C

the ^^^ hexdigest is what expected to be looked for.
instead of looking up for: 7F77FC394B370386D2E6AD9F8B748846
its looking for: 4F2F79AE5171037CBE02EA95ECD5B86C

which is a very good progress and the file was cached based on the store_url hash.

I started seeing somethings with large files so I reduced the file size to 702 bytes and then I will be able to look over things without too much stuff running because of the size.

a new session new file and other url which result a good 304 validation
and the web server logs shows 200 for the first and 304 for the second.
##access.log
1348715219.641 205 127.0.0.1 TCP_MISS/200 1088 GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 - HIER_DIRECT/79.181.248.214 video/x-flv 1348715278.783 11 127.0.0.1 TCP_REFRESH_UNMODIFIED/200 1096 GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 - HIER_DIRECT/79.181.248.214 video/x-flv

(you can skip this log or jump over what you know)
##cache.log
A candidate recived from Store URL rewriter accepted: http://ngtech.squid.internal/id=100&itag=34 storeKeypublicreuest could be with URL and method: GET http://ngtech.squid.internal/id=100&itag=34 storeKeypublicreuest with URL and method: GET http://ngtech.squid.internal/id=100&itag=34
storeKeypublicreuest was created using method and url from request
storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
src/store_dir.cc(748) get: got in-transit entry: -1@-1=1/2/0/0
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
src/store.cc(395) StoreEntry: new StoreEntry 0x247c080
src/MemObject.cc(109) MemObject: new MemObject 0x2788050
storeKeyPrivate: GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
storeKeyPrivate was created using method and url is:
StoreEntry::hashInsert: Inserting Entry 0x247c080 key '56E7D8D797D8F666B16138EF1686E5B7'
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be used with URL:http://ngtech.squid.internal/id=100&itag=34
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be used with URL:http://ngtech.squid.internal/id=100&itag=34 src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be used with URL:http://ngtech.squid.internal/id=100&itag=34 StoreEntry::replaceHttpReply: http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
ctx: enter level  0: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
StoreEntry::expireNow: '56E7D8D797D8F666B16138EF1686E5B7'
StoreEntry::setReleaseFlag: '56E7D8D797D8F666B16138EF1686E5B7'
ctx: exit level  0
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
storeSwapOutFileClose: 56E7D8D797D8F666B16138EF1686E5B7 how=1
storeSwapOutFileClose: sio = 0
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=2
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=1
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=0
storeRelease: Releasing: '56E7D8D797D8F666B16138EF1686E5B7'
src/store.cc(460) destroyStoreEntry: destroyStoreEntry: destroying 0x247c088
src/store.cc(438) destroyMemObject: destroyMemObject 0x2788050
src/MemObject.cc(132) ~MemObject: del MemObject 0x2788050
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0


now comes the weird part:
I have tried using another url that should result in the same rewritten url: http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11
which will result in: http://ngtech.squid.internal/id=100&itag=34

But while in the access.log it shows:
1348716354.472 115 127.0.0.1 TCP_HIT/200 1096 GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11 - HIER_DIRECT/79.181.248.214 video/x-flv

the actual server log shows:
IP - - [27/Sep/2012:05:24:38 +0200] "GET /v.flv?id=100&itag=34&ii=11 HTTP/1.1" 200 702

So I looked at the cache.log to understand what happens that it shows HIT in one place but does another thing.

the full log is here: http://www1.ngtech.co.il/store_url/test3.txt
but I noticed one thing:

storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
src/store_dir.cc(748) get: got in-transit entry: -1@-1=1/2/0/0
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
clientProcessHit: URL mismatch, 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1' != 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0
storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'

So there is a case in squid which the log shows HIT but the request is being fetched from source since there is a mismatch between the url in the stored entry (regardless the hash match) and the actual request.

related code at:
http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/client_side_reply.cc#L504


This is good for regular requests but not good for rewritten store_url since it's nature.
The options are:
1. store in meta data the store_url and using it while checking for a HIT if exists.
2. do not test for url mismatch in a case of store_url.

The above is the only blockage I until now saw in my testings.

I want to dig into the code more and see the exact reason and mechanism of this clientProcessHit and the surrounding related code (that actually fetch the file from the server).

This is the HTTP/1.1 variant handling code. Each request URL may have multiple response variants depending on the things indicated by Vary: header and possibly ETag header as a unique ID for each variant. How it is done in Squid is that the store contains a vary meta object as well as the normal file meta object.

The storeURL should point at the meta object, and then when loaded meta object points at other bits to combine in addition to the storeURL to make a new key for a second lookup which is expected to find the actual file object for that variant.

You should make these secondary keys use the storeURL instead of the original URL (only when the original URL was used, some variant may be looked up based on ETag or digest values alone).


Some other places which need it are:
  PURGE requets handling lookups
  ICP request lookups
  HTCP request lookups

Cache Digest is a little more tricky. Ignore it for now. The remote proxy will never be guaranteed to do the right lookups even if you add both original and store URL to the digest.


I Also noticed something disturbing that is not related to any of store_url code but more to debug_options(maybe one of my debug additions but it is not happening for almost anything else) when and only when i'm fetching crossdomain.xml from most of the sites I have tried like youtube domains while debug_options are at ALL,1 20,3
I'm getting some segmentation problem and squid dies and restarts.


Hmm. Worth checking separately and fixing outside the storeURL project if you can identify what it is.

It's something that I have never saw until couple weeks ago.
the cache.log is here: http://www1.ngtech.co.il/store_url/segmentation.txt


Eliezer

Reply via email to