Well I was reading here and there the store code and other things then started testing some theories about how store_url can be and should be implemented.

If you do remember or not I refactored the source in couple places to use originalUrl and original_url to check all the places where original url is used and then decide where the store_url should be placed if needed.
(memobject->original_url and has memobject->store_url)

the main place of all I have seen that should be used with the store_url is the http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/store_key_md5.cc#L144

which gets requests from couple places and mainly the setPublic for store entry
(Does bazaar has search option in it ?)

so I changed the code to use the store_url in a case it exists in the memobject instead of the "urlCanonical(request)". The result is good since it makes the predicted md5 hash for the url based on the store_url and not the original_url.

At this point I started looking up for more places that original_url might should be replaced since maybe there is a place I missed like when storing or fetching the file for any reason there is that I cant see and think of.

I placed a test file on my server called v.flv which is a static text file.
I used a store_url helper that will use the arguments id itag and range (something I know and have script for) to rewrite the url. from 'http://www1.ngtech.co.il/v.flv?id=1000&itag=34&range=0-100&icu=0&tag1' into 'http://ngtech.squid.internal/id=1000&itag=34&range=0-100'

which I later will change the id itag and range on the request and check what happens in my extended logs.

so as expected:

2012/09/27 04:34:17 kid1| A candidate recived from Store URL rewriter accepted: http://ngtech.squid.internal/id=1000&itag=34&range=0-100 2012/09/27 04:34:17.157 kid1| storeKeypublicreuest could be with URL and method: GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100 2012/09/27 04:34:17.157 kid1| storeKeypublicreuest with URL and method: GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100 2012/09/27 04:34:17.157 kid1| storeKeypublicreuest was created using method and url from request 2012/09/27 04:34:17.157 kid1| storeGet: looking up 4F2F79AE5171037CBE02EA95ECD5B86C

the ^^^ hexdigest is what expected to be looked for.
instead of looking up for: 7F77FC394B370386D2E6AD9F8B748846
its looking for: 4F2F79AE5171037CBE02EA95ECD5B86C

which is a very good progress and the file was cached based on the store_url hash.

I started seeing somethings with large files so I reduced the file size to 702 bytes and then I will be able to look over things without too much stuff running because of the size.

a new session new file and other url which result a good 304 validation
and the web server logs shows 200 for the first and 304 for the second.
##access.log
1348715219.641 205 127.0.0.1 TCP_MISS/200 1088 GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 - HIER_DIRECT/79.181.248.214 video/x-flv 1348715278.783 11 127.0.0.1 TCP_REFRESH_UNMODIFIED/200 1096 GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 - HIER_DIRECT/79.181.248.214 video/x-flv

(you can skip this log or jump over what you know)
##cache.log
A candidate recived from Store URL rewriter accepted: http://ngtech.squid.internal/id=100&itag=34 storeKeypublicreuest could be with URL and method: GET http://ngtech.squid.internal/id=100&itag=34 storeKeypublicreuest with URL and method: GET http://ngtech.squid.internal/id=100&itag=34
storeKeypublicreuest was created using method and url from request
storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
src/store_dir.cc(748) get: got in-transit entry: -1@-1=1/2/0/0
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
src/store.cc(395) StoreEntry: new StoreEntry 0x247c080
src/MemObject.cc(109) MemObject: new MemObject 0x2788050
storeKeyPrivate: GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
storeKeyPrivate was created using method and url is:
StoreEntry::hashInsert: Inserting Entry 0x247c080 key '56E7D8D797D8F666B16138EF1686E5B7'
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be used with URL:http://ngtech.squid.internal/id=100&itag=34
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be used with URL:http://ngtech.squid.internal/id=100&itag=34 src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be used with URL:http://ngtech.squid.internal/id=100&itag=34 StoreEntry::replaceHttpReply: http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
ctx: enter level  0: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
StoreEntry::expireNow: '56E7D8D797D8F666B16138EF1686E5B7'
StoreEntry::setReleaseFlag: '56E7D8D797D8F666B16138EF1686E5B7'
ctx: exit level  0
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
storeSwapOutFileClose: 56E7D8D797D8F666B16138EF1686E5B7 how=1
storeSwapOutFileClose: sio = 0
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=2
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=1
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=0
storeRelease: Releasing: '56E7D8D797D8F666B16138EF1686E5B7'
src/store.cc(460) destroyStoreEntry: destroyStoreEntry: destroying 0x247c088
src/store.cc(438) destroyMemObject: destroyMemObject 0x2788050
src/MemObject.cc(132) ~MemObject: del MemObject 0x2788050
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0


now comes the weird part:
I have tried using another url that should result in the same rewritten url: http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11
which will result in: http://ngtech.squid.internal/id=100&itag=34

But while in the access.log it shows:
1348716354.472 115 127.0.0.1 TCP_HIT/200 1096 GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11 - HIER_DIRECT/79.181.248.214 video/x-flv

the actual server log shows:
IP - - [27/Sep/2012:05:24:38 +0200] "GET /v.flv?id=100&itag=34&ii=11 HTTP/1.1" 200 702

So I looked at the cache.log to understand what happens that it shows HIT in one place but does another thing.

the full log is here: http://www1.ngtech.co.il/store_url/test3.txt
but I noticed one thing:

storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
src/store_dir.cc(748) get: got in-transit entry: -1@-1=1/2/0/0
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
clientProcessHit: URL mismatch, 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1' != 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0
storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'

So there is a case in squid which the log shows HIT but the request is being fetched from source since there is a mismatch between the url in the stored entry (regardless the hash match) and the actual request.

related code at:
http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/client_side_reply.cc#L504


This is good for regular requests but not good for rewritten store_url since it's nature.
The options are:
1. store in meta data the store_url and using it while checking for a HIT if exists.
2. do not test for url mismatch in a case of store_url.

The above is the only blockage I until now saw in my testings.

I want to dig into the code more and see the exact reason and mechanism of this clientProcessHit and the surrounding related code (that actually fetch the file from the server).

I do know that the Client* code is using the request url since it's fetching the original request but I want to prevent it from doing it and since there is a store_url to ignore any mismatches there is to the url.

So thoughts and opinions are more then welcome.

If you do have question just ask me and I will clear out whatever needed and notice that the changes to the original code are very very small and can be understood in less then minutes.


I Also noticed something disturbing that is not related to any of store_url code but more to debug_options(maybe one of my debug additions but it is not happening for almost anything else) when and only when i'm fetching crossdomain.xml from most of the sites I have tried like youtube domains while debug_options are at ALL,1 20,3
I'm getting some segmentation problem and squid dies and restarts.

It's something that I have never saw until couple weeks ago.
the cache.log is here: http://www1.ngtech.co.il/store_url/segmentation.txt


Eliezer

Reply via email to