Re: [PATCH] Fixes to get more collapsed forwarding hits
On 06/24/2014 01:44 PM, Kinkie wrote:
> Amos already +1-ed the patch, I have no insights to add so unless
> someone speaks up real fast, we proceed.

Committed to trunk as r13476 and r13477.

Thank you,

Alex.
Re: [PATCH] Fixes to get more collapsed forwarding hits
Amos already +1-ed the patch, I have no insights to add so unless
someone speaks up real fast, we proceed.

-- 
    Francesco
Re: [PATCH] Fixes to get more collapsed forwarding hits
On 06/24/2014 12:55 PM, Kinkie wrote:
> Did you run coadvisor and polygraph against it?

Co-Advisor does not have collapsed forwarding cases (there are no
explicit RFC 2616 MUSTs that cover CF, although some cases can certainly
be added to test some MUSTs in a CF context).

Polygraph can be used to simulate a CF-heavy environment, but all you
will get from this patch is an improved hit ratio. It is not going to
tell you which collapsing cases Squid has missed.

> If not, and if the branch is public, it's trivial to run them against
> it now (thanks TMF, thanks Pavel!). It doesn't guarantee to catch all
> cases, but at least it should raise confidence.

There is currently no branch for this -- just a trunk patch. A bigger
problem is that you will need a different Polygraph workload -- the one
we use for Squid regression testing via Jenkins is unlikely to produce a
lot of CF opportunities.

Overall, while I would certainly welcome Co-Advisor and Polygraph
results, I decided to rely on existing trunk regression testing and not
spend more time on organizing custom pre-commit tests. If somebody wants
to volunteer to test this, please let me know.

Thank you,

Alex.
Re: [PATCH] Fixes to get more collapsed forwarding hits
We can't know except by hammering it. Did you run coadvisor and
polygraph against it? If not, and if the branch is public, it's trivial
to run them against it now (thanks TMF, thanks Pavel!). It doesn't
guarantee to catch all cases, but at least it should raise confidence.

-- 
    Francesco
Re: [PATCH] Fixes to get more collapsed forwarding hits
On 06/23/2014 11:58 PM, Amos Jeffries wrote:
> On 24/06/2014 4:07 p.m., Alex Rousskov wrote:
>> * Do not abandon writing a collapsed cache entry when we cannot cache
>> the entry in RAM if the entry can be cached on disk instead. Both shared
>> memory cache and the disk cache have to refuse to cache the entry for it
>> to become non-collapsible. This dual refusal is difficult to detect
>> because each cache may make the caching decision at different times.
>> Added StoreEntry methods to track those decisions and react to them.

> Hmm. FTR: IMHO the only reasons a response should be non-collapsible is
> if it is a) private, b) authenticated to another user, c) explicitly
> known non-cacheable, d) a non-matching variant of the URL, or e) we
> already discarded some body bytes.

Yes, there are many "only" reasons why we may not be able to collapse
what looked like a collapsible transaction at the beginning. The bullet
we are discussing fixes a case where the Squid shared memory caching
code said "if we cannot cache this entry in the shared memory cache,
then we cannot collapse onto this entry". That was wrong; we should
still be able to collapse if the disk cache stores the entry.

> The factor of memory/disk caches not having provided a cacheable-size
> allow/reject result yet is no reason to be non-collapsible. It is a
> reason to wait on that result to determine the cacheability condition (c).

Yes, kind of. Today, Squid treats "HTTP cacheable" and "storable in a
Squid cache" as two rather different things. The two conditions are
checked and managed in different code places and at different times
(with some overlap). And some of those decisions are unrelated to the
entry size. However, they are essentially the same from the
user-perceived caching point of view and can be listed under a single
"explicitly known non-cacheable" bullet if one is careful to interpret
that correctly.

> +1. Although I do not know enough about store yet to do much in-depth
> audit of the logic changes.

FWIW, knowing "enough" about Store makes me even less confident about
changes like these. This is very tricky stuff, not just because of the
complexity of the features involved (caching, collapsing, etc.) but
because there are very few simple invariants in Store. I rarely know for
sure what I am going to break when changing this stuff. The code is
getting better, and I am trying to document the invariants I find, but
it is a very slow process, and there will be many more bugs, possibly
from this patch as well.

Cheers,

Alex.
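[Editorial illustration of the "dual refusal" idea discussed above. This
is only a minimal sketch, not the code from the patch: the class, method,
and member names (CollapsedEntry, memoryCachingRejected(), and so on) are
invented for this example. It shows the shape of the logic only: each
cache's refusal is recorded as it arrives, and collapsing is abandoned
only once both caches have refused.]

    // Sketch only: names are hypothetical and do not match the actual
    // StoreEntry API added by the patch.
    class CollapsedEntry
    {
    public:
        // The shared memory cache calls this when it refuses to store us.
        void memoryCachingRejected() { memRejected = true; checkCollapsible(); }

        // The disk cache calls this when it refuses to store us.
        void diskCachingRejected() { diskRejected = true; checkCollapsible(); }

    private:
        // A single refusal is not enough; each cache may decide at a
        // different time, so we only give up once both have refused.
        void checkCollapsible()
        {
            if (memRejected && diskRejected)
                abandonCollapsing(); // stop other transactions from joining
        }

        // Hypothetical helper: mark the entry non-collapsible.
        void abandonCollapsing() { /* notify waiting clients, etc. */ }

        bool memRejected = false;
        bool diskRejected = false;
    };

[In the real patch the equivalent state lives on StoreEntry and the
caches report their decisions to it as they are made; the actual method
names may differ.]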
Re: [PATCH] Fixes to get more collapsed forwarding hits
On 24/06/2014 4:07 p.m., Alex Rousskov wrote:
> Hello,
>
> The attached patch contains several changes to improve the
> probability of getting a collapsed forwarding hit:
>
> * Allow STORE_MEMORY_CLIENTs to open disk files if needed and possible.
> STORE_*_CLIENT designation is rather buggy (several known XXXs). Some
> collapsed clients are marked as STORE_MEMORY_CLIENTs (for the lack of
> info at determination time) but their hit content may actually come from
> a disk cache.
>
> * Do not abandon writing a collapsed cache entry when we cannot cache
> the entry in RAM if the entry can be cached on disk instead. Both shared
> memory cache and the disk cache have to refuse to cache the entry for it
> to become non-collapsible. This dual refusal is difficult to detect
> because each cache may make the caching decision at different times.
> Added StoreEntry methods to track those decisions and react to them.

Hmm. FTR: IMHO the only reasons a response should be non-collapsible is
if it is a) private, b) authenticated to another user, c) explicitly
known non-cacheable, d) a non-matching variant of the URL, or e) we
already discarded some body bytes.

The factor of memory/disk caches not having provided a cacheable-size
allow/reject result yet is no reason to be non-collapsible. It is a
reason to wait on that result to determine the cacheability condition (c).

> * Recognize disk cache as a potential source of the collapsed entry when
> the memory cache is configured. While collapsed entries would normally
> be found in the shared memory cache, caching policies and other factors
> may prohibit memory caching but still allow disk caching. Memory cache
> is still preferred.
>
> * Do not use unknown entry size in StoreEntry::checkTooSmall()
> determination. The size of collapsed entries is often unknown, even when
> they are STORE_OK (because swap_hdr_sz is unknown when the other worker
> has created the cache entry). The same code has been using this
> ignore-unknowns logic for the Content-Length header value, so the
> rejection of unknown entry size (added as a part of the C++ conversion
> without a dedicated message in r5766) could have been a typo.
>
> If approved, I will commit the change for the last bullet separately
> because it is easier to isolate and is less specific to collapsed
> forwarding.
>
> The patch does not attempt to cover all possible collapsing cases that
> Squid currently misses, of course. More work remains in this area
> (including a serious STORE_*_CLIENT redesign), but the proposed fixes
> should make a dent.
>
> These changes were developed for an earlier trunk version, but the
> trunk-ported patch posted here also passed simple lab tests.
>
> Thank you,
>
> Alex.

+1. Although I do not know enough about store yet to do much in-depth
audit of the logic changes.

Amos
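[Editorial illustration of the last bullet (ignoring an unknown entry
size in the "too small" check). This is only a minimal sketch, not the
actual StoreEntry::checkTooSmall() code; the function, parameter names,
and the minObjectSize knob are placeholders for the example. It shows
the intended rule: an unknown (negative) size never triggers rejection,
matching how the Content-Length value was already treated.]

    #include <cstdint>

    // Sketch only: not Squid's StoreEntry::checkTooSmall(); names are
    // hypothetical. A negative value means "unknown".
    static bool tooSmall(int64_t storedSize, int64_t expectedContentLength,
                         int64_t minObjectSize)
    {
        // Collapsed entries often have an unknown size (e.g., swap_hdr_sz
        // is unknown when another worker created the entry), so an unknown
        // size must not count as "too small".
        if (storedSize >= 0 && storedSize < minObjectSize)
            return true;

        // The Content-Length check already ignored unknown values the same
        // way; the fix makes the entry-size check consistent with it.
        if (expectedContentLength >= 0 && expectedContentLength < minObjectSize)
            return true;

        return false; // sizes are either large enough or unknown
    }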