Re: [PATCH] Fixes to get more collapsed forwarding hits

2014-06-24 Thread Alex Rousskov
On 06/24/2014 01:44 PM, Kinkie wrote:
> Amos already +1-ed the patch, I have no insights to add so unless
> someone speaks up real fast, we proceed.

Committed to trunk as r13476 and r13477.


Thank you,

Alex.



> On Tue, Jun 24, 2014 at 9:29 PM, Alex Rousskov
>  wrote:
>> On 06/24/2014 12:55 PM, Kinkie wrote:
>>
>>
>>> Did you run coadvisor and polygraph against it?
>>
>> Co-Advisor does not have collapsed forwarding cases (there are no
>> explicit RFC 2616 MUSTs that cover CF although some cases can be
>> certainly added to test some MUSTs in a CF context).
>>
>> Polygraph can be used to simulate CF-heavy environment, but all you will
>> get from this patch is an improved hit ratio. It is not going to tell
>> you which collapsing cases Squid have missed.
>>
>>
>>> If not, and if the branch is public, it's trivial tor un them against
>>> it now (thanks TMF, thanks Pavel!). It doesn't guarantee to catch all
>>> cases, but at least it should raise confidence.
>>
>> There is currently no branch for this -- just a trunk patch. A bigger
>> problem is that you will need a different Polygraph workload -- the one
>> we use for Squid regression testing via Jenkins is unlikely to produce a
>> lot of CF opportunities.
>>
>> Overall, while I would certainly welcome Co-Advisor and Polygraph
>> results, I decided to rely on existing trunk regression testing and not
>> spend more time on organizing custom pre-commit tests. If somebody wants
>> to volunteer to test this, please let me know.
>>
>>
>> Thank you,
>>
>> Alex.
>>
>>
>>> On Tue, Jun 24, 2014 at 7:37 PM, Alex Rousskov
>>>  wrote:
 On 06/23/2014 11:58 PM, Amos Jeffries wrote:
> On 24/06/2014 4:07 p.m., Alex Rousskov wrote:
>> * Do not abandon writing a collapsed cache entry when we cannot cache
>> the entry in RAM if the entry can be cached on disk instead. Both shared
>> memory cache and the disk cache have to refuse to cache the entry for it
>> to become non-collapsible. This dual refusal is difficult to detect
>> because each cache may make the caching decision at different times.
>> Added StoreEntry methods to track those decisions and react to them.


> Hmm. FTR: IMHO the only reasons a response should be non-collapsible is
> if it is a) private, b) authenticated to another user, c) explicitly
> known non-cacheable, d) a non-matching variant of the URL, or e) we
> already discarded some body bytes.

 Yes, there are many "only" reasons why a we may not be able to collapse
 what looked like a collapsible transaction at the beginning. The bullet
 we are discussing fixes a case where the Squid shared memory caching
 code said "if we cannot cache this entry in the shared memory cache,
 then we cannot collapse onto this entry". That was wrong; we should
 still be able to collapse if the disk cache stores the entry.


> The factor of memory/disk caches not having provided cacheable size
> allow/reject result yet is no reason to be non-collapsible. It is a
> reason to wait on that result to determine the cacheability condition (c).

 Yes, kind of. Today, Squid treats "HTTP cachable" and "storable in a
 Squid cache" as two rather different things. The two conditions are
 checked and managed in different code places and at different times
 (with some overlap). And some of those decisions are unrelated to the
 entry size. However, they are essentially the same from user-perceived
 caching point of view and can be listed under a single "explicitly known
 non-cacheable" bullet if one is careful to interpret that correctly.


> +1. Although I do not know enough about store yet to do much in-depth
> audit of the logic changes.

 FWIW, knowing "enough" about Store makes me even less confident about
 changes like these ones. This is very tricky stuff not just because of
 the complexity of the features involved (caching, collapsing, etc.) but
 because there are very few simple invariants in Store. I rarely know for
 sure what I am going to break when changing this stuff. The code is
 getting better, and I am trying to document the invariants I find, but
 it is a very slow process, and there will be many more bugs, possibly
 from this patch as well.


 Cheers,

 Alex.

>>>
>>>
>>>
>>
> 
> 
> 



Re: [PATCH] Fixes to get more collapsed forwarding hits

2014-06-24 Thread Kinkie
Amos already +1-ed the patch, I have no insights to add so unless
someone speaks up real fast, we proceed.

On Tue, Jun 24, 2014 at 9:29 PM, Alex Rousskov
 wrote:
> On 06/24/2014 12:55 PM, Kinkie wrote:
>
>
>> Did you run coadvisor and polygraph against it?
>
> Co-Advisor does not have collapsed forwarding cases (there are no
> explicit RFC 2616 MUSTs that cover CF although some cases can be
> certainly added to test some MUSTs in a CF context).
>
> Polygraph can be used to simulate CF-heavy environment, but all you will
> get from this patch is an improved hit ratio. It is not going to tell
> you which collapsing cases Squid have missed.
>
>
>> If not, and if the branch is public, it's trivial tor un them against
>> it now (thanks TMF, thanks Pavel!). It doesn't guarantee to catch all
>> cases, but at least it should raise confidence.
>
> There is currently no branch for this -- just a trunk patch. A bigger
> problem is that you will need a different Polygraph workload -- the one
> we use for Squid regression testing via Jenkins is unlikely to produce a
> lot of CF opportunities.
>
> Overall, while I would certainly welcome Co-Advisor and Polygraph
> results, I decided to rely on existing trunk regression testing and not
> spend more time on organizing custom pre-commit tests. If somebody wants
> to volunteer to test this, please let me know.
>
>
> Thank you,
>
> Alex.
>
>
>> On Tue, Jun 24, 2014 at 7:37 PM, Alex Rousskov
>>  wrote:
>>> On 06/23/2014 11:58 PM, Amos Jeffries wrote:
 On 24/06/2014 4:07 p.m., Alex Rousskov wrote:
> * Do not abandon writing a collapsed cache entry when we cannot cache
> the entry in RAM if the entry can be cached on disk instead. Both shared
> memory cache and the disk cache have to refuse to cache the entry for it
> to become non-collapsible. This dual refusal is difficult to detect
> because each cache may make the caching decision at different times.
> Added StoreEntry methods to track those decisions and react to them.
>>>
>>>
 Hmm. FTR: IMHO the only reasons a response should be non-collapsible is
 if it is a) private, b) authenticated to another user, c) explicitly
 known non-cacheable, d) a non-matching variant of the URL, or e) we
 already discarded some body bytes.
>>>
>>> Yes, there are many "only" reasons why a we may not be able to collapse
>>> what looked like a collapsible transaction at the beginning. The bullet
>>> we are discussing fixes a case where the Squid shared memory caching
>>> code said "if we cannot cache this entry in the shared memory cache,
>>> then we cannot collapse onto this entry". That was wrong; we should
>>> still be able to collapse if the disk cache stores the entry.
>>>
>>>
 The factor of memory/disk caches not having provided cacheable size
 allow/reject result yet is no reason to be non-collapsible. It is a
 reason to wait on that result to determine the cacheability condition (c).
>>>
>>> Yes, kind of. Today, Squid treats "HTTP cachable" and "storable in a
>>> Squid cache" as two rather different things. The two conditions are
>>> checked and managed in different code places and at different times
>>> (with some overlap). And some of those decisions are unrelated to the
>>> entry size. However, they are essentially the same from user-perceived
>>> caching point of view and can be listed under a single "explicitly known
>>> non-cacheable" bullet if one is careful to interpret that correctly.
>>>
>>>
 +1. Although I do not know enough about store yet to do much in-depth
 audit of the logic changes.
>>>
>>> FWIW, knowing "enough" about Store makes me even less confident about
>>> changes like these ones. This is very tricky stuff not just because of
>>> the complexity of the features involved (caching, collapsing, etc.) but
>>> because there are very few simple invariants in Store. I rarely know for
>>> sure what I am going to break when changing this stuff. The code is
>>> getting better, and I am trying to document the invariants I find, but
>>> it is a very slow process, and there will be many more bugs, possibly
>>> from this patch as well.
>>>
>>>
>>> Cheers,
>>>
>>> Alex.
>>>
>>
>>
>>
>



-- 
Francesco


Re: [PATCH] Fixes to get more collapsed forwarding hits

2014-06-24 Thread Alex Rousskov
On 06/24/2014 12:55 PM, Kinkie wrote:


> Did you run coadvisor and polygraph against it?

Co-Advisor does not have collapsed forwarding cases (there are no
explicit RFC 2616 MUSTs that cover CF although some cases can be
certainly added to test some MUSTs in a CF context).

Polygraph can be used to simulate CF-heavy environment, but all you will
get from this patch is an improved hit ratio. It is not going to tell
you which collapsing cases Squid have missed.


> If not, and if the branch is public, it's trivial tor un them against
> it now (thanks TMF, thanks Pavel!). It doesn't guarantee to catch all
> cases, but at least it should raise confidence.

There is currently no branch for this -- just a trunk patch. A bigger
problem is that you will need a different Polygraph workload -- the one
we use for Squid regression testing via Jenkins is unlikely to produce a
lot of CF opportunities.

Overall, while I would certainly welcome Co-Advisor and Polygraph
results, I decided to rely on existing trunk regression testing and not
spend more time on organizing custom pre-commit tests. If somebody wants
to volunteer to test this, please let me know.


Thank you,

Alex.


> On Tue, Jun 24, 2014 at 7:37 PM, Alex Rousskov
>  wrote:
>> On 06/23/2014 11:58 PM, Amos Jeffries wrote:
>>> On 24/06/2014 4:07 p.m., Alex Rousskov wrote:
 * Do not abandon writing a collapsed cache entry when we cannot cache
 the entry in RAM if the entry can be cached on disk instead. Both shared
 memory cache and the disk cache have to refuse to cache the entry for it
 to become non-collapsible. This dual refusal is difficult to detect
 because each cache may make the caching decision at different times.
 Added StoreEntry methods to track those decisions and react to them.
>>
>>
>>> Hmm. FTR: IMHO the only reasons a response should be non-collapsible is
>>> if it is a) private, b) authenticated to another user, c) explicitly
>>> known non-cacheable, d) a non-matching variant of the URL, or e) we
>>> already discarded some body bytes.
>>
>> Yes, there are many "only" reasons why a we may not be able to collapse
>> what looked like a collapsible transaction at the beginning. The bullet
>> we are discussing fixes a case where the Squid shared memory caching
>> code said "if we cannot cache this entry in the shared memory cache,
>> then we cannot collapse onto this entry". That was wrong; we should
>> still be able to collapse if the disk cache stores the entry.
>>
>>
>>> The factor of memory/disk caches not having provided cacheable size
>>> allow/reject result yet is no reason to be non-collapsible. It is a
>>> reason to wait on that result to determine the cacheability condition (c).
>>
>> Yes, kind of. Today, Squid treats "HTTP cachable" and "storable in a
>> Squid cache" as two rather different things. The two conditions are
>> checked and managed in different code places and at different times
>> (with some overlap). And some of those decisions are unrelated to the
>> entry size. However, they are essentially the same from user-perceived
>> caching point of view and can be listed under a single "explicitly known
>> non-cacheable" bullet if one is careful to interpret that correctly.
>>
>>
>>> +1. Although I do not know enough about store yet to do much in-depth
>>> audit of the logic changes.
>>
>> FWIW, knowing "enough" about Store makes me even less confident about
>> changes like these ones. This is very tricky stuff not just because of
>> the complexity of the features involved (caching, collapsing, etc.) but
>> because there are very few simple invariants in Store. I rarely know for
>> sure what I am going to break when changing this stuff. The code is
>> getting better, and I am trying to document the invariants I find, but
>> it is a very slow process, and there will be many more bugs, possibly
>> from this patch as well.
>>
>>
>> Cheers,
>>
>> Alex.
>>
> 
> 
> 



Re: [PATCH] Fixes to get more collapsed forwarding hits

2014-06-24 Thread Kinkie
We can't know except by hammering it.
Did you run coadvisor and polygraph against it?
If not, and if the branch is public, it's trivial tor un them against
it now (thanks TMF, thanks Pavel!). It doesn't guarantee to catch all
cases, but at least it should raise confidence.

On Tue, Jun 24, 2014 at 7:37 PM, Alex Rousskov
 wrote:
> On 06/23/2014 11:58 PM, Amos Jeffries wrote:
>> On 24/06/2014 4:07 p.m., Alex Rousskov wrote:
>>> * Do not abandon writing a collapsed cache entry when we cannot cache
>>> the entry in RAM if the entry can be cached on disk instead. Both shared
>>> memory cache and the disk cache have to refuse to cache the entry for it
>>> to become non-collapsible. This dual refusal is difficult to detect
>>> because each cache may make the caching decision at different times.
>>> Added StoreEntry methods to track those decisions and react to them.
>
>
>> Hmm. FTR: IMHO the only reasons a response should be non-collapsible is
>> if it is a) private, b) authenticated to another user, c) explicitly
>> known non-cacheable, d) a non-matching variant of the URL, or e) we
>> already discarded some body bytes.
>
> Yes, there are many "only" reasons why a we may not be able to collapse
> what looked like a collapsible transaction at the beginning. The bullet
> we are discussing fixes a case where the Squid shared memory caching
> code said "if we cannot cache this entry in the shared memory cache,
> then we cannot collapse onto this entry". That was wrong; we should
> still be able to collapse if the disk cache stores the entry.
>
>
>> The factor of memory/disk caches not having provided cacheable size
>> allow/reject result yet is no reason to be non-collapsible. It is a
>> reason to wait on that result to determine the cacheability condition (c).
>
> Yes, kind of. Today, Squid treats "HTTP cachable" and "storable in a
> Squid cache" as two rather different things. The two conditions are
> checked and managed in different code places and at different times
> (with some overlap). And some of those decisions are unrelated to the
> entry size. However, they are essentially the same from user-perceived
> caching point of view and can be listed under a single "explicitly known
> non-cacheable" bullet if one is careful to interpret that correctly.
>
>
>> +1. Although I do not know enough about store yet to do much in-depth
>> audit of the logic changes.
>
> FWIW, knowing "enough" about Store makes me even less confident about
> changes like these ones. This is very tricky stuff not just because of
> the complexity of the features involved (caching, collapsing, etc.) but
> because there are very few simple invariants in Store. I rarely know for
> sure what I am going to break when changing this stuff. The code is
> getting better, and I am trying to document the invariants I find, but
> it is a very slow process, and there will be many more bugs, possibly
> from this patch as well.
>
>
> Cheers,
>
> Alex.
>



-- 
Francesco


Re: [PATCH] Fixes to get more collapsed forwarding hits

2014-06-24 Thread Alex Rousskov
On 06/23/2014 11:58 PM, Amos Jeffries wrote:
> On 24/06/2014 4:07 p.m., Alex Rousskov wrote:
>> * Do not abandon writing a collapsed cache entry when we cannot cache
>> the entry in RAM if the entry can be cached on disk instead. Both shared
>> memory cache and the disk cache have to refuse to cache the entry for it
>> to become non-collapsible. This dual refusal is difficult to detect
>> because each cache may make the caching decision at different times.
>> Added StoreEntry methods to track those decisions and react to them.


> Hmm. FTR: IMHO the only reasons a response should be non-collapsible is
> if it is a) private, b) authenticated to another user, c) explicitly
> known non-cacheable, d) a non-matching variant of the URL, or e) we
> already discarded some body bytes.

Yes, there are many "only" reasons why a we may not be able to collapse
what looked like a collapsible transaction at the beginning. The bullet
we are discussing fixes a case where the Squid shared memory caching
code said "if we cannot cache this entry in the shared memory cache,
then we cannot collapse onto this entry". That was wrong; we should
still be able to collapse if the disk cache stores the entry.


> The factor of memory/disk caches not having provided cacheable size
> allow/reject result yet is no reason to be non-collapsible. It is a
> reason to wait on that result to determine the cacheability condition (c).

Yes, kind of. Today, Squid treats "HTTP cachable" and "storable in a
Squid cache" as two rather different things. The two conditions are
checked and managed in different code places and at different times
(with some overlap). And some of those decisions are unrelated to the
entry size. However, they are essentially the same from user-perceived
caching point of view and can be listed under a single "explicitly known
non-cacheable" bullet if one is careful to interpret that correctly.


> +1. Although I do not know enough about store yet to do much in-depth
> audit of the logic changes.

FWIW, knowing "enough" about Store makes me even less confident about
changes like these ones. This is very tricky stuff not just because of
the complexity of the features involved (caching, collapsing, etc.) but
because there are very few simple invariants in Store. I rarely know for
sure what I am going to break when changing this stuff. The code is
getting better, and I am trying to document the invariants I find, but
it is a very slow process, and there will be many more bugs, possibly
from this patch as well.


Cheers,

Alex.



Re: [PATCH] Fixes to get more collapsed forwarding hits

2014-06-23 Thread Amos Jeffries
On 24/06/2014 4:07 p.m., Alex Rousskov wrote:
> Hello,
> 
> The attached patch contains several changes to improve the
> probability of getting a collapsed forwarding hit:
> 
> * Allow STORE_MEMORY_CLIENTs to open disk files if needed and possible.
> STORE_*_CLIENT designation is rather buggy (several known XXXs). Some
> collapsed clients are marked as STORE_MEMORY_CLIENTs (for the lack of
> info at determination time) but their hit content may actually come from
> a disk cache.
> 
> * Do not abandon writing a collapsed cache entry when we cannot cache
> the entry in RAM if the entry can be cached on disk instead. Both shared
> memory cache and the disk cache have to refuse to cache the entry for it
> to become non-collapsible. This dual refusal is difficult to detect
> because each cache may make the caching decision at different times.
> Added StoreEntry methods to track those decisions and react to them.

Hmm. FTR: IMHO the only reasons a response should be non-collapsible is
if it is a) private, b) authenticated to another user, c) explicitly
known non-cacheable, d) a non-matching variant of the URL, or e) we
already discarded some body bytes.

The factor of memory/disk caches not having provided cacheable size
allow/reject result yet is no reason to be non-collapsible. It is a
reason to wait on that result to determine the cacheability condition (c).

> 
> * Recognize disk cache as a potential source of the collapsed entry when
> the memory cache is configured. While collapsed entries would normally
> be found in the shared memory cache, caching policies and other factors
> may prohibit memory caching but still allow disk caching. Memory cache
> is still preferred.
> 
> * Do not use unknown entry size in StoreEntry::checkTooSmall()
> determination. The size of collapsed entries is often unknown, even when
> they are STORE_OK (because swap_hdr_sz is unknown when the other worker
> has created the cache entry).  The same code has been using this
> ignore-unknowns logic for the Content-Length header value, so the
> rejection of unknown entry size (added as a part of C++ conversion
> without a dedicated message in r5766) could have been a typo.
> 
> 
> If approved, I will commit the change for the last bullet separately
> because it is easier to isolate and is less specific to collapsed
> forwarding.
> 
> 
> The patch does not attempt to cover all possible collapsing cases that
> Squid currently misses, of course. More work remains in this area
> (including a serious STORE_*_CLIENT redesign), but the proposed fixes
> should make a dent.
> 
> These changes were developed for an earlier trunk version but the
> trunk-ported patch posted here also passed simple lab tests.
> 
> 
> Thank you,
> 
> Alex.
> 

+1. Although I do not know enough about store yet to do much in-depth
audit of the logic changes.

Amos