Re: [squid-dev] [PATCH] Bug 7: Headers are not updated on disk after 304s

2016-03-12 Thread Eliezer Croitoru

I will try to follow up at:
http://bugs.squid-cache.org/show_bug.cgi?id=7

Eliezer

On 11/03/2016 20:16, Alex Rousskov wrote:

On 03/11/2016 02:17 AM, Amos Jeffries wrote:

On 11/03/2016 2:59 p.m., Alex Rousskov wrote:

 The attached compressed patch fixes the 15+ year-old Bug #7 [1] for
the shared memory cache and rock cache_dirs. I am not aware of anybody
working on ufs-based cache_dirs, but this patch provides a Store API and
a cache_dir example on how to fix those as well.

   [1] http://bugs.squid-cache.org/show_bug.cgi?id=7




Ah. I'm getting déjà vu on this. I thought those two cache types were
fixed long ago, and recent talk was that you were working on the UFS side of it.


There was some noise about this bug and related issues some months ago.
It was easy to get confused by all the mis[leading]information being
posted on bugzilla, including reports that "the bug is fixed" for some
ufs-based cache_dirs. I tried to correct those reports but failed to
convince people that they do not see what they think they see.

After this patch, the following cache stores (and only them) should
support header updates:

   * non-shared memory cache (in non-SMP Squids only)
   * shared memory cache
   * rock cache_dir

Needless to say, the posted patch does not fix all the problems with
header updates, even for the above stores. For example, the code that
decides which headers to update may still violate HTTP in some ways (I
have not checked). The patch "just" writes the headers computed by Squid
to shared memory cache and to rock cache_dirs.

Moreover, given the [necessary] complexity of the efficient update code
combined with the [unnecessary] complexity of some old Store APIs, I
would be surprised if there are no new bugs or problems introduced by
our changes. I am not aware of any, but we continue to test and plan to
fix the ones we find.



Besides an unavoidable increase in rock-based caching code complexity, the
[known] costs of this fix are:

1. 8 additional bytes per cache entry for shared memory cache and rock
cache_dirs. Much bigger but short-lived RAM _savings_ for rock
cache_dirs (due to less RAM-hungry index rebuild code) somewhat mitigate
this RAM usage increase.

2. Increased slot fragmentation when updated headers are slightly larger
than old ones. This can probably be optimized away later if needed by
padding HTTP headers or StoreEntry metadata.

3. Somewhat slower rock cache_dir index rebuild time. IMO, this should
eventually be dealt with by not rebuilding the index on most startups at
all (rather than focusing on the index rebuild optimization).


Hmm. Nod, agreed on the long-term approach.



The patch preamble (also quoted below) contains more technical details,
including a list of side changes that, ideally, should go in as separate
commits. The posted patch is based on our bug7 branch on lp[2] which has
many intermediate commits. I am not yet sure whether it makes sense to
_merge_ that branch into trunk or simply commit it as a single/atomic
change (except for those side changes). Opinions welcomed.




Do you know how to do a merge like that with bzr properly?
  My experience has been that it only likes atomic-like merges.


I sense a terminology conflict. By "merge", I meant "bzr merge". Trunk
already has many merged branches, of course:

   revno: 14574 [merge]
   revno: 14573 [merge]
   revno: 14564 [merge]
   ...

By single/atomic change, I meant "patch < bug7.patch". Merges preserve
individual branch commits, which is good when those commits are valuable
and bad when they are noise. In the case of our bug7 branch, it is
a mixture of valuable stuff and noise. I decided to do a single/atomic
change to avoid increasing the noise level.



in src/StoreIOState.h:

* given the XXX about file_callback, can the removal TODO be enacted?
  - at least as one of the side-change patches


Yes, of course, but outside this project's scope. We already did the
difficult part -- detected and verified that the API is unused.
Hopefully, somebody will volunteer to do the rest (and to take the
responsibility for it).



* the docs on touchingStoreEntry() seem to contradict your description
of how the entry chains work now. You said readers could read whatever
chain they were attached to after the update switch. The doc says they
only ever read the primary.


Done: Clarified that the primary chain (which the readers always start
with) may become secondary later:


 // Tests whether we are working with the primary/public StoreEntry chain.
 // Reads start reading the primary chain, but it may become secondary.
 // There are two store write kinds:
 // * regular writes that change (usually append) the entry visible to all and
 // * header updates that create a fresh chain (while keeping the stale one usable).
 bool touchingStoreEntry() const;


The readers do not matter in the current code because reading code does
not use this method, but that may change in the future, of course.



in 

Re: [squid-dev] [PATCH] Bug 7: Headers are not updated on disk after 304s

2016-03-11 Thread Alex Rousskov
On 03/11/2016 02:17 AM, Amos Jeffries wrote:
> On 11/03/2016 2:59 p.m., Alex Rousskov wrote:
>> The attached compressed patch fixes the 15+ year-old Bug #7 [1] for
>> the shared memory cache and rock cache_dirs. I am not aware of anybody
>> working on ufs-based cache_dirs, but this patch provides a Store API and
>> a cache_dir example on how to fix those as well.
>>
>>   [1] http://bugs.squid-cache.org/show_bug.cgi?id=7


> Ah. I'm getting déjà vu on this. I thought those two cache types were
> fixed long ago, and recent talk was that you were working on the UFS side of it.

There was some noise about this bug and related issues some months ago.
It was easy to get confused by all the mis[leading]information being
posted on bugzilla, including reports that "the bug is fixed" for some
ufs-based cache_dirs. I tried to correct those reports but failed to
convince people that they do not see what they think they see.

After this patch, the following cache stores (and only them) should
support header updates:

  * non-shared memory cache (in non-SMP Squids only)
  * shared memory cache
  * rock cache_dir

Needless to say, the posted patch does not fix all the problems with
header updates, even for the above stores. For example, the code that
decides which headers to update may still violate HTTP in some ways (I
have not checked). The patch "just" writes the headers computed by Squid
to shared memory cache and to rock cache_dirs.

Moreover, given the [necessary] complexity of the efficient update code
combined with the [unnecessary] complexity of some old Store APIs, I
would be surprised if there are no new bugs or problems introduced by
our changes. I am not aware of any, but we continue to test and plan to
fix the ones we find.


>> Besides an unavoidable increase in rock-based caching code complexity, the
>> [known] costs of this fix are:
>>
>> 1. 8 additional bytes per cache entry for shared memory cache and rock
>> cache_dirs. Much bigger but short-lived RAM _savings_ for rock
>> cache_dirs (due to less RAM-hungry index rebuild code) somewhat mitigate
>> this RAM usage increase.
>>
>> 2. Increased slot fragmentation when updated headers are slightly larger
>> than old ones. This can probably be optimized away later if needed by
>> padding HTTP headers or StoreEntry metadata.
>>
>> 3. Somewhat slower rock cache_dir index rebuild time. IMO, this should
>> eventually be dealt with by not rebuilding the index on most startups at
>> all (rather than focusing on the index rebuild optimization).
> 
> Hmm. Nod, agreed on the long-term approach.
> 
>>
>> The patch preamble (also quoted below) contains more technical details,
>> including a list of side changes that, ideally, should go in as separate
>> commits. The posted patch is based on our bug7 branch on lp[2] which has
>> many intermediate commits. I am not yet sure whether it makes sense to
>> _merge_ that branch into trunk or simply commit it as a single/atomic
>> change (except for those side changes). Opinions welcomed.


> Do you know how to do a merge like that with bzr properly?
>  My experience has been that it only likes atomic-like merges.

I sense a terminology conflict. By "merge", I meant "bzr merge". Trunk
already has many merged branches, of course:

  revno: 14574 [merge]
  revno: 14573 [merge]
  revno: 14564 [merge]
  ...

By single/atomic change, I meant "patch < bug7.patch". Merges preserve
individual branch commits, which is good when those commits are valuable
and bad when they are noise. In the case of our bug7 branch, it is
a mixture of valuable stuff and noise. I decided to do a single/atomic
change to avoid increasing the noise level.


> in src/StoreIOState.h:
> 
> * given the XXX about file_callback, can the removal TODO be enacted?
>  - at least as one of the side-change patches

Yes, of course, but outside this project's scope. We already did the
difficult part -- detected and verified that the API is unused.
Hopefully, somebody will volunteer to do the rest (and to take the
responsibility for it).


> * the docs on touchingStoreEntry() seem to contradict your description
> of how the entry chains work now. You said readers could read whatever
> chain they were attached to after the update switch. The doc says they
> only ever read the primary.

Done: Clarified that the primary chain (which the readers always start
with) may become secondary later:

> // Tests whether we are working with the primary/public StoreEntry chain.
> // Reads start reading the primary chain, but it may become secondary.
> // There are two store write kinds:
> // * regular writes that change (usually append) the entry visible to all and
> // * header updates that create a fresh chain (while keeping the stale one usable).
> bool touchingStoreEntry() const;

The readers do not matter in the current code because reading code does
not use this method, but that may change in the future, of course.
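
For illustration, a minimal toy model of the chain roles described in the
quoted comment follows; it is not Squid code, and names such as
publishedFileNo and ToyReader are invented. It only demonstrates how a chain
that was primary when a reader attached can quietly become secondary after a
header update, without disturbing that reader.

// Toy model only -- not Squid's Store code; every name below is invented.
#include <atomic>
#include <cassert>

// the anchor/fileno currently published (primary) for one cached entry
std::atomic<int> publishedFileNo{0};

struct ToyReader {
    const int fileNo;                      // fixed when the reader attaches
    ToyReader() : fileNo(publishedFileNo.load()) {}
    bool onPrimaryChain() const { return fileNo == publishedFileNo.load(); }
};

int main()
{
    ToyReader reader;                      // starts on the primary chain
    assert(reader.onPrimaryChain());

    publishedFileNo.store(1);              // a header update publishes a fresh chain

    assert(!reader.onPrimaryChain());      // the reader's chain is now secondary,
                                           // yet it keeps reading fileno 0 untouched
    return 0;
}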


> in src/fs/rock/RockHeaderUpdater.cc:
> 
> 

Re: [squid-dev] [PATCH] Bug 7: Headers are not updated on disk after 304s

2016-03-11 Thread Amos Jeffries
On 11/03/2016 2:59 p.m., Alex Rousskov wrote:
> Hello,
> 
> The attached compressed patch fixes the 15+ year-old Bug #7 [1] for
> the shared memory cache and rock cache_dirs. I am not aware of anybody
> working on ufs-based cache_dirs, but this patch provides a Store API and
> a cache_dir example on how to fix those as well.
> 
>   [1] http://bugs.squid-cache.org/show_bug.cgi?id=7
> 

Ah. I'm getting déjà vu on this. I thought those two cache types were
fixed long ago, and recent talk was that you were working on the UFS side of it.

Sigh. Oh well.


> Besides an unavoidable increase in rock-based caching code complexity, the
> [known] costs of this fix are:
> 
> 1. 8 additional bytes per cache entry for shared memory cache and rock
> cache_dirs. Much bigger but short-lived RAM _savings_ for rock
> cache_dirs (due to less RAM-hungry index rebuild code) somewhat mitigate
> this RAM usage increase.
> 
> 2. Increased slot fragmentation when updated headers are slightly larger
> than old ones. This can probably be optimized away later if needed by
> padding HTTP headers or StoreEntry metadata.
> 
> 3. Somewhat slower rock cache_dir index rebuild time. IMO, this should
> eventually be dealt with by not rebuilding the index on most startups at
> all (rather than focusing on the index rebuild optimization).

Hmm. Nod, agreed on the long-term approach.

> 
> The patch preamble (also quoted below) contains more technical details,
> including a list of side changes that, ideally, should go in as separate
> commits. The posted patch is based on our bug7 branch on lp[2] which has
> many intermediate commits. I am not yet sure whether it makes sense to
> _merge_ that branch into trunk or simply commit it as a single/atomic
> change (except for those side changes). Opinions welcomed.

Do you know how to do a merge like that with bzr properly?
 My experience has been that it only likes atomic-like merges.

> 
>   [2] https://code.launchpad.net/~measurement-factory/squid/bug7
> 
> 
> ---
> Bug 7: Update cached entries on 304 responses
> 
> New Store API to update entry metadata and headers on 304s.
> Support entry updates in shared memory cache and rock cache_dirs.
> 
> * Highlights:
> 
> 1. Atomic StoreEntry metadata updating
> 
>StoreEntry metadata (swap_file_sz, timestamps, etc.) is used
>throughout Squid code. Metadata cannot be updated atomically because
>it has many fields, but a partial update to those fields causes
>assertions. Still, we must update metadata when updating HTTP
>headers. Locking the entire entry for a rewrite does not work well
>because concurrent requests will attempt to download a new entry
>copy, defeating the very HTTP 304 optimization we want to support.
> 
>Ipc::StoreMap index now uses an extra level of indirection (the
>StoreMap::fileNos index), which lets StoreMap control which
>anchor/fileno is associated with a given StoreEntry key. The entry
>updating code creates a disassociated (i.e., entry/key-less) anchor,
>writes new metadata and headers using that new anchor, and then
>_atomically_ switches the map to use that new anchor. This allows old
>readers to continue reading using the stale anchor/fileno as if
>nothing happened while a new reader gets the new anchor/fileno.

:-)

> 
>Shared memory usage increase: 8 additional bytes per cache entry: 4
>for the extra level of indirection (StoreMapFileNos) plus 4 for
>splicing fresh chain prefix with the stale chain suffix
>(StoreMapAnchor::splicingPoint). However, if the updated headers are
>larger than the stale ones, Squid will allocate shared memory pages
>to accommodate the increase, leading to shared memory
>fragmentation/waste for small increases.

> 
> 2. Revamped rock index rebuild process
> 
>The index rebuild process had to be completely revamped because
>splicing fresh and stale entry slot chain segments implies tolerating
>multiple entry versions in a single chain, and the old code was based
>on the assumption that different slot versions are incompatible. We
>were also uncomfortable with the old cavalier approach to accessing
>two differently indexed layers of information (entry vs. slot) using
>the same set of class fields, making it trivial to accidentally
>access entry data while using a slot index.
> 
>During the rewrite of the index rebuilding code, we also discovered a
>way to significantly reduce RAM usage for the index build map (a
>temporary object that is allocated in the beginning and freed at the
>end of the index build process). The savings depend on the cache
>size: A small cache saves about 30% (17 vs 24 bytes per entry/slot)
>while a 1TB cache_dir with 32KB slots (which implies uneven
>entry/slot indexes) saves more than 50% (~370MB vs. ~800MB).
> 
>Adjusted how invalid slots are counted. The code was sometimes
>counting invalid entries and sometimes invalid 

[squid-dev] [PATCH] Bug 7: Headers are not updated on disk after 304s

2016-03-10 Thread Alex Rousskov
Hello,

The attached compressed patch fixes the 15+ year-old Bug #7 [1] for
the shared memory cache and rock cache_dirs. I am not aware of anybody
working on ufs-based cache_dirs, but this patch provides a Store API and
a cache_dir example on how to fix those as well.

  [1] http://bugs.squid-cache.org/show_bug.cgi?id=7

Besides an unavoidable increase in rock-based caching code complexity, the
[known] costs of this fix are:

1. 8 additional bytes per cache entry for shared memory cache and rock
cache_dirs. Much bigger but short-lived RAM _savings_ for rock
cache_dirs (due to less RAM-hungry index rebuild code) somewhat mitigate
this RAM usage increase.

2. Increased slot fragmentation when updated headers are slightly larger
than old ones. This can probably be optimized away later if needed by
padding HTTP headers or StoreEntry metadata.

3. Somewhat slower rock cache_dir index rebuild time. IMO, this should
eventually be dealt with by not rebuilding the index on most startups at
all (rather than focusing on the index rebuild optimization).


The patch preamble (also quoted below) contains more technical details,
including a list of side changes that, ideally, should go in as separate
commits. The posted patch is based on our bug7 branch on lp[2] which has
many intermediate commits. I am not yet sure whether it makes sense to
_merge_ that branch into trunk or simply commit it as a single/atomic
change (except for those side changes). Opinions welcomed.

  [2] https://code.launchpad.net/~measurement-factory/squid/bug7


---
Bug 7: Update cached entries on 304 responses

New Store API to update entry metadata and headers on 304s.
Support entry updates in shared memory cache and rock cache_dirs.

* Highlights:

1. Atomic StoreEntry metadata updating

   StoreEntry metadata (swap_file_sz, timestamps, etc.) is used
   throughout Squid code. Metadata cannot be updated atomically because
   it has many fields, but a partial update to those fields causes
   assertions. Still, we must update metadata when updating HTTP
   headers. Locking the entire entry for a rewrite does not work well
   because concurrent requests will attempt to download a new entry
   copy, defeating the very HTTP 304 optimization we want to support.

   Ipc::StoreMap index now uses an extra level of indirection (the
   StoreMap::fileNos index), which lets StoreMap control which
   anchor/fileno is associated with a given StoreEntry key. The entry
   updating code creates a disassociated (i.e., entry/key-less) anchor,
   writes new metadata and headers using that new anchor, and then
   _atomically_ switches the map to use that new anchor. This allows old
   readers to continue reading using the stale anchor/fileno as if
   nothing happened while a new reader gets the new anchor/fileno.

   Shared memory usage increase: 8 additional bytes per cache entry: 4
   for the extra level of indirection (StoreMapFileNos) plus 4 for
   splicing fresh chain prefix with the stale chain suffix
   (StoreMapAnchor::splicingPoint). However, if the updated headers are
   larger than the stale ones, Squid will allocate shared memory pages
   to accommodate the increase, leading to shared memory
   fragmentation/waste for small increases (see the sketch below).
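
   A minimal, self-contained sketch of this indirection follows; it is
   not the actual Ipc::StoreMap code, and ToyStoreMap, ToyAnchor,
   publishUpdate() and the other names are invented. It only shows how a
   small atomic fileNos table between keys and anchors lets a header
   update be published with one atomic switch while old readers keep
   their stale anchor, with splicingPoint standing in for the second
   4-byte half of the per-entry cost.

   // Sketch only -- not Squid's Ipc::StoreMap; all names here are invented.
   #include <atomic>
   #include <cassert>
   #include <cstdint>

   struct ToyAnchor {
       std::uint64_t swapFileSz = 0;    // stand-in for StoreEntry metadata
       std::int32_t splicingPoint = -1; // where the fresh chain reuses stale body slots
   };

   class ToyStoreMap {
   public:
       static const int Capacity = 8;

       ToyStoreMap() {
           for (auto &f : fileNos)
               f.store(-1, std::memory_order_relaxed); // -1: nothing published yet
       }

       // the anchor/fileno currently published for a key (the extra 4 bytes/entry)
       std::int32_t publishedAnchor(int keySlot) const {
           return fileNos[keySlot].load(std::memory_order_acquire);
       }

       // atomically switch the key to a freshly written anchor; returns the
       // stale anchor so its exclusive slots can be freed once old readers finish
       std::int32_t publishUpdate(int keySlot, std::int32_t freshAnchor) {
           return fileNos[keySlot].exchange(freshAnchor, std::memory_order_acq_rel);
       }

       ToyAnchor anchors[Capacity];

   private:
       std::atomic<std::int32_t> fileNos[Capacity]; // the extra level of indirection
   };

   int main()
   {
       ToyStoreMap map;
       const int keySlot = 3;

       // initial cache write: fill anchor 0 and publish it for the key
       map.anchors[0].swapFileSz = 1200;
       map.publishUpdate(keySlot, 0);

       // 304-driven header update: fill a disassociated anchor, then switch
       map.anchors[1].swapFileSz = 1260; // updated headers are slightly larger
       map.anchors[1].splicingPoint = 2; // splice onto the stale chain's body slots
       const std::int32_t stale = map.publishUpdate(keySlot, 1);

       assert(stale == 0);                        // old readers still use anchor 0
       assert(map.publishedAnchor(keySlot) == 1); // new readers get the fresh anchor
       return 0;
   }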

2. Revamped rock index rebuild process

   The index rebuild process had to be completely revamped because
   splicing fresh and stale entry slot chain segments implies tolerating
   multiple entry versions in a single chain, and the old code was based
   on the assumption that different slot versions are incompatible. We
   were also uncomfortable with the old cavalier approach to accessing
   two differently indexed layers of information (entry vs. slot) using
   the same set of class fields, making it trivial to accidentally
   access entry data while using a slot index.

   During the rewrite of the index rebuilding code, we also discovered a
   way to significantly reduce RAM usage for the index build map (a
   temporary object that is allocated in the beginning and freed at the
   end of the index build process). The savings depend on the cache
   size: A small cache saves about 30% (17 vs. 24 bytes per entry/slot)
   while a 1TB cache_dir with 32KB slots (which implies uneven
   entry/slot indexes) saves more than 50% (~370MB vs. ~800MB); a
   back-of-the-envelope check of these figures appears after these
   highlights.

   Adjusted how invalid slots are counted. The code was sometimes
   counting invalid entries and sometimes invalid entry slots. We should
   always count _slots_ now because progress is measured in the number
   of slots scanned, not entries loaded. This accounting change may
   surprise users with a much higher "Invalid entries" count in cache.log
   upon startup, but at least the new reports are meaningful.

   This rewrite does not attempt to solve all rock index build problems.
   For example, the code still assumes that StoreEntry metadata fits a
   single slot, which is not always true for very small slots.
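
   As a back-of-the-envelope check of the rebuild-map savings quoted
   above for the 1TB/32KB case: the per-slot byte costs below are
   assumptions inferred from the quoted totals (24 bytes matches the
   small-cache figure; roughly 11 bytes is implied by the ~370MB total),
   not values taken from Squid's structures.

   // Arithmetic sketch only; the per-slot costs are inferred assumptions.
   #include <cstdint>
   #include <iostream>

   int main()
   {
       const std::uint64_t cacheSize = 1ULL << 40;       // 1 TB cache_dir
       const std::uint64_t slotSize = 32 * 1024;         // 32 KB slots
       const std::uint64_t slots = cacheSize / slotSize; // 33,554,432 slots

       const double oldBytesPerSlot = 24.0; // assumed old build-map cost
       const double newBytesPerSlot = 11.0; // implied by the ~370MB total

       std::cout << "slots: " << slots << "\n"                                // 33554432
                 << "old map: ~" << slots * oldBytesPerSlot / 1e6 << " MB\n"  // ~805 MB
                 << "new map: ~" << slots * newBytesPerSlot / 1e6 << " MB\n"; // ~369 MB
       return 0;
   }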


* Side-changes to be committed separately, to the extent