On 15/02/11 17:41, Alex Rousskov wrote:
Hello,

     The attached patch contains some of the Store changes we made while
working on adding SMP-shared support to cache_dirs. These changes were
made while chasing and fixing a particular set of bugs introduced and
exposed by shared Rock Store cache_dirs.

I wish the changes were better isolated but many are inter-dependent and
most were discovered/developed at the same time while trying to make the
code work. Fully isolating all of them will not be possible and
isolating many will take a long time.

They look like a small enough set to go in as a group by themselves after they have been tested.

<snip>

--------- change log ---------------

Changes revolving around making Store work with SMP-shared max-size
cache_dirs:

* Added MemObject::expectedReplySize() and used it instead of object_sz.

When deciding whether an object with a known content length can be
swapped out, do not wait until the object is completely received and its
size (mem_obj->object_sz) becomes known (while asking the store to
recheck in vain with every incoming chunk). Instead, use the known
content length, if any, to make the decision.

This optimizes the common case where the complete object is eventually
received and swapped out, preventing the accumulation of potentially
large objects in RAM while waiting for the end of the response. This
should not affect objects with unknown content length.
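The decision logic described above can be sketched as follows. This is a toy stand-in, not the actual Squid code: MemObject here is a simplified struct, and mayStartSwapOut() is an illustrative helper, but it shows how a known content length lets the store decide before object_sz is final.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for Squid's MemObject; field names are illustrative.
struct MemObject {
    int64_t object_sz;      // -1 until the entire object has been received
    int64_t contentLength;  // from the reply headers; -1 when unknown

    // Sketch of MemObject::expectedReplySize(): prefer the advertised
    // content length over the not-yet-final object_sz.
    int64_t expectedReplySize() const {
        return contentLength >= 0 ? contentLength : object_sz;
    }
};

// Decide whether swapout may start without waiting for the end of
// the response; with an unknown size, the decision is deferred.
bool mayStartSwapOut(const MemObject &mem, int64_t maxObjSize) {
    const int64_t expected = mem.expectedReplySize();
    if (expected < 0)
        return false;  // size still unknown; recheck later
    return expected <= maxObjSize;
}
```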

Hooray! Thank you.


Side-effect 1: probably fixes several cases of unknowingly using a
negative (unknown) mem_obj->object_sz in calculations. I added a few
assertions to double-check some of the remaining object_sz/objectLen()
uses.

Side-effect 2: when expectedReplySize() is stored on disk as StoreEntry
metadata, it may help to detect truncated entries when the writer
process dies before completing the swapout.


Indeed.


* Removed mem->swapout.memnode in favor of mem->swapout.queue_offset.

The code used swapout.memnode pointer to keep track of the last page
that was swapped out. The code was semi-buggy because it could reset the
pointer to NULL if no new data came in before the call to doPages().
Perhaps the code relied on the assumption that the caller would never
call doPages() when there is no new data, but I am not sure that
assumption held in all cases (it could be that I broke the calling code,
of course).

Moreover, the page pointer was kept without any protection from the page
disappearing during an asynchronous swapout. There were "Evil hack time"
comments discussing how the page might disappear.

Fortunately, we already have mem->swapout.queue_offset that can be fed
to getBlockContainingLocation to find the page that needs to be swapped
out. There is no need to keep the page pointer around. The
queue_offset-based math is the same, so we are not adding any overhead
by using that offset (in fact, we are removing some minor computations).
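A toy model of the offset-based lookup (Node and the helper names here are illustrative stand-ins for Squid's mem_hdr and getBlockContainingLocation, not the real API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of mem_hdr: a list of buffered nodes keyed by byte offset.
struct Node { int64_t start; int64_t size; };

// Analogue of getBlockContainingLocation(): find the node covering offset.
const Node *getBlockContainingLocation(const std::vector<Node> &nodes,
                                       int64_t offset) {
    for (const Node &n : nodes)
        if (offset >= n.start && offset < n.start + n.size)
            return &n;
    return nullptr;  // nothing buffered at that offset
}

// Instead of caching a swapout.memnode pointer (which can dangle or be
// reset to NULL), recompute the next page from swapout.queue_offset
// each time it is needed.
const Node *nextPageToSwapOut(const std::vector<Node> &nodes,
                              int64_t queueOffset) {
    return getBlockContainingLocation(nodes, queueOffset);
}
```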


Excellent.


* Added "close how?" parameter to storeClose() and friends.

The old code followed the same path when closing swapout activity for an
aborted entry and when completing a perfectly healthy swapout. In the
non-shared case, that could have been OK because the abort code would
then release the entry, removing any half-written entry from the index
and the disk (but I am not sure that release happened fast enough in
100% of cases).

I think it did not. Debian has bugs open against apt doing parallel downloads and/or fast recovery repeats producing a corrupt/truncated object. Other fixes have reduced the problem greatly but it still pops up on (now rare) occasions.
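A rough sketch of the direction the "close how?" parameter takes. All names here (CloseHow, FakeEntry) are illustrative, not the actual patch identifiers; the point is that abort and clean completion diverge at close time rather than relying on later cleanup:

```cpp
#include <cassert>

// Illustrative names only; the real patch adds a "close how?" argument
// to storeClose() and friends so the two paths can diverge.
enum CloseHow { closeNormal, closeAbort };

struct FakeEntry {
    bool onDisk = false;    // partial swap file present on disk
    bool indexed = false;   // entry visible in the store index
};

// On abort, drop the half-written entry right away instead of relying
// on a later release() to catch it before a hit can be served.
void storeClose(FakeEntry &e, CloseHow how) {
    if (how == closeAbort) {
        e.onDisk = false;   // unlink the partial swap file
        e.indexed = false;  // remove it from the index
    } else {
        e.onDisk = true;    // swapout completed cleanly
        e.indexed = true;   // safe to serve hits from it
    }
}
```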

<snip>

* Moved "can you store this entry?" code to virtual SwapDir::canStore().

The old code had some of the tests in SwapDir-specific canStore()
methods and some in storeDirSelect*() methods. This resulted in
inconsistencies, code duplication, and extra calculation overheads.
Making this call virtual allows individual cache_dir types to do custom
access controls.
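The shape of the refactoring might look like the following sketch. The class layout and names are illustrative, not the real Squid API; it only shows how a single virtual canStore() lets each cache_dir type own its checks instead of splitting them with storeDirSelect*():

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch: one virtual canStore() per cache_dir type.
class SwapDir {
public:
    explicit SwapDir(int64_t maxObjSize) : maxObjSize_(maxObjSize) {}
    virtual ~SwapDir() = default;
    // Returns true if this dir may store an object of objSize bytes;
    // on success also reports this dir's load for selection balancing.
    virtual bool canStore(int64_t objSize, int &load) const = 0;
protected:
    int64_t maxObjSize_;
};

class UfsLikeDir : public SwapDir {
public:
    UfsLikeDir(int64_t maxObjSize, int currentLoad)
        : SwapDir(maxObjSize), currentLoad_(currentLoad) {}
    bool canStore(int64_t objSize, int &load) const override {
        if (objSize < 0 || objSize > maxObjSize_)
            return false;        // the size check now lives here too
        load = currentLoad_;     // report load so selection can balance
        return true;
    }
private:
    int currentLoad_;
};
```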

Interesting potentials there. Looks good.


The same method is used for cache_dir load reporting (if it returns
true). Load management needs more work, but the current code is no worse
than the old one in this respect, and further improvements are outside
the scope of this change.

In UFSSwapDir::canStore I think the load needs to be set regardless of the shedLoad() result. If I understand it right, that would help balance things better when they are all overloaded.



* Minimized from-disk StoreEntry loading/unpacking code duplication.

Moved common (and often rather complex!) code from store modules into
storeRebuildLoadEntry, storeRebuildParseEntry, and storeRebuildKeepEntry.


I'm a little suspicious of the removal from StoreEntry::getSerialisedMetaData and its effect on storeSwapMetaBuild. IIRC you had reasons for adding it earlier that involved storeSwapMetaBuild using an invalid value in one or more cases. I hope those cases are resolved now?

<snip>
* When swapout initiation fails, release the StoreEntry. This prevents
the caller code from trying to swap out again and again when swap_status
reverts to SWAPOUT_NONE.

TODO: Consider adding SWAPOUT_ERROR, STORE_ERROR, and similar states.
That may solve several problems where the code sees _NONE or _OK and
thinks everything is peachy when in fact there was an error.
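For illustration only, the TODO could look something like this (SWAPOUT_WRITING/SWAPOUT_DONE are placeholder names, not the actual enum values): an explicit error state lets callers distinguish "never attempted" from "attempted and failed".

```cpp
#include <cassert>

// Hypothetical states; only SWAPOUT_NONE and SWAPOUT_ERROR matter here.
enum SwapStatus { SWAPOUT_NONE, SWAPOUT_WRITING, SWAPOUT_DONE, SWAPOUT_ERROR };

// With an explicit error state the retry guard becomes unambiguous:
// only an entry that was never tried may start a new swapout, so a
// failed swapout cannot be retried forever.
bool mayStartSwapOut(SwapStatus s) {
    return s == SWAPOUT_NONE;
}
```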


In StoreEntry::swapoutPossible the logic test (currentEnd > store_maxobjsize) is broken. To reach it we have already checked (expectedEnd > 0 && expectedEnd < store_maxobjsize), so if (expectedEnd < currentEnd) there is a bug somewhere.




* Always call StoreEntry::abort() instead of setting ENTRY_ABORTED manually.


? I see no changes implementing that statement.

<snip>

* Added operator<<  for dumping StoreEntry summary into the debugging
log. Needs more work to report more info (and not report yet-unknown info).
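A minimal sketch of such an operator<< (EntrySummary and its fields are invented for illustration; the real StoreEntry carries far more state). The key point from the change log is that yet-unknown fields should be omitted rather than printed:

```cpp
#include <ostream>
#include <sstream>
#include <cassert>

// Invented stand-in for the StoreEntry summary being dumped.
struct EntrySummary {
    const char *key;          // cache key, always known
    long long swapFileSize;   // -1 when not yet known
};

std::ostream &operator<<(std::ostream &os, const EntrySummary &e) {
    os << "e:" << e.key;
    if (e.swapFileSize >= 0)      // do not report yet-unknown info
        os << "/sz=" << e.swapFileSize;
    return os;
}
```

With this in place, a debugs() call can stream the entry directly instead of hand-formatting each field at every call site.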

? I see no changes implementing that statement.


In storeSwapOutFileClosed the assert commented "we checked that above" needs to be dropped. We are trying to remove useless asserts, remember?


The bits snipped look okay to me, though I don't know the store code well enough yet to spot any subtle logic bugs.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.11
  Beta testers wanted for 3.2.0.5
