Re: [squid-dev] [PATCH] Bug 3819: "fd >= 0" assertion in file_write() during reconfiguration

2016-09-20 Thread Alex Rousskov
On 09/20/2016 11:58 AM, Alex Rousskov wrote:
> On 09/20/2016 04:10 AM, Amos Jeffries wrote:
>> On 20/09/2016 9:52 a.m., Alex Rousskov wrote:
>>> The attached trunk patch fixes bug 3819 in our tests.
> 
>> +1. Thank you.
> 
> Committed with one additional fix to trunk (r14815 and r14816).

Correction: trunk r14849.

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Bug 3819: "fd >= 0" assertion in file_write() during reconfiguration

2016-09-20 Thread Alex Rousskov
On 09/20/2016 04:10 AM, Amos Jeffries wrote:
> On 20/09/2016 9:52 a.m., Alex Rousskov wrote:
>> The attached trunk patch fixes bug 3819 in our tests.

> +1. Thank you.

Committed with one additional fix to trunk (r14815 and r14816).

The bug report has a v3.5 patch providing the same fix:
http://bugs.squid-cache.org/show_bug.cgi?id=3819#c28


Thank you,

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Bug 3819: "fd >= 0" assertion in file_write() during reconfiguration

2016-09-20 Thread Amos Jeffries
On 20/09/2016 9:52 a.m., Alex Rousskov wrote:
> Hello,
> 
> The attached trunk patch fixes bug 3819 in our tests.
> 
> Please note that the underlying problem may lead to other assertions,
> including "NumberOfUFSDirs" in UFSSwapDir and "fd >= 0" in file_close().
> 
> Since trunk r9181.3.1, reconfiguration is done in at least two steps:
> First, mainReconfigureStart() closes/cleans/disables all modules
> supporting hot reconfiguration and then, at the end of the event loop
> iteration, mainReconfigureFinish() opens/configures/enables them again.
> UFS code hits assertions if it has to log entries or rebuild swap.state
> between those two steps.  The patch disables or delays those activities
> during reconfiguration.
> 
> The tiny assertion opportunity window hides these bugs from most proxies
> that are not doing enough disk I/O or are not being reconfigured
> frequently enough.
> 

+1. Thank you.

Amos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] [PATCH] Bug 3819: "fd >= 0" assertion in file_write() during reconfiguration

2016-09-19 Thread Alex Rousskov
Hello,

The attached trunk patch fixes bug 3819 in our tests.

Please note that the underlying problem may lead to other assertions,
including "NumberOfUFSDirs" in UFSSwapDir and "fd >= 0" in file_close().

Since trunk r9181.3.1, reconfiguration is done in at least two steps:
First, mainReconfigureStart() closes/cleans/disables all modules
supporting hot reconfiguration and then, at the end of the event loop
iteration, mainReconfigureFinish() opens/configures/enables them again.
UFS code hits assertions if it has to log entries or rebuild swap.state
between those two steps.  The patch disables or delays those activities
during reconfiguration.

The tiny assertion opportunity window hides these bugs from most proxies
that are not doing enough disk I/O or are not being reconfigured
frequently enough.


HTH,

Alex.

Bug 3819: "fd >= 0" assertion in file_write() during reconfiguration

Other possible assertions include "NumberOfUFSDirs" in UFSSwapDir and
"fd >= 0" in file_close().

Since trunk r9181.3.1, reconfiguration is done in at least two steps:
First, mainReconfigureStart() closes/cleans/disables all modules
supporting hot reconfiguration and then, at the end of the event loop
iteration, mainReconfigureFinish() opens/configures/enables them again.
UFS code hits assertions if it has to log entries or rebuild swap.state
between those two steps. The tiny assertion opportunity window hides
these bugs from most proxies that are not doing enough disk I/O or are
not being reconfigured frequently enough.

Asynchronous UFS cache_dirs such as diskd were the most exposed, but
even blocking UFS caching code could probably hit [rebuild] assertions.

The swap.state rebuilding (always initiated at startup) probably did not
work as intended if reconfiguration happened during the rebuild time
because reconfiguration closed the swap.state file being rebuilt. We now
protect that swap.state file and delay rebuilding progress until
reconfiguration is over.

TODO: To squash more similar bugs, we need to move hot reconfiguration
support into the modules so that each module can reconfigure instantly.

=== modified file 'src/fs/ufs/RebuildState.cc'
--- src/fs/ufs/RebuildState.cc	2016-04-03 23:41:58 +
+++ src/fs/ufs/RebuildState.cc	2016-09-16 01:17:04 +
@@ -74,43 +74,45 @@ Fs::Ufs::RebuildState::RebuildState(RefC
 
 if (!clean)
 flags.need_to_validate = true;
 
 debugs(47, DBG_IMPORTANT, "Rebuilding storage in " << sd->path << " (" <<
(clean ? "clean log" : (LogParser ? "dirty log" : "no log")) << ")");
 }
 
 Fs::Ufs::RebuildState::~RebuildState()
 {
 sd->closeTmpSwapLog();
 
 if (LogParser)
 delete LogParser;
 }
 
 void
 Fs::Ufs::RebuildState::RebuildStep(void *data)
 {
 RebuildState *rb = (RebuildState *)data;
-rb->rebuildStep();
+if (!reconfiguring)
+rb->rebuildStep();
 
-if (!rb->isDone())
+// delay storeRebuildComplete() when reconfiguring to protect storeCleanup()
+if (!rb->isDone() || reconfiguring)
 eventAdd("storeRebuild", RebuildStep, rb, 0.01, 1);
 else {
 -- StoreController::store_dirs_rebuilding;
 storeRebuildComplete(&rb->counts);
 delete rb;
 }
 }
 
 /// load entries from swap.state or files until we run out of entries or time
 void
 Fs::Ufs::RebuildState::rebuildStep()
 {
 currentEntry(NULL);
 
 // Balance our desire to maximize the number of entries processed at once
 // (and, hence, minimize overheads and total rebuild time) with a
 // requirement to also process Coordinator events, disk I/Os, etc.
 const int maxSpentMsec = 50; // keep small: most RAM I/Os are under 1ms
 const timeval loopStart = current_time;
 

=== modified file 'src/fs/ufs/UFSSwapDir.cc'
--- src/fs/ufs/UFSSwapDir.cc	2016-04-03 23:41:58 +
+++ src/fs/ufs/UFSSwapDir.cc	2016-09-16 01:05:19 +
@@ -300,41 +300,42 @@ Fs::Ufs::UFSSwapDir::init()
 void
 Fs::Ufs::UFSSwapDir::create()
 {
 debugs(47, 3, "Creating swap space in " << path);
 createDirectory(path, 0);
 createSwapSubDirs();
 }
 
 Fs::Ufs::UFSSwapDir::UFSSwapDir(char const *aType, const char *anIOType) :
 SwapDir(aType),
 IO(NULL),
 fsdata(NULL),
 map(new FileMap()),
 suggest(0),
 l1(16),
 l2(256),
 swaplog_fd(-1),
 currentIOOptions(new ConfigOptionVector()),
 ioType(xstrdup(anIOType)),
 cur_size(0),
-n_disk_objects(0)
+n_disk_objects(0),
+rebuilding_(false)
 {
 /* modulename is only set to disk modules that are built, by configure,
  * so the Find call should never return NULL here.
  */
 IO = new Fs::Ufs::UFSStrategy(DiskIOModule::Find(anIOType)->createStrategy());
 }
 
 Fs::Ufs::UFSSwapDir::~UFSSwapDir()
 {
 if (swaplog_fd > -1) {
 file_close(swaplog_fd);
 swaplog_fd = -1;
 }
 xfree(ioType);
 delete map;
 delete IO;
 delete currentIOOptions;
 }
 
 void
@@ -708,78 +709,81 @@ Fs::Ufs::UFSSwapDir::logFile(char cons