https://bugzilla.wikimedia.org/show_bug.cgi?id=33409
Web browser: ---
Bug #: 33409
Summary: Transient CDB read/write failures
Product: MediaWiki
Version: 1.18
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General/Unknown
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
On the main Wikimedia app cluster we log occasional read and write failures in
CDB, causing bug 31576 and other issues.
Analysis of the CDB code and limited testing indicate that a disk-full
condition may cause a truncated CDB file to be moved into the destination. Most
of our app servers have /tmp mounted on a tiny (2GB) partition which is
commonly 60-70% full. A spike in disk space usage could cause a transient
disk-full condition.
More defensive handling of CDB writes might fix this issue. The return values
of the underlying write() and close() syscalls should be checked, and if an
error occurs, the temporary file should be deleted without the rename being
done, and an exception thrown.
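A minimal sketch of the defensive write described above. The function name
writeFileChecked() and the temporary file naming are hypothetical, and this is
not the existing CdbWriter code. It calls fflush() explicitly before fclose(),
on the assumption that fclose() alone cannot be relied on to report a failed
final flush:

```php
<?php
// Hypothetical helper, not part of MediaWiki: write $data to a temporary
// file and move it to $dest only if every step reported success.
function writeFileChecked( $dest, $data ) {
	$tmp = $dest . '.tmp.' . mt_rand();
	$fh = fopen( $tmp, 'wb' );
	if ( $fh === false ) {
		throw new RuntimeException( "Unable to open $tmp for writing" );
	}

	$len = strlen( $data );
	$written = 0;
	$ok = true;
	while ( $written < $len ) {
		$n = fwrite( $fh, substr( $data, $written ) );
		if ( $n === false || $n === 0 ) {
			// Short or failed write, e.g. the disk is full
			$ok = false;
			break;
		}
		$written += $n;
	}

	// Flush explicitly so that a failure in the final flush is visible
	$ok = $ok && fflush( $fh );
	// Always close the handle, but keep any earlier failure
	$ok = fclose( $fh ) && $ok;

	if ( !$ok ) {
		// Discard the truncated file; the destination is left untouched
		unlink( $tmp );
		throw new RuntimeException( "Write to $tmp failed, $dest not replaced" );
	}
	if ( !rename( $tmp, $dest ) ) {
		unlink( $tmp );
		throw new RuntimeException( "Unable to move $tmp to $dest" );
	}
}
```

With this shape, a caller either gets a complete file at the destination or an
exception, never a truncated file moved into place.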
There are problems in many places:
* PHP's dba_close() does not check whether the write of the hashtable or the
close() succeeded. This is not easy to fix, since dba_close() just calls the
resource destructor, and the resource destructor interface (rsrc_dtor_func_t)
does not provide a return value.
* Using CdbWriter_PHP would at least allow us to check for failures during the
hashtable write. However, fclose() suffers from the same problem as
dba_close(): it calls the resource destructor and so cannot provide a
meaningful error return. It doesn't even check for failures during the final
flush of the internal write buffer; see https://bugs.php.net/bug.php?id=60110
* CdbWriter_DBA::set() just passes through the return value of dba_insert(); it
does not throw an exception. Thus most callers will continue despite a
disk-full error.
* CdbWriter_PHP::set() throws an exception on a write error. However, this
leads to __destruct() being called, and thus to a truncated CDB file being
moved into the destination.
Also, the fact that CdbWriter objects are finalised and closed on __destruct()
may lead to an incomplete file being moved into place if some unrelated
exception is thrown in the caller during rebuild. Fixing this will mean
requiring that callers call CdbWriter::close() before letting the object go out
of scope, breaking backwards compatibility for callers like dumpInterwiki.php
and extensions/Babel/txt2cdb.php.
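The explicit-close requirement could look roughly like the sketch below. The
class name ExplicitCloseWriter and its methods are hypothetical and only
illustrate the idea; the real CdbWriter interface differs. Only close() moves
the file into place, and __destruct() discards the temporary file if close()
was never reached:

```php
<?php
// Hypothetical illustration, not the real CdbWriter: the file is committed
// only by an explicit close(), never by the destructor.
class ExplicitCloseWriter {
	private $dest;
	private $tmp;
	private $fh;
	private $committed = false;

	public function __construct( $dest ) {
		$this->dest = $dest;
		$this->tmp = $dest . '.tmp.' . mt_rand();
		$this->fh = fopen( $this->tmp, 'wb' );
		if ( $this->fh === false ) {
			throw new RuntimeException( "Unable to open {$this->tmp}" );
		}
	}

	public function write( $data ) {
		if ( fwrite( $this->fh, $data ) !== strlen( $data ) ) {
			throw new RuntimeException( "Write to {$this->tmp} failed" );
		}
	}

	public function close() {
		// Flush explicitly so that a failed final flush is detected
		$ok = fflush( $this->fh );
		$ok = fclose( $this->fh ) && $ok;
		$this->fh = null;
		if ( !$ok || !rename( $this->tmp, $this->dest ) ) {
			@unlink( $this->tmp );
			throw new RuntimeException( "Unable to finalise {$this->dest}" );
		}
		$this->committed = true;
	}

	public function __destruct() {
		if ( $this->committed ) {
			return;
		}
		// close() was never reached (e.g. an exception in the caller):
		// throw away the partial file instead of moving it into place
		if ( is_resource( $this->fh ) ) {
			fclose( $this->fh );
		}
		if ( file_exists( $this->tmp ) ) {
			unlink( $this->tmp );
		}
	}
}
```

A caller that lets the object go out of scope without calling close() then
leaves the old destination file untouched, which is the backwards
compatibility break mentioned above.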
My recommendations are:
* Add a configuration variable to configure which CdbWriter subclass is used,
and make it CdbWriter_PHP by default. It looks like it makes sense to use the C
reader but the PHP writer.
* Fix the exception and error handling in both the DBA and PHP versions of
CdbWriter.
* Encourage the PHP devs to do something about resource destructor error
propagation.
* Increase the size of the /tmp partition on the app servers.