Hello,
This has been discussed before (see the old thread quoted below),
and the bump-ssl-server-first feature is being bitten by this especially
badly.
The attached patch may not apply to trunk "as is" right now, but the
changes should be clearly visible, and I would like to port it there and
get it committed if there are no better ideas for the fix.
Technical summary:
The ERR_ZERO_SIZE_OBJECT errors were visible to the client when the
destination had only one address because serverDestinations.shift()
made the list of destinations empty and startConnectionOrFail() failed.
The new "wasIdle" state is remembered in the Connection object instead
of FwdState itself because we need to maintain that information for a
pinned connection as well (and FwdState is not preserved between a
request that pins a connection and a request that uses the pinned
connection).
This may need more work. I am not sure the FTP and Gopher cases are
correct. The current code assumes that those protocols do not have pconn
races.
Please review.
Thank you,
Alex.
Old thread titled "server-side pconn races no longer retried after
r10057.1.8":
On 09/21/2011 08:27 AM, Alex Rousskov wrote:
> On 09/20/2011 11:43 PM, Amos Jeffries wrote:
>> On 21/09/11 16:56, Alex Rousskov wrote:
>>> On 09/20/2011 05:57 PM, Amos Jeffries wrote:
>>>>
>>>> Basing it on the error type/code/something would be better.
>>>>
>>>> There are three sets of cases feeding into this function (maybe more if
>>>> other races exist).
>>>> a) server error responses 5xx/403. new path required before
>>>> connectStart().
>>>> b) TCP connection failures. new path required before connectStart().
>>>> c) this pconn race. same path can be passed back to connectStart()
>
>
>>>> At this point it should have (local.GetPort() > 0) from a previously
>>>> open connection (c or a) [NP: fd will be -1 already]. So checking for
>>>> the zero-sized error AND a present local port we could unset the local
>>>> port, and call connectStart() without the shift().
>>>
>>> A positive local.GetPort() check only tells us whether we had used a
>>> connection, not whether we had reused a persistent connection, right?
>>
>> Yes it tells us the conn was previously successful at opening. So I
>> believe it should be the distinguishing difference between (a) and (b)
>> cases. The ZERO_SIZED error differentiates between (c) and (a|b).
>
>
> Not exactly. I believe a ZERO_SIZED error does not indicate a pconn
> race. It implies that there was a connection, so a positive
> local.GetPort() check would not even be needed if we have ZERO_SIZED
> error, but it is insufficient to indicate a pconn race.
>
>
>
>>> How about adding an explicit flag.avoidPconn[NextTime] or similar and
>>> setting it as sketched below?
>>>
>>> FwdState::connectStart():
>>> ...
>>> fwdPconnPool->pop(..., !flag.avoidPconn && checkRetriable());
>>> flag.avoidPconn = we are reusing pconn;
>>
>> The problem was the last pconn, we may have N perfectly fine ones to use
>> on the list.
>
> Correct. However, we may also have N bad ones (imagine a server that was
> just rebooted, silently dropping all our N pconns) and we do not want to
> iterate any of them to find out: The user suffered (has been waiting)
> enough already. Other users will test those N-1 remaining pconns, if any.
>
> Moreover, even RFC 2616 implies that one should not reuse a second pconn
> for the same request if the first one failed:
>
>> Client software SHOULD reopen the
>> transport connection and retransmit the aborted sequence of requests
>> without user interaction
>
> (reopen implies opening a new connection)
>
>> The automatic retry SHOULD NOT
>> be repeated if the second sequence of requests fails.
>
> (the above implies that if the second pconn fails due to the same benign
> reason, we SHOULD NOT retry and would have to serve the error to the
> user; we want to avoid that possibility)
>
>
>
>>> And any time we change destination:
>>>
>>> serverDestinations.shift();
>>> flag.avoidPconn = false;
>>>
>>> As the above sketch illustrates, the same flag can then be used in
>>> FwdState::connectStart() to _avoid_ using a pconn if we are retrying the
>>> same destination.
>>>
>>> I do not like that we would have to remember to reset this flag every
>>> time we shift the serverDestinations array. Perhaps that logic should be
>>> moved into a dedicated FwdState method.
>>
>> Yes, a flag is possible if we need to go that way. I think we can get
>> the same information from local port>0 for free without any state
>> maintenance overheads though. And we only need to use it at this one
>> point where the cases are clear.
>
> My understanding is that local port>0 does not mean we have used a
> pconn, with or without a ZERO_SIZED error. Please correct me if I am wrong.
>
>
> Thank you,
>
> Alex.
Retry requests that failed due to a persistent connection race
instead of replying with ERR_ZERO_SIZE_OBJECT "Bad Gateway".
The ERR_ZERO_SIZE_OBJECT errors were visible to the client when the
destination had only one address because serverDestinations.shift()
made the list of destinations empty and startConnectionOrFail() failed.
The new "wasIdle" state is remembered in the Connection object instead of
FwdState itself because we need to maintain that information for a pinned
connection as well (and FwdState is not preserved between a request that
pins a connection and a request that uses the pinned connection).
This may need more work. I am not sure the FTP and Gopher cases are correct.
The current code assumes that those protocols do not have pconn races.
=== modified file 'src/comm/Connection.cc'
--- src/comm/Connection.cc 2011-09-06 08:38:27 +0000
+++ src/comm/Connection.cc 2012-02-14 02:59:21 +0000
@@ -1,67 +1,69 @@
#include "config.h"
#include "cbdata.h"
#include "comm.h"
#include "comm/Connection.h"
#include "fde.h"
#include "SquidTime.h"
bool
Comm::IsConnOpen(const Comm::ConnectionPointer &conn)
{
return conn != NULL && conn->isOpen();
}
Comm::Connection::Connection() :
local(),
remote(),
peerType(HIER_NONE),
fd(-1),
tos(0),
flags(COMM_NONBLOCKING),
+ wasIdle(false),
_peer(NULL)
{
*rfc931 = 0; // quick init the head. the rest does not matter.
}
static int64_t lost_conn = 0;
Comm::Connection::~Connection()
{
if (fd >= 0) {
debugs(5, 0, "BUG: Orphan Comm::Connection: " << *this);
debugs(5, 0, "NOTE: " << ++lost_conn << " Orphans since last started.");
close();
}
cbdataReferenceDone(_peer);
}
Comm::ConnectionPointer
Comm::Connection::copyDetails() const
{
ConnectionPointer c = new Comm::Connection;
c->local = local;
c->remote = remote;
c->peerType = peerType;
c->tos = tos;
c->flags = flags;
+ // wasIdle is not a preserved detail
// ensure FD is not open in the new copy.
c->fd = -1;
// ensure we have a cbdata reference to _peer not a straight ptr copy.
c->_peer = cbdataReference(getPeer());
return c;
}
void
Comm::Connection::close()
{
if (isOpen()) {
comm_close(fd);
fd = -1;
if (getPeer())
getPeer()->stats.conn_open--;
}
}
=== modified file 'src/comm/Connection.h'
--- src/comm/Connection.h 2011-07-12 05:52:07 +0000
+++ src/comm/Connection.h 2012-02-14 02:59:57 +0000
@@ -130,69 +130,74 @@
Ip::Address local;
/** Address for the Remote end of a TCP link. */
Ip::Address remote;
/** Hierarchy code for this connection link */
hier_code peerType;
/** Socket used by this connection. Negative if not open. */
int fd;
/** Quality of Service TOS values currently sent on this connection */
tos_t tos;
/** Netfilter MARK values currently sent on this connection */
nfmark_t nfmark;
/** COMM flags set on this connection */
int flags;
+ /** Whether a pconn race may exist (i.e., server closed while we idled). */
+ bool wasIdle;
+
char rfc931[USER_IDENT_SZ];
#if USE_SQUID_EUI
Eui::Eui48 remoteEui48;
Eui::Eui64 remoteEui64;
#endif
private:
// XXX: we need to call this member peer_ but the struct peer_ global type
// behind peer* clashes despite our private Comm:: namespace
// (it being global gets inherited here too).
/** cache_peer data object (if any) */
peer *_peer;
};
MEMPROXY_CLASS_INLINE(Connection);
}; // namespace Comm
// NP: Order and namespace here is very important.
// * The second define inlines the first.
// * Stream inheritance overloading is searched in the global scope first.
inline std::ostream &
operator << (std::ostream &os, const Comm::Connection &conn)
{
os << "local=" << conn.local << " remote=" << conn.remote;
if (conn.fd >= 0)
os << " FD " << conn.fd;
if (conn.flags != COMM_UNSET)
os << " flags=" << conn.flags;
+ if (conn.wasIdle)
+ os << " pconn";
#if USE_IDENT
if (*conn.rfc931)
os << " IDENT::" << conn.rfc931;
#endif
return os;
}
inline std::ostream &
operator << (std::ostream &os, const Comm::ConnectionPointer &conn)
{
if (conn != NULL)
os << *conn;
return os;
}
#endif
=== modified file 'src/forward.cc'
--- src/forward.cc 2012-02-05 21:55:51 +0000
+++ src/forward.cc 2012-02-14 03:18:31 +0000
@@ -553,41 +553,45 @@
default:
return false;
}
return true;
}
void
FwdState::serverClosed(int fd)
{
debugs(17, 2, HERE << "FD " << fd << " " << entry->url());
retryOrBail();
}
void
FwdState::retryOrBail()
{
if (checkRetry()) {
debugs(17, 3, HERE << "re-forwarding (" << n_tries << " tries, " << (squid_curtime - start_t) << " secs)");
- serverDestinations.shift(); // last one failed. try another.
+ // we should retry the same destination if it failed due to pconn race
+ if (serverConn != NULL && !serverConn->wasIdle)
+ serverDestinations.shift(); // last one failed. try another.
+ else
+ debugs(17, 4, HERE << "retrying the same destination");
startConnectionOrFail();
return;
}
// TODO: should we call completed() here and move doneWithRetries there?
doneWithRetries();
if (self != NULL && !err && shutting_down) {
ErrorState *anErr = errorCon(ERR_SHUTTING_DOWN, HTTP_SERVICE_UNAVAILABLE, request);
errorAppendEntry(entry, anErr);
}
self = NULL; // refcounted
}
// If the Server quits before nibbling at the request body, the body sender
// will not know (so that we can retry). Call this if we will not retry. We
// will notify the sender so that it does not get stuck waiting for space.
void
FwdState::doneWithRetries()
=== modified file 'src/ftp.cc'
--- src/ftp.cc 2011-10-21 16:20:42 +0000
+++ src/ftp.cc 2012-02-14 03:19:45 +0000
@@ -477,40 +477,42 @@
/* NP: failure recovery may be possible when its only a data.conn failure.
* if the ctrl.conn is still fine, we can send ABOR down it and retry.
* Just need to watch out for wider Squid states like shutting down or reconfigure.
*/
}
FtpStateData::FtpStateData(FwdState *theFwdState, const Comm::ConnectionPointer &conn) : AsyncJob("FtpStateData"), ServerStateData(theFwdState)
{
const char *url = entry->url();
debugs(9, 3, HERE << "'" << url << "'" );
statCounter.server.all.requests++;
statCounter.server.ftp.requests++;
theSize = -1;
mdtm = -1;
if (Config.Ftp.passive && !flags.pasv_failed)
flags.pasv_supported = 1;
flags.rest_supported = 1;
+ conn->wasIdle = false; // no pconns for FTP
+
typedef CommCbMemFunT<FtpStateData, CommCloseCbParams> Dialer;
AsyncCall::Pointer closer = JobCallback(9, 5, Dialer, this, FtpStateData::ctrlClosed);
ctrl.opened(conn, closer);
if (request->method == METHOD_PUT)
flags.put = 1;
}
FtpStateData::~FtpStateData()
{
debugs(9, 3, HERE << entry->url() );
if (reply_hdr) {
memFree(reply_hdr, MEM_8K_BUF);
reply_hdr = NULL;
}
if (data.opener != NULL) {
data.opener->cancel("FtpStateData destructed");
data.opener = NULL;
=== modified file 'src/gopher.cc'
--- src/gopher.cc 2011-11-22 12:00:59 +0000
+++ src/gopher.cc 2012-02-14 03:14:19 +0000
@@ -981,25 +981,27 @@
/* Index URL without query word */
/* We have to generate search page back to client. No need for connection */
gopherMimeCreate(gopherState);
if (gopherState->type_id == GOPHER_INDEX) {
gopherState->conversion = gopher_ds::HTML_INDEX_PAGE;
} else {
if (gopherState->type_id == GOPHER_CSO) {
gopherState->conversion = gopher_ds::HTML_CSO_PAGE;
} else {
gopherState->conversion = gopher_ds::HTML_INDEX_PAGE;
}
}
gopherToHTML(gopherState, (char *) NULL, 0);
fwd->complete();
return;
}
gopherState->serverConn = fwd->serverConnection();
+ gopherState->serverConn->wasIdle = false; // no pconns for Gopher
+
gopherSendRequest(fwd->serverConnection()->fd, gopherState);
AsyncCall::Pointer timeoutCall = commCbCall(5, 4, "gopherTimeout",
CommTimeoutCbPtrFun(gopherTimeout, gopherState));
commSetConnTimeout(fwd->serverConnection(), Config.Timeout.read, timeoutCall);
}
=== modified file 'src/http.cc'
--- src/http.cc 2011-10-21 16:20:42 +0000
+++ src/http.cc 2012-02-14 03:10:05 +0000
@@ -1070,40 +1070,44 @@
}
// handle I/O errors
if (io.flag != COMM_OK || len < 0) {
debugs(11, 2, HERE << io.conn << ": read failure: " << xstrerror() << ".");
if (ignoreErrno(io.xerrno)) {
flags.do_next_read = 1;
} else {
ErrorState *err;
err = errorCon(ERR_READ_ERROR, HTTP_BAD_GATEWAY, fwd->request);
err->xerrno = io.xerrno;
fwd->fail(err);
flags.do_next_read = 0;
serverConnection->close();
}
return;
}
+ // no pconn races are possible if we have received something from server
+ if (serverConnection != NULL && len > 0)
+ serverConnection->wasIdle = false;
+
// update I/O stats
if (len > 0) {
readBuf->appended(len);
reply_bytes_read += len;
#if USE_DELAY_POOLS
DelayId delayId = entry->mem_obj->mostBytesAllowed();
delayId.bytesIn(len);
#endif
kb_incr(&statCounter.server.all.kbytes_in, len);
kb_incr(&statCounter.server.http.kbytes_in, len);
IOStats.Http.reads++;
for (clen = len - 1, bin = 0; clen; bin++)
clen >>= 1;
IOStats.Http.read_hist[bin]++;
// update peer response time stats (%<pt)
const timeval &sent = request->hier.peer_http_request_sent;
=== modified file 'src/pconn.cc'
--- src/pconn.cc 2011-11-08 13:40:26 +0000
+++ src/pconn.cc 2012-02-14 03:01:29 +0000
@@ -155,40 +155,42 @@
size_ -= n;
}
if (parent_ && size_ == 0) {
debugs(48, 3, HERE << "deleting " << hashKeyStr(&hash));
delete this;
}
}
void
IdleConnList::clearHandlers(const Comm::ConnectionPointer &conn)
{
debugs(48, 3, HERE << "removing close handler for " << conn);
comm_read_cancel(conn->fd, IdleConnList::Read, this);
commUnsetConnTimeout(conn);
}
void
IdleConnList::push(const Comm::ConnectionPointer &conn)
{
+ conn->wasIdle = true;
+
if (size_ == capacity_) {
debugs(48, 3, HERE << "growing idle Connection array");
capacity_ <<= 1;
const Comm::ConnectionPointer *oldList = theList_;
theList_ = new Comm::ConnectionPointer[capacity_];
for (int index = 0; index < size_; index++)
theList_[index] = oldList[index];
delete[] oldList;
}
if (parent_)
parent_->noteConnectionAdded();
theList_[size_++] = conn;
AsyncCall::Pointer readCall = commCbCall(5,4, "IdleConnList::Read",
CommIoCbPtrFun(IdleConnList::Read, this));
comm_read(conn, fakeReadBuf_, sizeof(fakeReadBuf_), readCall);
AsyncCall::Pointer timeoutCall = commCbCall(5,4, "IdleConnList::Timeout",
CommTimeoutCbPtrFun(IdleConnList::Timeout, this));