Re: [AOLSERVER] Null responses

2014-08-28 Thread Jeff Rogers
Do you happen to have a custom error page for 503 errors (service 
unavailable)?  I'm looking over the code for places where a connection 
might be dropped without sending anything, and although this path 
doesn't seem to do that, it does look possible for it to do other bad 
things (i.e., crash).  It could correlate with a quick burst of 
requests from a single user (loading all the images for a page, for 
example).

I'd expect an outright crash if this was the case, which would probably 
be more noticeable.  However, crashes can be funny things and there's a 
chance that it could do something entirely unexpected.

-J

Cyan ogilvie wrote:
> There is usually a cluster of up to about 6 requests that fail in this
> way for a given server in the space of around 2 seconds, with long
> intervals between the clusters where no failures happen.


--
Slashdot TV.  
Video for Nerds.  Stuff that matters.
http://tv.slashdot.org/
___
aolserver-talk mailing list
aolserver-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/aolserver-talk


Re: [AOLSERVER] Null responses

2014-08-28 Thread Gustaf Neumann

Dear Cyan,

you seem to have invested considerable effort into analyzing the problem.
We have been using aolserver and naviserver for many years with heavy
traffic (for the last two years just naviserver), but we have not seen
this problem.

Some questions pop up:
- you say you have a legacy application: did the problem show up
  just recently, or did you only become aware of it recently?

- can you reproduce the problem on a bare metal machine?

- do you have the option to replace haproxy with nginx in front of aolserver?
  nginx can use persistent connections for the backend traffic (we could
  reduce the number of connections to naviserver by an order of magnitude),
  and nginx has various retry options.  Maybe haproxy has the same; I have
  not worked with it.
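As a concrete illustration of that suggestion, an nginx front end with persistent upstream connections might look roughly like this (a sketch only - the upstream name, port, and retry policy here are placeholders, not details of Gustaf's setup; `proxy_http_version 1.1` plus clearing the `Connection` header is what enables upstream keepalive):

```nginx
upstream nsd_backend {
    server 127.0.0.1:8000;
    keepalive 32;                        # pool of idle connections held open to nsd
}

server {
    listen 80;
    location / {
        proxy_pass http://nsd_backend;
        proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # drop the default "Connection: close"
        proxy_next_upstream error timeout;  # retry another upstream on failure
    }
}
```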

-gustaf neumann

Am 28.08.14 13:51, schrieb Cyan ogilvie:
> [...]
> There is usually a cluster of up to about 6 requests that fail in this
> way for a given server in the space of around 2 seconds, with long
> intervals between the clusters where no failures happen.

Re: [AOLSERVER] Null responses

2014-08-28 Thread Cyan ogilvie
On Wed, Aug 27, 2014 at 9:34 PM, Torben Brosten  wrote:
>
> On 08/27/2014 12:02 PM, Cyan ogilvie wrote:
> > ..
> > There doesn't seem to be a pattern to the failing requests,
> > sometimes it's small static files like favicon.ico, but mostly not
> > (although in our case we're not using fastpath for that - different
> > favicons are served based on the request context).  At the moment I'm
> > leaning towards some sort of corrupted connection thread state - the
> > failures tend to cluster by time, server, user - so that, although the
> > failures are exceedingly rare overall (220 yesterday), it's often the
> > case that a given user will have to reload a page several times before
> > they get a successful response.  The servers are fronted by haproxy
> > which will tend to send a given session back to the same server.
> > ..
>
> Have you ruled out a router issue, such as from ipv4 exhaustion or
> localized network flooding?

I'm pretty sure it's not network related at this stage.  To test this I
built a man-in-the-middle relay listening on port 8008 running on the same
server as nsd, which forwards all connection traffic to 127.0.0.1:80 and
records the events and data it sees flowing in both directions, and a
packet trace using tcpdump of both the requests arriving on eth0 and the
relayed traffic on lo.  An iptables nat prerouting rule DNATs connections
coming in on eth0 port 80 to the relay's port 8008.  Another process
watches the haproxy logs for indications of a failed request and retrieves
a dump of the ring buffers from the relay and saves them for later analysis.
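Cyan's relay code isn't shown on the list, but the general shape of such a tool - a pass-through proxy that mirrors both directions of traffic into a bounded ring buffer for later dumping - could be sketched like this in Python (all names and details here are my own illustration, not Cyan's implementation):

```python
import socket
import threading
from collections import deque

class RingBuffer:
    """Bounded event log: oldest entries fall off as new ones arrive, so the
    relay can run for days while waiting for a rare failure to be captured."""
    def __init__(self, maxlen=10000):
        self.events = deque(maxlen=maxlen)
        self.lock = threading.Lock()

    def record(self, direction, data):
        with self.lock:
            self.events.append((direction, data))

    def dump(self):
        with self.lock:
            return list(self.events)

def pump(src, dst, ring, direction):
    """Copy bytes one way, mirroring everything seen into the ring buffer."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            ring.record(direction, data)
            dst.sendall(data)
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)   # propagate the half-close
        except OSError:
            pass

def relay(listen_port=8008, backend=("127.0.0.1", 80)):
    """Accept on listen_port, forward each connection to the backend,
    recording both directions.  One pump thread per direction."""
    ring = RingBuffer()
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", listen_port))
    srv.listen(128)
    while True:
        client, _ = srv.accept()
        upstream = socket.create_connection(backend)
        for a, b, tag in ((client, upstream, "c->s"), (upstream, client, "s->c")):
            threading.Thread(target=pump, args=(a, b, ring, tag),
                             daemon=True).start()
```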

Here is the packet trace of the relay -> nsd traffic on the lo interface
for a typical event (as captured by tcpdump, which indicated that no
packets had been dropped):

Time  Prot Len  Info
34.750440 TCP  76   47576 > http [SYN] Seq=0 Win=32792 Len=0 MSS=16396
SACK_PERM=1 WS=32
34.750465 TCP  76   http > 47576 [SYN, ACK] Seq=0 Ack=1 Win=32768 Len=0
MSS=16396 SACK_PERM=1 WS=32
34.750479 TCP  68   47576 > http [ACK] Seq=1 Ack=1 Win=32800 Len=0
34.750720 TCP  4412 [TCP segment of a reassembled PDU]
34.750756 TCP  68   http > 47576 [ACK] Seq=1 Ack=4345 Win=32768 Len=0
34.751274 HTTP 439  GET
/item/531138-RR-1012/Elgin-Art-Deco-Dial-Pocket-Watch HTTP/1.1
34.751295 TCP  68   http > 47576 [ACK] Seq=1 Ack=4716 Win=32768 Len=0
34.751377 TCP  68   http > 47576 [FIN, ACK] Seq=1 Ack=4716 Win=32768 Len=0
34.751492 TCP  68   47576 > http [FIN, ACK] Seq=4716 Ack=2 Win=32800 Len=0
34.751515 TCP  68   http > 47576 [ACK] Seq=2 Ack=4717 Win=32768 Len=0

The connection reaches the ESTABLISHED state and the HTTP request data is
sent to nsd which is acked.  Then 0.8 ms later the connection is closed by
nsd.  The first Tcl code that should execute for this request is a preauth
filter, which starts by writing a log containing [ns_conn request].  That
log message isn't reached in cases like this.  In this example the request
is quite large (around 4.7 KB) because of some large cookies, but the same
pattern happens for requests of around 400 bytes.
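That failure signature - the server FINs the connection without ever sending a payload byte - can be flagged mechanically from tcpdump-style text output like the trace above. A rough Python filter (the line format is an assumption matching the trace as shown; server frames are identified by the "http > port" direction and their Len field):

```python
import re

# Matches server->client frames in tcpdump/tshark text output, e.g.
# "34.751377 TCP  68   http > 47576 [FIN, ACK] Seq=1 Ack=4716 Win=32768 Len=0"
SERVER_FRAME = re.compile(r'http > \d+ .*\bLen=(\d+)')

def server_sent_payload(trace_lines):
    """True if any server->client frame carried payload bytes (Len > 0),
    i.e. nsd actually wrote a response to the wire before closing."""
    for line in trace_lines:
        m = SERVER_FRAME.search(line)
        if m and int(m.group(1)) > 0:
            return True
    return False
```

Running this over the per-connection traces would separate the "closed before any response" cases from normal exchanges without eyeballing each dump.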

There appears to be a less common failure mode where the request processing
happens normally and the Tcl code generates a normal response (HTTP code
200), but the response data never hits the network.  The network trace
looks the same as the example I gave above except that the time between the
ACK of the GET request and nsd's FIN, ACK is longer - around 20 - 70 ms,
which is in line with the normal times for a successful request.  I haven't
yet caught the packet trace between the relay and nsd for this case (only
the trace on eth0 between haproxy and the relay), so I'm not 100% certain
of my interpretation of this failure mode yet.

There is usually a cluster of up to about 6 requests that fail in this way
for a given server in the space of around 2 seconds, with long intervals
between the clusters where no failures happen.

Cyan


Re: [AOLSERVER] Null responses

2014-08-27 Thread Torben Brosten

On 08/27/2014 12:02 PM, Cyan ogilvie wrote:
> ..
> There doesn't seem to be a pattern to the failing requests,
> sometimes it's small static files like favicon.ico, but mostly not
> (although in our case we're not using fastpath for that - different
> favicons are served based on the request context).  At the moment I'm
> leaning towards some sort of corrupted connection thread state - the
> failures tend to cluster by time, server, user - so that, although the
> failures are exceedingly rare overall (220 yesterday), it's often the
> case that a given user will have to reload a page several times before
> they get a successful response.  The servers are fronted by haproxy
> which will tend to send a given session back to the same server.
> ..

Have you ruled out a router issue, such as from ipv4 exhaustion or 
localized network flooding?




Re: [AOLSERVER] Null responses

2014-08-27 Thread Cyan ogilvie
On Tue, Aug 26, 2014 at 11:05 PM, Tony Bennett (Brown Paper Tickets)
 wrote:
> Does this only happen under heavy load?  You might be able to adjust your
> init.tcl to handle more connections. I'm not sure what you would need to
> change so I pulled out any config that might help.

Thanks, the current config has been minimally modified from what
was being used for 3.4, so there are a few settings in your example
that we're not explicitly setting and that I'm not familiar with -
they're good leads to follow.

Cyan



Re: [AOLSERVER] Null responses

2014-08-27 Thread Cyan ogilvie
On Wed, Aug 27, 2014 at 12:29 AM, Jeff Rogers  wrote:
>
> Do you know if this connection dropping happens mostly when there is a lot of 
> activity or more frequently when there is very low activity?
>
> I recall a few edge cases in the thread pooling where a thread would in some 
> circumstances wait until another connection came in before running, and there 
> might have been a related case where a connection could get dropped.  IIRC, 
> these both happened generally when there was low traffic (or more 
> specifically low concurrent traffic).  Playing with maxconns might diminish 
> the problem in this case.

There doesn't seem to be a correlation between the site load and the
connection drops.  We never really see very low activity - the design
of the site means there are hundreds of thousands of pages, so we're
constantly being crawled by every bot ever spawned (we serve about 8GB
daily to googlebot alone).  But we use fairly small Amazon EC2
instances that are primarily memory limited, so the config is tuned to
reach a memory high water mark of around 700 - 800 MB; based on the
load testing I did, that works out to maxthreads of 6 and
maxconnections of 50 (modules/tcl/pools.tcl seems to take the
maxconnections ns_param as the ns_pools -maxconns parameter).  The
project was most recently moved from version 3.4 / Tcl 7.6 to 4.5 /
Tcl 8.6, so there are a lot of legacy bits and shims in place.  I
recall something about the maxconnections ns_param changing meaning
between these versions, but it seems to be working as I would expect.

Server load tends to vary between a loadavg of 0.4 to 2.0, typically
around 0.6 (2 cores).  Concurrency is pretty low, it seldom reaches 6,
and usually sits at around 1 - 3, based on the monitoring we've
currently got in place.  I'm currently trying to reconstruct the exact
thread / concurrency / request context for the connection drop events
by parsing the log files; I'm hoping that might reveal some pattern in
the failures.  But so far they don't seem to correspond with
connection lifecycle events.
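One simple way to surface the time-clustering of failures when mining those logs is to group failure timestamps by inter-arrival gap. A sketch (the 2-second threshold is taken from the burst pattern described elsewhere in the thread; the function and its inputs are my own illustration, not Cyan's tooling):

```python
def cluster_failures(timestamps, gap=2.0):
    """Group failure times (epoch seconds) into bursts: an event that
    arrives within `gap` seconds of the previous one joins its cluster,
    otherwise it starts a new cluster."""
    clusters = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters
```

Cross-referencing each cluster against the error log for the same window (thread starts/exits, pool maintenance) would show whether bursts line up with connection-thread lifecycle events.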

> You also mention favicon.ico; is it mostly or always that?  It's notable for 
> being a small static file, which could point to other causes, like a corrupt 
> interpreter state as Peter suggested.  Or there might be some weirdness with 
> mmap if you have that enabled.

There doesn't seem to be a pattern to the failing requests, sometimes
it's small static files like favicon.ico, but mostly not (although in
our case we're not using fastpath for that - different favicons are
served based on the request context).  At the moment I'm leaning
towards some sort of corrupted connection thread state - the failures
tend to cluster by time, server, user - so that, although the failures
are exceedingly rare overall (220 yesterday), it's often the case that
a given user will have to reload a page several times before they get
a successful response.  The servers are fronted by haproxy which will
tend to send a given session back to the same server.

> One other thought, can you switch to naviserver?  The connection handling
> there has evolved somewhat differently (not to mention more recently) than
> aolserver, but programming-wise there are not a lot of differences.

It's probably not out of the question if there is a strong argument to
be made that it would fix the problem, we're taking quite a reputation
hit at the moment.  I initially attempted to make the site work on
naviserver since that seemed to be more active, but I ran into
problems with the nsdb / nsdbi change and segfaults when I tried to
get nsdb working on it.  It's also a 15 year old code base that seemed
to be quite sensitive to the small config and api changes in
naviserver, and the port from 3.4 / Tcl 7.6 was tricky enough as it
was (encoding issues, list parsing differences, regexp syntax, etc)
that the call was made to go with AOLserver 4.5 instead to minimize
the changes required.

But the site has been running well on the new version since March /
April, so porting to naviserver should be feasible; I'd still need to
make a very strong case.

Cyan



Re: [AOLSERVER] Null responses

2014-08-27 Thread Tony Bennett (Brown Paper Tickets)
Does this only happen under heavy load?  You might be able to adjust
your init.tcl to handle more connections. I'm not sure what you would
need to change so I pulled out any config that might help.

-Tony

ns_section "ns/parameters"
ns_param listenbacklog 32           ;# Max length of pending conn queue

# keepalivetimeout
# In prior versions of AOLserver this may have been under
# ns/servers/$server.  Set to 0 to disable Keep-alive support; any other
# value will enable Keep-alive but won't actually configure the timeout,
# which is now configured under the driver.
# (http://aolserver.am.net/docs/tuning.adpx)
ns_param keepalivetimeout 30        ;# Max time conn is kept alive (keepalive)

# maxkeepalive
# In prior versions of AOLserver this may have been under
# ns/servers/$server.  In AOLserver 4.0 and newer, this setting exists
# but appears to be non-operational
# (http://aolserver.am.net/docs/tuning.adpx)
ns_param maxkeepalive 100           ;# Max no. of conns in keepalive state

ns_section "ns/server/${server_name}"
ns_limits set default -maxrun 500   ;# Max number of connections which can
                                     # be running simultaneously

### Tuning Options ###
ns_param connsperthread 0           ;# Normally there's one conn per thread
#ns_param flushcontent false        ;# Flush all data before returning
ns_param maxconnections 100         ;# Max connections per connection
                                     # thread before it is shut down
#ns_param maxdropped 0              ;# Shut down if dropping too many conns
ns_param maxthreads 20              ;# Tune this to scale your server
ns_param minthreads 0               ;# Tune this to scale your server
ns_param threadtimeout 120          ;# Idle timeout for connection threads
ns_param spread 20                  ;# Variance factor for threadtimeout and
                                     # maxconnections to prevent mass
                                     # mortality of threads (e.g. +-20%)

ns_section "ns/server/${server_name}/module/nssock"
ns_param Address $address
ns_param Hostname $host
ns_param Port $port
ns_param maxsock 500
ns_param keepwait 30                ;# Seconds before hanging up on clients
                                     # while waiting for a connection
ns_param socktimeout 30             ;# Seconds to wait for a client request;
                                     # increase for file uploads
ns_param maxinput 1MB               ;# Maximum size of data sent from browser
ns_param maxline 16k                ;# Maximum bytes for an HTTP request or
                                     # header line
ns_param maxheader 64k              ;# Maximum bytes for all HTTP header
                                     # lines in a request


# Thread library (nsthread) parameters
#
# If the server is crashing with no explanation you may have corrupted
# data due to a stack overflow.  Calculate locally declared data types
# and function parameters to get the stack size needed.

ns_section "ns/threads"
#ns_param mutexmeter true           ;# measure lock contention
#ns_param stacksize [expr 128*1024] ;# stack size per thread (in bytes)


On 8/26/14, 9:05 AM, Cyan ogilvie wrote:
> Hi
>
> I'm hoping there are still some subscribers to this list ;)
> [...]

Re: [AOLSERVER] Null responses

2014-08-26 Thread Jeff Rogers
Hi Cyan,

Yes, there's still a few subscribers.

Do you know if this connection dropping happens mostly when there is a 
lot of activity or more frequently when there is very low activity?

I recall a few edge cases in the thread pooling where a thread would in 
some circumstances wait until another connection came in before running, 
and there might have been a related case where a connection could get 
dropped.  IIRC, these both happened generally when there was low traffic 
(or more specifically low concurrent traffic).  Playing with maxconns 
might diminish the problem in this case.

You also mention favicon.ico; is it mostly or always that?  It's notable 
for being a small static file, which could point to other causes, like a 
corrupt interpreter state as Peter suggested.  Or there might be some 
weirdness with mmap if you have that enabled.

One other thought, can you switch to naviserver?  The connection 
handling there has evolved somewhat differently (not to mention more 
recently) than aolserver, but programming-wise there are not a lot of 
differences.

-J

Cyan ogilvie wrote:
> Hi
>
> I'm hoping there are still some subscribers to this list ;)
> [...]




Re: [AOLSERVER] Null responses

2014-08-26 Thread Peter Sadlon
Not sure if this will help, but:

If you have a preauth filter (possibly other filters too) which does an
ns_adp_abort (and/or possibly ns_adp_break) within the filter, or within a
proc called from the filter, then the next time that thread is used the
state will be "abort" and it will end the connection.

You can put these lines first in your pre_auth filter:

    ns_adp_exception state
    ns_log notice $state

If you find states of "abort" in your log, do this as a simple hack to
reset the state:

    ns_adp_exception state
    if {$state == "abort" || $state == "break"} {
        catch {ns_adp_return}
        set foobar [ns_adp_parse "Hello"]
    }

It may be a place to start looking, but if you have an ns_log in your
preauth which is not logging then I would guess the issue is somewhat
different.  Still, anyone who uses ns_adp_abort or ns_adp_break from
within a filter should be aware of this.  I believe using ns_adp_abort in
a filter before version 4 worked fine; since 4.0 or 4.5 it causes this
abort state issue.  I don't remember exactly - it was a long time ago that
I upgraded and had to figure that out.

-Peter

Date: Tue, 26 Aug 2014 18:05:31 +0200
From: c...@rubylane.com
To: aolserver-talk@lists.sourceforge.net
Subject: [AOLSERVER] Null responses

Hi
I'm hoping there are still some subscribers to this list ;)
I'm trying to debug a strange condition we're seeing on a small percentage of 
our connections: connections are being closed by the server without any 
response being sent back on the connection (verified by looking at network 
packet traces and inserting a logging transparent proxy between the client and 
server).  The network packet pattern we see is:


Request data (in a single frame, or multiple), ACK
Then the connection is closed by the server after 10 - 70 ms, without any data 
being sent, with a FIN/ACK (still getting confirmation on this - these logs are 
from the other side of the man-in-the-middle proxy I'm using to get debugging 
info).

For some of the failed requests, the server processing never gets as far as the 
start of our Tcl code (a preauth filter that starts with an ns_log that doesn't 
show up in the server log).

For others the request is processed normally and an access.log message written 
indicating that a response was generated with HTTP code 200, but no packet 
shows up on the network.
There is no pattern to the failed requests (sometimes requests for favicon.ico 
fail), and retrying the exact request shortly afterwards often succeeds.

Has anyone seen anything like this before, or have any advice on how to narrow 
down the cause further?
We're running a slightly patched version of the last 4.5.2 rc, on Ubuntu 
12.04.5 64bit on Amazon EC2 instances, with Tcl 8.6.1

Thanks
Cyan
