Re: HAProxy-1.8 sometimes sends a shorter html when using multithread function

2021-07-01 Thread Kazuhiro Takenaka
>
> Hello Willy,
>
> I was doing other work, but now I can resume
> investigating this matter.
>
> As I reported in my previous message,
> the status of HAProxy-1.8 is as follows.
>
> Shorter HTML issue:
>
> The underlying problem is unknown, but
> the patch2 you suggested works well,
> so this issue is resolved.
>
> HTTP 408 issue:
>
> This issue still remains.
>
> Do you think the HTTP 408 issue needs to be
> resolved before shipping the next version of HAProxy-1.8?
>
> Kazu
>


Re: HAProxy-1.8 sometimes sends a shorter html when using multithread function

2021-06-01 Thread Kazuhiro Takenaka
Hello Willy, Ryan

I have run the tests 3 times for each case.
The following lists show the results.

TestA haproxy-1.8.30 + (patch 1+2+3)
1st 2nd 3rd
  Shorter HTML:   0   0   0
  HTTP 408:   2   1   4

TestB haproxy-1.8.30 + (patch2 only)
1st 2nd 3rd
  Shorter HTML:   0   0   0
  HTTP 408:   3   4   5

TestC haproxy-2.4.0
1st 2nd 3rd
  Shorter HTML:   0   0   0
  HTTP 408:   0   0   0

In every case, the number of threads is 16.

# I also ran the tests without the multithread
# function and found neither the shorter HTML
# issue nor the HTTP 408 issue.

From this result, I concluded the following.

* haproxy-1.8.30
  Both the full patch set and patch 2 alone
  resolve the shorter HTML issue with the
  multithread function, but the HTTP 408
  issue remains.

* haproxy-2.4.0
  This version has neither the shorter HTML
  issue nor the HTTP 408 issue with the
  multithread function.

So I also checked other thread counts on
haproxy-1.8.30 + patch2, as follows.

N_THREADS N_SHORT N_HTTP408
        1       0         0
        2       0         0
        8       0         2
       12       0         1 (12 is the number of CPUs, including Hyper-Threads)
       16       0         3 (1st test of the previous list)
       32       0         0
       64       0         0

Now I think patch2 is enough for the shorter
HTML issue, as Willy previously expected.
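
For reference, the counts above were tallied from the haproxy log.
The following is only a rough sketch of such a counter (a hypothetical
helper, not my actual tooling); it assumes log lines shaped like the
examples quoted in this thread, where a shorter HTML shows the "SD--"
termination state and an HTTP 408 shows "cR--".

/* count_anomalies.c - hypothetical helper, not the actual test tooling.
 * Counts haproxy log lines whose termination state is "SD--" (response
 * cut short) or "cR--" (client request timeout, logged as HTTP 408). */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    FILE *f;
    char line[4096];
    long n_sd = 0, n_cr = 0;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <haproxy-log-file>\n", argv[0]);
        return 1;
    }
    if (!(f = fopen(argv[1], "r"))) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strstr(line, " SD-- "))      /* shorter HTML case */
            n_sd++;
        else if (strstr(line, " cR-- ")) /* HTTP 408 case */
            n_cr++;
    }
    fclose(f);
    printf("Shorter HTML (SD--): %ld\n", n_sd);
    printf("HTTP 408     (cR--): %ld\n", n_cr);
    return 0;
}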



The following text is about the performance
of the multithread function; I wrote it
for your information.

* The multithread function improves throughput

According to the tests below, which download the same HTML
10 million times using 20 parallel client processes,
I found that the multithread function clearly improves
throughput.

haproxy-1.8.30+(patch2 only)
  N_THREADS / RUN_TIME(sec)
  1 / 1280
  2 /  916
  8 /  737
 12 /  744
 16 /  751
 32 /  797
 64 /  908

haproxy-2.4.0
  N_THREADS / RUN_TIME(sec)
  1 / 1400
  2 /  947
  8 /  726
 12 /  730
 16 /  743
 32 /  782
 64 /  905

# Only in the case of haproxy-2.4.0 + 64 threads,
# the number of log messages was 24 less than
# the number of downloads; I think this is a separate issue.
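
For your information, the test driver itself is not attached; the sketch
below only illustrates the kind of client loop each of the 20 parallel
processes ran (hypothetical code; the real driver used curl, and the
host, port, path and expected size here are placeholders).

/* fetch_loop.c - hypothetical sketch of one client process.
 * Repeatedly fetches the test page over plain HTTP/1.1 with
 * "Connection: close" and flags responses shorter than expected.
 * In the real test, 20 such processes ran in parallel using curl
 * (500,000 requests x 20 processes = 10 million requests). */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static long fetch_once(const char *host, const char *port, const char *path)
{
    struct addrinfo hints, *res;
    char req[512], buf[4096];
    long total = 0;
    ssize_t n;
    int fd;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        if (fd >= 0)
            close(fd);
        freeaddrinfo(res);
        return -1;
    }
    freeaddrinfo(res);
    snprintf(req, sizeof(req),
             "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n",
             path, host);
    if (write(fd, req, strlen(req)) < 0) {
        close(fd);
        return -1;
    }
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;          /* headers + body, until the proxy closes */
    close(fd);
    return total;
}

int main(void)
{
    const long expected = 4864; /* placeholder: normal bytes_read seen with 1.8.30 */
    for (long i = 0; i < 500000; i++) {
        long got = fetch_once("lb01", "80", "/content01.html?x=1");
        if (got >= 0 && got < expected)
            fprintf(stderr, "short response: %ld bytes\n", got);
    }
    return 0;
}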

Before these tests, I also ran TPC-W tests and
could not find a performance improvement in
HTTP mode.
(In HTTPS mode, I did find that it improves performance.)

Yes, I know TPC-W is an obsolete test model,
but the department I belong to has accumulated
experience and tools around TPC-W.

My department uses these tools to verify
performance and high availability when
planning upgrades.

I first noticed the shorter HTML issue in
these TPC-W tests.

* Relation between the number of threads and CPUs

When the number of threads is low (1, 2),
the throughput of haproxy-2.4.0 is lower
than that of haproxy-1.8.30.

But the throughput of haproxy-2.4.0 is
superior when the number of threads is
8 or more.

This may be due to differences in the details
of multithreaded locking (mutual exclusion).

Throughput is good when the number of
threads (8, 12, 16) is about the same as the
number of CPUs (12).

Throughput drops if the number of threads
is increased too much (32, 64).

Kazu

On Tue, May 25, 2021 at 5:19 PM Kazuhiro Takenaka  wrote:

> Hello Willy,
>
> # Sorry, I posted this message directly to you,
> # so I am resending it to the mailing list.
>
> I tested the patches you suggested, and got
> the following results.
>
> TestA haproxy-1.8.30 + patch(1 + 2 + 3)
>   Shorter HTML: 0
>   HTTP 408: 2
>
> TestB haproxy-1.8.30 + patch(2 only)
>   Shorter HTML: 0
>   HTTP 408: 3
>
> TestC haproxy-2.4.0
>   Shorter HTML: 0
>   HTTP 408: 0
>
> haproxy-2.4.0 ran without any issue,
> but I found that the bytes_read column of
> haproxy-2.4.0 was different from that of
> haproxy-1.8.30 and haproxy-2.0.22 in the
> normal cases.
>
> Values of bytes_read:
>   haproxy-1.8.30: 4864
>   haproxy-2.0.22: 4837 (Also 2.3.X)
>   haproxy-2.4.0 : 4832
>
> I don't think the difference in bytes_read is
> related to the shorter HTML issue or the
> HTTP 408 issue; I just noticed it, so I am reporting it.
>
> For now, I've run TestA, B, and C once each.
> I am planning to do 2 more test runs for each case.
>
> >> I also tried haproxy-2.0.22 and I didn't find
> >> messages about shorter htmls but the number of
> >> log messages is smaller than the number of HTTP
> >> accesses.
> >
> >This should definitely not happen, but it obviously depends
> >where they are counted.
> >
> >For example some aborted idempotent requests between haproxy
> >and the server on a reused connection could lead to a silent
> >close to let the client try again.
> >
> >In this case it is guaranteed by the compliance with the
> >HTTP protocol that the server has not done anything that
> >cannot be retried, and all haproxy can see there is a
> >failure to process (while in fact it could just be caused
> >by an abort in flight).

Re: HAProxy-1.8 sometimes sends a shorter html when using multithread function

2021-05-25 Thread Kazuhiro Takenaka
Hello Willy,

# Sorry, I posted this message directly to you,
# so I am resending it to the mailing list.

I tested the patches you suggested, and got
the following results.

TestA haproxy-1.8.30 + patch(1 + 2 + 3)
  Shorter HTML: 0
  HTTP 408: 2

TestB haproxy-1.8.30 + patch(2 only)
  Shorter HTML: 0
  HTTP 408: 3

TestC haproxy-2.4.0
  Shorter HTML: 0
  HTTP 408: 0

haproxy-2.4.0 ran without any issue,
but I found that the bytes_read column of
haproxy-2.4.0 was different from that of
haproxy-1.8.30 and haproxy-2.0.22 in the
normal cases.

Values of bytes_read:
  haproxy-1.8.30: 4864
  haproxy-2.0.22: 4837 (Also 2.3.X)
  haproxy-2.4.0 : 4832

I don't think the difference in bytes_read is
related to the shorter HTML issue or the
HTTP 408 issue; I just noticed it, so I am reporting it.

For now, I've run TestA, B, and C once each.
I am planning to do 2 more test runs for each case.

>> I also tried haproxy-2.0.22 and I didn't find
>> messages about shorter htmls but the number of
>> log messages is smaller than the number of HTTP
>> accesses.
>
>This should definitely not happen, but it obviously depends
>where they are counted.
>
>For example some aborted idempotent requests between haproxy
>and the server on a reused connection could lead to a silent
>close to let the client try again.
>
>In this case it is guaranteed by the compliance with the
>HTTP protocol that the server has not done anything that
>cannot be retried, and all haproxy can see there is a
>failure to process (while in fact it could just be caused
>by an abort in flight).
>
>So if you're missing very few logs it could be an explanation.
>Or you may simply be losing logs since they're sent as datagrams
>only in 2.0.

Yes, the difference in the number of logs was 3.
Your explanation hit the mark, thank you.

Kazu

On Tue, May 18, 2021 at 6:52 PM Willy Tarreau  wrote:

> Hello,
>
> On Mon, May 17, 2021 at 09:47:10AM +0900, Kazuhiro Takenaka wrote:
> > Hello
> >
> > This is my first post to this mailing list.
> > I am not good at using English, so feel free to ask me
> > if my text is hard to understand.
>
> Rest assured that the majority of participants here (me included) do
> not have English as their native language, so you're not special on
> this point. And I had absolutely no problem understanding all your
> explanations, your English is better than you seem to think :-)
>
> > I noticed haproxy-1.8 sometimes sent incomplete htmls to
> > clients when running haproxy with the attached haproxy.cfg
> > that uses multithread function.
> >
> > # I also attached content01.html and check.html that
> > # are deployed on backend servers.
> > # content01.html is used in the confirmation test
> > # described below, check.html is for health check
> > # purpose.
> >
> > In this case, the client receives a shorter html.
> >
> > I confirmed haproxy's log message corresponded to the
> > shorter html, the following line is one of them.
> >
> > 2021-03-23T15:54:46.869626+09:00 lb01 [err] haproxy[703957]:
> > 192.168.1.73:60572 [23/Mar/2021:15:54:46.868] vs_http01
> rs_http01web/web01
> > 0/0/0/0/0 200 2896 - - SD-- 12/12/7/2/0 0/0 "GET
> > /content01.html?x=170026897 HTTP/1.1"
>
> So it indicates an immediate close or error that is being detected on
> the server side just after two TCP segments. This can have some
> importance because depending how the server's TCP stack is tuned, it
> is very likely that it will only send two segments before growing its
> window, leaving a short blank period after them which can allow haproxy
> to detect an error.
>
> (...)
> > The following list is the frequency of abnormal access
> > when a total of 10 million accesses are made in 20 parallel
> > curl processes.
> >
> > status_code bytes_read occurrence
> > 200 4344   1
> => exactly 3 segments
> > 200 2896   9
> => exactly 2 segments
> > 200 1448   6
> => exactly 1 segment
>
> It makes me think about a server which closes with a TCP reset after
> the data. It then becomes extremely timing-dependent and could explain
> why the size is a multiple of the number of segments.
>
> > 408 2162
> >
> > The following line is the log messages
> > in the case of 408.
> >
> > 2021-03-23T16:02:42.444084+09:00 lb01 [err] haproxy[703957]:
> > 192.168.1.73:37052 [23/Mar/2021:16:02:32.472] vs_http01
> vs_http01/
> > -1/-1/-1/-1/1 408 212 - - cR-- 14/14/0/0/0 0/0 ""
>
> So the client's request didn't make it into haproxy.
>
> > When I first met this behavior, I used haproxy-1.8.25 shipped
> > with RHEL8.3. So I obtained haproxy-1.8.30 from http://git.haproxy.org/
> > and built it, ran the test and got the result of the same sort.
> >
> > This behavior didn't happen without using multithread function.
>
> So that definitely indicates a race condition somewhere.
>
> > Next, I tried on haproxy-2.0.0 and confirmed it ran normally
> > without this behavior.
> >
> > Then I picked up several versions of haproxy
> > between 1.8.0 and 2.0.0 and built them, tested them,
> > and 

Re: HAProxy-1.8 sometimes sends a shorter html when using multithread function

2021-05-24 Thread Ryan O'Hara
On Tue, May 18, 2021 at 12:00 PM Willy Tarreau  wrote:

> Hi Ryan,
>
> On Tue, May 18, 2021 at 10:54:11AM -0500, Ryan O'Hara wrote:
> > > > I confirmed haproxy's log message corresponded to the
> > > > shorter html, the following line is one of them.
> > > >
> > > > 2021-03-23T15:54:46.869626+09:00 lb01 [err] haproxy[703957]:
> > > > 192.168.1.73:60572 [23/Mar/2021:15:54:46.868] vs_http01
> > > rs_http01web/web01
> > > > 0/0/0/0/0 200 2896 - - SD-- 12/12/7/2/0 0/0 "GET
> > > > /content01.html?x=170026897 HTTP/1.1"
> > >
> >
> > This is exactly the same problem as I reported on the mailing list a
> couple
> > weeks ago. I accidentally replied off-list to Kazuhiro but will
> continue
> > the conversation here.
>
> Excellent, that will be one less issue to chase!
>
> > > > So I commented out "fdtab[fd].ev &= FD_POLL_STICKY;"
> > > > from both of haproxy-1.8.25 and haproxy-1.8.30,
> > > > then the behavior is resolved.
> > >
> > > This is very strange. I could almost have understood the opposite, i.e.
> > > remove an activity report before leaving so that we don't wake up, but
> > > here *not* removing the flags indicates that we're leaving the FD
> > > reports for another call.
> > >
> > > > I don't know why this commit resolves the behavior,
> > > > I just tried and ran the test.
> > >
> > > What I *suspect* is happening is the following:
> > >
> > >   - the server closes with a TCP reset
> > >   - sometimes for an unknown reason we do not process the event
> > > before leaving this function
> > >   - we then flush the activity flags
> > >   - the reset flag is still there, reported as POLLERR, forcing
> > > an immediate subsequent wakeup
> > >   - the I/O handler sees POLLERR without POLLIN and concludes this is
> > > a hard error and doesn't even bother to try to read, resulting in
> > > the loss of the pending segments.
> > >
> >
> > In my original message, I included a portion of the tcpdump and the RST
> > packet is being sent by haproxy to the server. I never see a TCP reset
> > from the server itself.
>
> Ah yes, seeing it now. Thanks, this will help!
>
> > Under wireshark, I can see that the HTTP response is a total of 3
> segments,
> > and as far as I can tell all three segments were received by haproxy.
>
> In fact what you know from a trace is that they're received by the NIC,
> you can know they're received by the TCP stack when you see them ACKed,
> and you know they're received by haproxy if you're seeing haproxy pass
> them on the other side. I mean, in most cases something that arrives
> at the stack will reach haproxy, but there are some situations where this
> will not happen, such as if haproxy closes first, or if some errors
> are detected early and prevent it from reading. The difference is subtle
> but it explains how/why some error flags may be reported indicating an
> error at the transport layer, with the transport layer including the
> lower layers of haproxy itself.
>

Ah yes. I should have recognized this.



> > Pardon if this next bit doesn't make sense, but the third and final
> segment
> > is shown under wireshark as the HTTP response itself. In other words:
> >
> > Segment #1: 2896 bytes
> > Segment #2: 1448 bytes
> > Segment #3: 1059 bytes <- this final frame also includes the HTTP
> response
> > header
>
> No that's something common and weird about wireshark/tshark, it logs the
> status code on the last segment. Someone once told me why he found that
> convenient but I since forgot, considering all the cases where I find this
> misleading! Otherwise just use tcpdump, it doesn't try to be smart and to
> report lies, it will not cheat on you.
>
> So in your trace we can see that the stack ACKs receipt of the first two
> segments (2896 bytes of data), then the 3rd segment (1448). Then haproxy
> closes "cleanly" (something it shouldn't do on this side, at the risk of
> leaving TIME_WAIT sockets). Maybe it found an HTTP close in the response,
> but regardless it should not close like this.
>

It definitely did have connection close, but I agree it should not behave
this way.



> I'm not sure I understand this "HTTP 1125" response, whether it's the last
> bytes of response or a late retransmit for any reason, and as tshark is
> still unable to display sequence and ack numbers on *all* packets, it's
> impossible to reliably know what this corresponds to :-/ for the next
> reset it's apparently the brutal nolinger close at the end avoiding the
> TIME_WAIT, because the Seq is the same as the one on the FIN, indicating
> it wipes the FIN.
>

Right. I suspected that the RST packet was caused by a nolinger with time
set to 0. I found the code that does that; I just don't understand why it
is being triggered in this case.
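
(For anyone following along: at the socket level, that "nolinger" close
amounts to something like the generic POSIX sketch below. This is only an
illustration of the mechanism, not haproxy's code: with SO_LINGER enabled
and a zero timeout, close() aborts the connection, so the peer sees an RST
rather than an orderly FIN, and no TIME_WAIT state is left behind.)

/* Generic illustration of an abortive ("nolinger") close, not haproxy code. */
#include <sys/socket.h>
#include <unistd.h>

static void abortive_close(int fd)
{
    struct linger lin;

    lin.l_onoff  = 1;  /* enable SO_LINGER...                           */
    lin.l_linger = 0;  /* ...with a zero timeout: close() sends an RST  */
    setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
    close(fd);
}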



> > When the problem occurs, haproxy logs a termination state of "SD--" and the
> > byte count is short by exactly the size of the last segment (1059 in this
> > case). I think this is well established here but I just wanted to be clear.

Re: HAProxy-1.8 sometimes sends a shorter html when using multithread function

2021-05-19 Thread Kazuhiro Takenaka
Hi Willy and Ryan,

Thanks for replying to my message.

I am reading your replies to decide
what I will do next.

I will report results.

Due to my schedule, it will be this weekend
or early next week.

Kazu


Re: HAProxy-1.8 sometimes sends a shorter html when using multithread function

2021-05-18 Thread Willy Tarreau
Hi Ryan,

On Tue, May 18, 2021 at 10:54:11AM -0500, Ryan O'Hara wrote:
> > > I confirmed haproxy's log message corresponded to the
> > > shorter html, the following line is one of them.
> > >
> > > 2021-03-23T15:54:46.869626+09:00 lb01 [err] haproxy[703957]:
> > > 192.168.1.73:60572 [23/Mar/2021:15:54:46.868] vs_http01
> > rs_http01web/web01
> > > 0/0/0/0/0 200 2896 - - SD-- 12/12/7/2/0 0/0 "GET
> > > /content01.html?x=170026897 HTTP/1.1"
> >
> 
> This is exactly the same problem as I reported on the mailing list a couple
> weeks ago. I accidentally replied off-list to Kazuhiro but will continue
> the conversation here.

Excellent, that will be one less issue to chase!

> > > So I commented out "fdtab[fd].ev &= FD_POLL_STICKY;"
> > > from both of haproxy-1.8.25 and haproxy-1.8.30,
> > > then the behavior is resolved.
> >
> > This is very strange. I could almost have understood the opposite, i.e.
> > remove an activity report before leaving so that we don't wake up, but
> > here *not* removing the flags indicates that we're leaving the FD
> > reports for another call.
> >
> > > I don't know why this commit resolves the behavior,
> > > I just tried and ran the test.
> >
> > What I *suspect* is happening is the following:
> >
> >   - the server closes with a TCP reset
> >   - sometimes for an unknown reason we do not process the event
> > before leaving this function
> >   - we then flush the activity flags
> >   - the reset flag is still there, reported as POLLERR, forcing
> > an immediate subsequent wakeup
> >   - the I/O handler sees POLLERR without POLLIN and concludes this is
> > a hard error and doesn't even bother to try to read, resulting in
> > the loss of the pending segments.
> >
> 
> In my original message, I included a portion of the tcpdump and the RST
> packet is being sent by haproxy to the server. I never see a TCP reset
> from the server itself.

Ah yes, seeing it now. Thanks, this will help!

> Under wireshark, I can see that the HTTP response is a total of 3 segments,
> and as far as I can tell all three segments were received by haproxy.

In fact what you know from a trace is that they're received by the NIC,
you can know they're received by the TCP stack when you see them ACKed,
and you know they're received by haproxy if you're seeing haproxy pass
them on the other side. I mean, in most cases something that arrives
at the stack will reach haproxy, but there are some situations where this
will not happen, such as if haproxy closes first, or if some errors
are detected early and prevent it from reading. The difference is subtle
but it explains how/why some error flags may be reported indicating an
error at the transport layer, with the transport layer including the
lower layers of haproxy itself.

> Pardon if this next bit doesn't make sense, but the third and final segment
> is shown under wireshark as the HTTP response itself. In other words:
> 
> Segment #1: 2896 bytes
> Segment #2: 1448 bytes
> Segment #3: 1059 bytes <- this final frame also includes the HTTP response
> header

No that's something common and weird about wireshark/tshark, it logs the
status code on the last segment. Someone once told me why he found that
convenient but I since forgot, considering all the cases where I find this
misleading! Otherwise just use tcpdump, it doesn't try to be smart and to
report lies, it will not cheat on you.

So in your trace we can see that the stack ACKs receipt of the first two
segments (2896 bytes of data), then the 3rd segment (1448). Then haproxy
closes "cleanly" (something it shouldn't do on this side, at the risk of
leaving TIME_WAIT sockets). Maybe it found an HTTP close in the response,
but regardless it should not close like this.

I'm not sure I understand this "HTTP 1125" response, whether it's the last
bytes of response or a late retransmit for any reason, and as tshark is
still unable to display sequence and ack numbers on *all* packets, it's
impossible to reliably know what this corresponds to :-/ for the next
reset it's apparently the brutal nolinger close at the end avoiding the
TIME_WAIT, because the Seq is the same as the one on the FIN, indicating
it wipes the FIN.

> When the problem occurs, haproxy logs a termination state of "SD--" and the
> byte count is short by exactly the size of the last segment (1059 in this
> case). I think this is well established here but I just wanted to be clear.

I think it is possible that the response is not correct (i.e. too short
content-length compared to what's sent back, or incorrect chunked-encoded
length). Ryan, if you could maybe pass the trace between .152 and .153
through tcpdump -nvvX, this could be very useful (and possibly even help
figure a reproducer). Be careful, it may need to be edited to mask private
info.

> I can help test these in my environment. Thanks.

Great, thank you! I suspect that there is something else, though,
considering that haproxy closed first. Or maybe it *thought* that 

Re: HAProxy-1.8 sometimes sends a shorter html when using multithread function

2021-05-18 Thread Ryan O'Hara
On Tue, May 18, 2021 at 5:21 AM Willy Tarreau  wrote:

> Hello,
>
> On Mon, May 17, 2021 at 09:47:10AM +0900, Kazuhiro Takenaka wrote:
> > Hello
> >
> > This is my first post to this mailing list.
> > I am not good at using English, so feel free to ask me
> > if my text is hard to understand.
>
> Rest assured that the majority of participants here (me included) do
> not have English as their native language, so you're not special on
> this point. And I had absolutely no problem understanding all your
> explanations, your English is better than you seem to think :-)
>
> > I noticed haproxy-1.8 sometimes sent incomplete htmls to
> > clients when running haproxy with the attached haproxy.cfg
> > that uses multithread function.
> >
> > # I also attached content01.html and check.html that
> > # are deployed on backend servers.
> > # content01.html is used in the confirmation test
> > # described below, check.html is for health check
> > # purpose.
> >
> > In this case, the client receives a shorter html.
> >
> > I confirmed haproxy's log message corresponded to the
> > shorter html, the following line is one of them.
> >
> > 2021-03-23T15:54:46.869626+09:00 lb01 [err] haproxy[703957]:
> > 192.168.1.73:60572 [23/Mar/2021:15:54:46.868] vs_http01
> rs_http01web/web01
> > 0/0/0/0/0 200 2896 - - SD-- 12/12/7/2/0 0/0 "GET
> > /content01.html?x=170026897 HTTP/1.1"
>

This is exactly the same problem as I reported on the mailing list a couple
weeks ago. I accidentally replied off-list to Kazuhiro but will continue
the conversation here.

> So it indicates an immediate close or error that is being detected on
> the server side just after two TCP segments. This can have some
> importance because depending how the server's TCP stack is tuned, it
> is very likely that it will only send two segments before growing its
> window, leaving a short blank period after them which can allow haproxy
> to detect an error.
>

This is interesting.


> (...)
> > The following list is the frequency of abnormal access
> > when a total of 10 million accesses are made in 20 parallel
> > curl processes.
> >
> > status_code bytes_read occurrence
> > 200 4344   1
> => exactly 3 segments
> > 200 2896   9
> => exactly 2 segments
> > 200 1448   6
> => exactly 1 segment
>
> It makes me think about a server which closes with a TCP reset after
> the data. It then becomes extremely timing-dependent and could explain
> why the size is a multiple of the number of segments.
>
> > 408 2162
> >
> > The following line is the log messages
> > in the case of 408.
> >
> > 2021-03-23T16:02:42.444084+09:00 lb01 [err] haproxy[703957]:
> > 192.168.1.73:37052 [23/Mar/2021:16:02:32.472] vs_http01
> vs_http01/
> > -1/-1/-1/-1/1 408 212 - - cR-- 14/14/0/0/0 0/0 ""
>
> So the client's request didn't make it into haproxy.
>
> > When I first met this behavior, I used haproxy-1.8.25 shipped
> > with RHEL8.3. So I obtained haproxy-1.8.30 from http://git.haproxy.org/
> > and built it, ran the test and got the result of the same sort.
> >
> > This behavior didn't happen without using multithread function.
>
> So that definitely indicates a race condition somewhere.
>

In my case it was also reported that nbthread must be set in order to
trigger the HTTP response status 200 with SD-- termination state, along
with the shortened response size. Everything about this bug seems exactly in
line with what I am seeing.


> > Next, I tried on haproxy-2.0.0 and confirmed it ran normally
> > without this behavior.
> >
> > Then I picked up several versions of haproxy
> > between 1.8.0 and 2.0.0 and built them, tested them,
> > and found the commit below resolved this behavior.
> >
> > ===
> > commit 524344b4e0434b86d83869ef051f98d24505c08f
> > Author: Olivier Houchard 
> > Date:   Wed Sep 12 17:12:53 2018 +0200
> >
> > MEDIUM: connections: Don't reset the polling flags in
> conn_fd_handler().
> >
> > Resetting the polling flags at the end of conn_fd_handler()
> shouldn't be
> > needed anymore, and it will create problem when we won't handle
> > send/recv
> > from conn_fd_handler() anymore.
> >
> > diff --git a/src/connection.c b/src/connection.c
> > index ab32567b..e303f2c3 100644
> > --- a/src/connection.c
> > +++ b/src/connection.c
> > @@ -203,9 +203,6 @@ void conn_fd_handler(int fd)
> > conn->mux->wake(conn) < 0)
> > return;
> >
> > -   /* remove the events before leaving */
> > -   fdtab[fd].ev &= FD_POLL_STICKY;
> > -
> > /* commit polling changes */
> > conn->flags &= ~CO_FL_WILL_UPDATE;
> > conn_cond_update_polling(conn);
> > ===
> >
> > So I commented out "fdtab[fd].ev &= FD_POLL_STICKY;"
> > from both of haproxy-1.8.25 and haproxy-1.8.30,
> > then the behavior is resolved.
>
> This is very strange. I could almost have understood the opposite, i.e.
> remove an 

Re: HAProxy-1.8 sometimes sends a shorter html when using multithread function

2021-05-18 Thread Willy Tarreau
Hello,

On Mon, May 17, 2021 at 09:47:10AM +0900, Kazuhiro Takenaka wrote:
> Hello
> 
> This is my first post to this mailing list.
> I am not good at using English, so feel free to ask me
> if my text is hard to understand.

Rest assured that the majority of participants here (me included) do
not have English as their native language, so you're not special on
this point. And I had absolutely no problem understanding all your
explanations, your English is better than you seem to think :-)

> I noticed haproxy-1.8 sometimes sent incomplete htmls to
> clients when running haproxy with the attached haproxy.cfg
> that uses multithread function.
>
> # I also attached content01.html and check.html that
> # are deployed on backend servers.
> # content01.html is used in the confirmation test
> # described below, check.html is for health check
> # purpose.
> 
> In this case, the client receives a shorter html.
> 
> I confirmed haproxy's log message corresponded to the
> shorter html, the following line is one of them.
> 
> 2021-03-23T15:54:46.869626+09:00 lb01 [err] haproxy[703957]:
> 192.168.1.73:60572 [23/Mar/2021:15:54:46.868] vs_http01 rs_http01web/web01
> 0/0/0/0/0 200 2896 - - SD-- 12/12/7/2/0 0/0 "GET
> /content01.html?x=170026897 HTTP/1.1"

So it indicates an immediate close or error that is being detected on
the server side just after two TCP segments. This can have some
importance because depending how the server's TCP stack is tuned, it
is very likely that it will only send two segments before growing its
window, leaving a short blank period after them which can allow haproxy
to detect an error.

(...)
> The following list is the frequency of abnormal access
> when a total of 10 million accesses are made in 20 parallel
> curl processes.
> 
> status_code bytes_read occurrence
> 200 4344   1
=> exactly 3 segments
> 200 2896   9
=> exactly 2 segments
> 200 1448   6
=> exactly 1 segment

It makes me think about a server which closes with a TCP reset after
the data. It then becomes extremely timing-dependent and could explain
why the size is a multiple of the number of segments.

> 408 2162
>
> The following line is the log messages
> in the case of 408.
> 
> 2021-03-23T16:02:42.444084+09:00 lb01 [err] haproxy[703957]:
> 192.168.1.73:37052 [23/Mar/2021:16:02:32.472] vs_http01 vs_http01/
> -1/-1/-1/-1/1 408 212 - - cR-- 14/14/0/0/0 0/0 ""

So the client's request didn't make it into haproxy.

> When I first met this behavior, I used haproxy-1.8.25 shipped
> with RHEL8.3. So I obtained haproxy-1.8.30 from http://git.haproxy.org/
> and built it, ran the test and got the result of the same sort.
> 
> This behavior didn't happen without using multithread function.

So that definitely indicates a race condition somewhere.

> Next, I tried on haproxy-2.0.0 and confirmed it ran normally
> without this behavior.
> 
> Then I picked up several versions of haproxy
> between 1.8.0 and 2.0.0 and built them, tested them,
> and found the commit below resolved this behavior.
> 
> ===
> commit 524344b4e0434b86d83869ef051f98d24505c08f
> Author: Olivier Houchard 
> Date:   Wed Sep 12 17:12:53 2018 +0200
> 
> MEDIUM: connections: Don't reset the polling flags in conn_fd_handler().
> 
> Resetting the polling flags at the end of conn_fd_handler() shouldn't be
> needed anymore, and it will create problem when we won't handle
> send/recv
> from conn_fd_handler() anymore.
> 
> diff --git a/src/connection.c b/src/connection.c
> index ab32567b..e303f2c3 100644
> --- a/src/connection.c
> +++ b/src/connection.c
> @@ -203,9 +203,6 @@ void conn_fd_handler(int fd)
> conn->mux->wake(conn) < 0)
> return;
> 
> -   /* remove the events before leaving */
> -   fdtab[fd].ev &= FD_POLL_STICKY;
> -
> /* commit polling changes */
> conn->flags &= ~CO_FL_WILL_UPDATE;
> conn_cond_update_polling(conn);
> ===
> 
> So I commented out "fdtab[fd].ev &= FD_POLL_STICKY;"
> from both of haproxy-1.8.25 and haproxy-1.8.30,
> then the behavior is resolved.

This is very strange. I could almost have understood the opposite, i.e.
remove an activity report before leaving so that we don't wake up, but
here *not* removing the flags indicates that we're leaving the FD
reports for another call.

> I don't know why this commit resolves the behavior,
> I just tried and ran the test.

What I *suspect* is happening is the following:

  - the server closes with a TCP reset
  - sometimes for an unknown reason we do not process the event
before leaving this function
  - we then flush the activity flags
  - the reset flag is still there, reported as POLLERR, forcing
an immediate subsequent wakeup
  - the I/O handler sees POLLERR without POLLIN and concludes this is
a hard error and doesn't even bother to try to read, resulting in
    the loss of the pending segments.
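
To make that last point concrete, here is a generic poll() sketch of the
suspected failure mode (an illustration only, not haproxy's actual handler):
if an error flag is seen without a read flag and the handler gives up
without reading, any data still queued in the kernel receive buffer is
lost, whereas draining the socket first may still recover the final
segments.

/* Generic illustration of the suspected failure mode, not haproxy code.
 * When the peer sends data and then resets the connection, poll() may,
 * depending on timing, report POLLERR without POLLIN. */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

static void handle_event(int fd, short revents)
{
    char buf[4096];
    ssize_t n;

    if ((revents & POLLERR) && !(revents & (POLLIN | POLLHUP))) {
        /* "hard error, don't even try to read": pending data is dropped */
        fprintf(stderr, "fd %d: error, closing without reading\n", fd);
        close(fd);
        return;
    }

    /* Safer: drain whatever is still readable before acting on the error */
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* ... forward the data to the client side here ... */
    }
    close(fd);
}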