Re: Plans for 1.9

2018-02-08 Thread Willy Tarreau
On Fri, Feb 09, 2018 at 02:13:07PM +1100, Igor Cicimov wrote:
> Hi Willy,
> 
> On Fri, Feb 9, 2018 at 1:16 AM, Willy Tarreau wrote:
> 
> > Fred plans to bring SSL support to the peers among
> > other things, and is working on a regression testing suite (yeah!).
> 
> 
> Does this mean it will be possible to share the session tickets between
> the peers?

Ah no, that's not it (though someone should probably work on that too).
It's about using SSL between the peers themselves. Some people
synchronise their haproxy nodes across multiple DCs, and using SSL in
that case is desirable but currently not convenient.
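
For reference, peers today speak only plain TCP between them, declared
along these lines (addresses illustrative; the commented line is purely
hypothetical and only sketches what an encrypted peer link might look
like):

    peers mypeers
        peer lb1 10.0.0.1:1024
        # remote DC -- cleartext today; something like this would help:
        # peer lb2 203.0.113.5:1024 ssl verify required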

Willy



Re: Plans for 1.9

2018-02-08 Thread Igor Cicimov
Hi Willy,

On Fri, Feb 9, 2018 at 1:16 AM, Willy Tarreau wrote:

> Fred plans to bring SSL support to the peers among
> other things, and is working on a regression testing suite (yeah!).


Does this mean it will be possible to share the session tickets between
the peers?


Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-08 Thread Tomek Gacek

Hi Willy

On 2018-02-03 10:05, Willy Tarreau wrote:

Hi Tomek,

On Sat, Feb 03, 2018 at 08:47:35AM +0100, Tomek Gacek wrote:

I have the same issue. It's pretty random: I would say about 60-70% of
requests are OK, but the rest fail. I compiled all 1.8 versions and was
able to isolate this a little bit. It's fine up to 1.8.0-dev3 and it's
failing since 1.8.0-rc1.
The problem occurs on SSL connections, and after digging in my apps'
logs it looks like it's related to reusing keep-alive connections
between the client and haproxy.
By default I have "option http-server-close" present in the config, and
when it's there I can see the problem. When this option is removed, the
problem is solved.
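
For reference, a minimal configuration in the spirit of this setup would
be (addresses and certificate path illustrative):

    defaults
        mode http
        option http-server-close    # removing this works around the issue

    listen www
        bind :443 ssl crt /etc/haproxy/site.pem
        server app1 10.0.1.1:8443 ssl verify none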

This is extremely valuable information. I suspect we might have damaged
something in the backend code when inserting the mux layer. Given that
the connection and stream are a bit more independent from each other now
due to the mux, it is possible that, depending on the sequence of some
events (eg: connection close), it changes how the close event is
interpreted in the stream. This will definitely help us narrow down the
cause of the issue.



My tests show the problem is caused by this commit:
http://git.haproxy.org/?p=haproxy-1.8.git;a=commitdiff;h=c2aae74f010f97a3415542fe649198a5d3be1ea8
This is the first snapshot where I was able to recreate the problem.

I was unable to recreate it with the previous commit:
http://git.haproxy.org/?p=haproxy-1.8.git;a=commit;h=253c62b257c137e7da5c273f42bc5d6eacd31d2c



Regards,
Tomek



Re: [ANNOUNCE] haproxy-1.8.4

2018-02-08 Thread Aleksandar Lazic

Hi.

On 08.02.2018 at 14:26, Willy Tarreau wrote:

Hi,

HAProxy 1.8.4 was released on 2018/02/08. It added 51 new commits
after version 1.8.3.


Great as always ;-)

The new version is now available on Docker Hub:

https://hub.docker.com/r/me2digital/haproxy18/
https://hub.docker.com/r/me2digital/openshift-ocp-router-hap18/

Best Regards
Aleks


There is no very important issue fixed in this version, but a number
of small and annoying ones. Overall I'd say that it's quite good, and
it can start to make sense to try it for those who hesitated or were
waiting for the initial stability issues to be addressed.

One of the main changes touches the polling system in threaded mode.
While we started with a unified poller when using kqueue and epoll, it
turned out not to be the best idea, because some threads cannot sleep
due to the activity of others, especially when those others are dealing
with long processing (eg: SSL). So we had to change this to have a
per-thread poller and to change the way polling updates are propagated
down the layers. This may sound a bit scary (and it would have scared
me as well) but in the end the changes are not that big and better
match the way it works for a single thread. Thanks to this change we no
longer face the situation where all threads suddenly go to 100% because
only one is heavily loaded. This sensitive change also explains the
time it took for 1.8.4 to be issued.
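
For context, this only concerns threaded deployments, i.e. those
enabling threads in the global section (value illustrative):

    global
        nbthread 4    # per-thread pollers now keep idle threads asleep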

The usual batch of small master-worker fixes is present, but the last
ones were purely cosmetic (doc updates), so I think we're getting
something solid now.

I must say that I'm currently very satisfied with the way issues are
reported and addressed.

I know that some issues remain and are currently being investigated:
   - a problem with the cache reported by Pierre Cheynier whereby using
     certain sample fetches like "path_end" in an ACL to decide whether
     to cache could cause some cache corruption; let's try to narrow it
     down before drawing conclusions.

   - http-send-name-header apparently struck again, though I failed to
     reproduce the problem. It might be an old one that manifests in a
     very special situation or time frame.

   - several minor H2 issues (DATA padding incorrectly accounted for in
     the connection window, DATA frames for closed streams not properly
     accounted, RST sometimes sent in response to an RST). None of them
     has a real visible impact in practice so I preferred to issue 1.8.4
     first to address the pending issues.

Overall nothing terrible, and we can issue 1.8.5 once these ones are
figured out and addressed (and by then we'll get new ones).

To make a long story short, if you're using threads, master-worker,
SPOE or any version before 1.8.3, you should definitely update to avoid
facing already fixed issues. If you're already on 1.8.3 without any of
these features, well, have a look at the changelog below, but in
general I'd say there's no urgency for you to update.

Please find the usual URLs below:
Site index   : http://www.haproxy.org/
Discourse: http://discourse.haproxy.org/
Sources  : http://www.haproxy.org/download/1.8/src/
Git repository   : http://git.haproxy.org/git/haproxy-1.8.git/
Git Web browsing : http://git.haproxy.org/?p=haproxy-1.8.git
Changelog: http://www.haproxy.org/download/1.8/src/CHANGELOG
Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

Willy
---
Complete changelog:
Chris Lane (1):
   MINOR: init: emit warning when -sf/-sd cannot parse argument

Christopher Faulet (11):
   MINOR: threads/fd: Use a bitfield to know if there are FDs for a thread in the FD cache
   BUG/MEDIUM: threads/polling: Use fd_cache_mask instead of fd_cache_num
   BUG/MEDIUM: threads/server: Fix deadlock in srv_set_stopping/srv_set_admin_flag
   BUG/MEDIUM: checks: Don't try to release undefined conn_stream when a check is freed
   BUG/MINOR: kqueue/threads: Don't forget to close kqueue_fd[tid] on each thread
   MINOR: threads: Use __decl_hathreads instead of #ifdef/#endif
   BUILD: epoll/threads: Add test on MAX_THREADS to avoid warnings when compiled without threads
   BUILD: kqueue/threads: Add test on MAX_THREADS to avoid warnings when compiled without threads
   BUG/MINOR: threads: Update labels array because of changes in lock_label enum
   BUG/MEDIUM: spoe: Always try to receive or send the frame to detect shutdowns
   BUG/MEDIUM: spoe: Allow producer to read and to forward shutdown on request side

David Carlier (1):
   BUILD/MINOR: ancient gcc versions atomic fix

Emeric Brun (1):
   BUG/MEDIUM: peers: fix expire date wasn't updated if entry is modified remotely.

Jérôme Magnin (2):
   DOC: clarify the scope of ssl_fc_is_resumed
   DOC: Describe routing impact of using interface keyword on bind lines

Olivier Houchard (3):
   MINOR: dns: Handle SRV record weight correctly.
   MINOR: 

Randomness in hard-stop-after

2018-02-08 Thread Samuel Reed
When reconfiguring across a large web tier, it's possible that many
HAProxy reloads are initiated at nearly the same time. We have a large
number of long-lived TCP sessions, so we must use hard-stop-after to
eventually kill them off, or HAProxy instances will remain open for days.

It would be very helpful to have an option like
`spread-hard-stop-after`, similar to `spread-checks`, to introduce some
random variance into the timeout such that the disconnection of these
sessions does not cause a major traffic event across the stack.
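
Today the stop timeout can only be a fixed global value; the spread
variant would be hypothetical, by analogy with spread-checks (values
illustrative):

    global
        hard-stop-after 30m
        # hypothetical: apply up to +/-10% random jitter to the above
        # spread-hard-stop-after 10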

Also, thank you very much for the 1.8.4 release!




Plans for 1.9

2018-02-08 Thread Willy Tarreau
Hi all,

I would have liked to send this earlier, but 1.8 kept us quite busy for
a while.

Last year's development cycle went impressively well in my opinion, with
multiple teams being able to work in parallel without stepping too much
on each other's toes, and bug fixes being batched thanks to the awesome
help from some members here and on discourse. So I'd like to stick to
that model for this development cycle, still with the same flexibility
we had during the previous one.

Last year I said that the dev cycle would be cut into 3 phases, with the
first one ending at a strict date. In fact we never respect dates, but
they are still useful to set a direction and let people know whether
they're about to be late or already really too late.

Thus I'd like all those willing to work on something to announce it
before the end of March so that we know who does what. After this, no
more unplanned important merges, but we can possibly decide to open a
-next branch if needed. We should start to emit -rc1 around the end of
September, marking the freeze of any development, and plan for a
release between mid-October and the end of November. In an ideal world
we would stop any dev before the summer holiday, but we tried this many
times in the past and it never worked; it always shifts.

For now we have two big changes scheduled in parallel, which are
absolutely required to pursue the work done on H2 so that we can do
end-to-end H2:

  - internal HTTP representation: instead of translating H2 to H1 at
    the edge, we'll translate both H2 and H1 to "HTTP". This will allow
    us not to lose any semantics to the representation (eg: the "never
    indexed" header fields), and to clean up a lot of the internal API.
    It should also improve our ability to have a properly working
    "http-send-name-header", and simplify everything related to HTTP
    header manipulation (no more memmove, possibly a better interface
    with Lua, etc). Christopher has already started taking a look at
    this and I'm wishing him good luck ;-)

  - rearchitecture of the connection layers. The current architecture
    dates from 2012 and was made to support SSL. But with QUIC around
    the corner, a file-descriptor-oriented I/O scheduler is no longer
    future-proof, and we've already met some difficulties in H2 that
    justify breaking this into pieces that stack much better, and
    hopefully more independently of the threads. This also directly
    involves some scheduler updates. Olivier has started working on
    this, with a first success already in removing the locks from the
    FD cache, bringing a 60% performance increase at 12 threads. This
    and the rework above will cause a lot of pain at merge time...

In addition to this, William has identified a number of improvements he
is willing to make to the cache, as well as to the way SSL certificates
are managed (ie: split crt vs key, and merge identical ones in memory
to save RAM for those dealing with millions of certs). Fred plans to
bring SSL support to the peers among other things, and is working on a
regression testing suite (yeah!).

Among the cool stuff to do that is not assigned yet, we know that the
health checks should be improved (eg: the ability to add a header after
a POST request). Other things probably need to be discussed regarding
health checks as well. We just need to be careful that such changes
don't collide too much with the work above.
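
For context, extra headers can currently only be smuggled into an HTTP
check by escaping them after the version field, which is exactly the
kind of thing worth cleaning up (hostname and addresses illustrative):

    backend app
        option httpchk POST /health HTTP/1.1\r\nHost:\ app.example.com
        http-check expect status 200
        server s1 10.0.1.1:80 check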

It would also be cool to exploit the peers a bit more. In the past we
imagined being able to list them with their state on the CLI, and to
use their state to divide a server's "maxconn" setting by #peers+1 when
a "shared-with <peers>" directive is present, so that each maxconn
automatically adapts to the number of visible peers.
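
A sketch of how that could look ("shared-with" is hypothetical, names
and values illustrative); with one visible peer, the effective maxconn
would become 300 / (1+1) = 150:

    peers lb-cluster
        peer lb1 10.0.0.1:1024
        peer lb2 10.0.0.2:1024

    backend app
        server srv1 10.0.1.1:80 maxconn 300 shared-with lb-cluster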

We also thought about a "debug" converter which would dump the sample
into an internal ring buffer that could be consulted from the CLI using
"show debug-ring", for example. That could also help collect
information on the line numbers of matched ACLs.
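
For instance (the "debug" converter and "show debug-ring" command are
hypothetical, as proposed above; the variable name is illustrative):

    # dump the path sample into the debug ring on every request
    http-request set-var(txn.dbg) path,debug

    # later, from the stats socket:
    #   echo "show debug-ring" | socat stdio /var/run/haproxy.sock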

A number of people asked for a per-server log-format, possibly coupled
with the ability to define log-format profiles in their own sections
that could be reused in various places. I'm not aware of anyone working
on this, so feel free to volunteer.
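
Something along these lines, perhaps (the "log-profile" section and
keyword are purely hypothetical):

    log-profile per-srv
        log-format "%ci:%cp [%tr] %ft %b/%s %ST %B"

    backend app
        server s1 10.0.1.1:80 log-profile per-srv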

I have identified a number of non-user-visible things to rework
internally, such as unifying chunks and buffers, but that's part of the
permanent maintenance, cleanup and update of the internal
infrastructure. I'd also like to support an option on the CLI to mask
some fields so that bug reporters could more easily anonymize their
output when we ask them to do so. I started to look into the code for
this and it would benefit a lot from the aforementioned change (merging
chunks and buffers).

Some may have other ideas. Please keep in mind that it's nice to put up
a wish list, but it's much better to be able to do certain things by
oneself. It may seem difficult at first but once you start you'll 

Re: Issue in HAproxy requests with http-send-name-header

2018-02-08 Thread Roque Porchetto
Hello Willy,

thank you very much for your time on this. I'm adding some extra
information here:

> Did you notice this problem with 1.7 or not?


No, I tried with 1.8.1 (required by the system I'm using) and then
moved to the latest version to try. Downgrading would take me some
hours since I would have to rebuild the whole system.

> Could you please check if adding "option http-buffer-request" makes it
> worse or better?


Yes! With that option I get 100% success.

> Also please check in your logs if the faulty requests have experienced
> any retry or redispatch.


The faulty requests are not being retried or redispatched; they don't
generate a failure, they just arrive at the login form with a wrong
(corrupted) password, so the login fails but the requests themselves
are fine.

I also tried another thing, related to the content type of the
requests, which may help you troubleshoot. If I send the requests from
the client with the header "X-Content-Type-Options: nosniff", the error
is no longer reproduced.
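
For example, sending the login like this avoids the corruption (URL and
form fields illustrative):

    curl -X POST https://lb.example.com/login \
         -H "X-Content-Type-Options: nosniff" \
         --data "cancel_url=http...&__ac_password=myPassword"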

I hope this helps; please let me know if you need any extra details or
want me to try more tests.

Regards,
Roque





2018-02-08 10:22 GMT+01:00 Willy Tarreau:

> On Thu, Feb 08, 2018 at 09:44:38AM +0100, Willy Tarreau wrote:
> > Now I'm seeing it. So it means that we have something wrong with the
> > position computation once data start to come into the buffer. Did you
> > notice this problem with 1.7 or not? The random speed at which data
> > arrive in the buffer explains why you observe neither a 0% nor a 100%
> > hit rate.
> >
> > Could you please check if adding "option http-buffer-request" makes it
> > worse or better? I guess you'll get either 100% failure or 100% success,
> > which will help troubleshooting (and will possibly help you work around
> > the issue for some time).
>
> By the way, I tried hard but never managed to get it to fail at all, so
> the tests above will be useful. Also please check in your logs whether
> the faulty requests experienced any retry or redispatch. It's in these
> cases that http-send-name-header gets trickier. And since you're using
> maxconn 1, I'm wondering if one reason might not be that the server
> behind is a bit limited, possibly causing a significant enough number
> of retries to exhibit the bug.
>
> Thanks,
> Willy
>



-- 
Roque Porchetto


Peer tables don't synch on clear

2018-02-08 Thread Franks Andy (IT Technical Architecture Manager)
Hi all,
  Haproxy 1.6.13
  I've checked the documentation again but can't see an option for this.
We sometimes clear stick-table entries to stop individual connections
from using the backup path server, and while the peer synchronisation
works for new entries, the clear doesn't propagate to the secondary
peer node we're using.
Is this by design, or is there an option I'm not seeing?
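
For reference, this is the kind of clear we issue on the primary node
(table name, key and socket path illustrative):

    echo "clear table be_app key 10.1.2.3" | socat stdio /var/run/haproxy.sock
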
Thanks
Andy


Re: Issue in HAproxy requests with http-send-name-header

2018-02-08 Thread Willy Tarreau
On Thu, Feb 08, 2018 at 09:44:38AM +0100, Willy Tarreau wrote:
> Now I'm seeing it. So it means that we have something wrong with the
> position computation once data start to come into the buffer. Did you
> notice this problem with 1.7 or not? The random speed at which data
> arrive in the buffer explains why you observe neither a 0% nor a 100%
> hit rate.
> 
> Could you please check if adding "option http-buffer-request" makes it
> worse or better? I guess you'll get either 100% failure or 100% success,
> which will help troubleshooting (and will possibly help you work around
> the issue for some time).

By the way, I tried hard but never managed to get it to fail at all, so
the tests above will be useful. Also please check in your logs whether
the faulty requests experienced any retry or redispatch. It's in these
cases that http-send-name-header gets trickier. And since you're using
maxconn 1, I'm wondering if one reason might not be that the server
behind is a bit limited, possibly causing a significant enough number
of retries to exhibit the bug.

Thanks,
Willy



Re: Issue in HAproxy requests with http-send-name-header

2018-02-08 Thread Willy Tarreau
Hello Roque,

On Tue, Feb 06, 2018 at 05:37:32PM +0100, Roque Porchetto wrote:
> Hello,
> 
> I'm working on a scalability test project that currently involves
> simulating several user logins on a system whose load balancing relies
> on haproxy (latest stable version 1.8.3).
> 
> In a scenario where 20 or more users try to log in to the system,
> around 20% of them fail. After intense debugging and analysis of
> tcpdumps, as far as I can see the issue seems to come from haproxy:
> some of the HTTP requests are malformed because the headers are
> inserted incorrectly. Specifically, the header added by
> "http-send-name-header". If that keyword is removed from the haproxy
> configuration, the error is no longer reproduced. But
> http-send-name-header is needed by the system to manage user-backend
> relations through cookies.

Argh, that's not fun, because it's the worst option ever brought to haproxy,
regularly breaking due to insignificant changes :-(

> Here is an example of a request malformed by haproxy:
> [image: inline screenshot of the malformed request headers]

(note: in the future, please avoid posting images for header captures,
 it's quite a pain to read, search and comment on).

> That request must have been:
> 
> ...
> 
> X-Balancer-Current-Cookie: SERVERID
> 
> X-Balancer-Current-Server: user-0
> 
> cancel_url=http...&__ac_password=myPassword
> ...
> 
> 
> But we can see in the malformed request that the X-Balancer-Current-Server
> key-value pair was inserted inside the body, changing the value of
> __ac_password.

Now I'm seeing it. So it means that we have something wrong with the
position computation once data start to come into the buffer. Did you
notice this problem with 1.7 or not? The random speed at which data
arrive in the buffer explains why you observe neither a 0% nor a 100%
hit rate.

Could you please check if adding "option http-buffer-request" makes it
worse or better? I guess you'll get either 100% failure or 100% success,
which will help troubleshooting (and will possibly help you work around
the issue for some time).
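
For anyone hitting the same thing, a sketch of this workaround in
context (names, addresses and cookie values illustrative):

    backend users
        option http-buffer-request      # wait for the full request body
        http-send-name-header X-Balancer-Current-Server
        cookie SERVERID insert indirect nocache
        server user-0 10.0.2.10:8080 cookie user-0 maxconn 1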

Thanks!
Willy



Re: [PATCH] DOC: Mention -Ws in the list of available options

2018-02-08 Thread Willy Tarreau
Hi Pavlos,

On Wed, Feb 07, 2018 at 09:58:13PM +0100, Pavlos Parissis wrote:
> Hi,
> 
> Please consider applying the attached patch.
> 
> It is a patch for management.txt and adds '-Ws' to the list of
> available options. haproxy --help reports that option, so I thought we
> should have it in the document as well.

Thank you. I mistakenly used -Ws instead of -W a few days ago during a
test and couldn't find it documented anywhere, leaving me thinking I
had dreamed it :-)
Now merged.

Willy