show stat (CSV format) seems broken.

2024-01-22 Thread Emeric Brun
Hi All,


Enabling agent-check breaks the parsing of the 'show stat' CSV output for
multiple scripts/soft parsers I use:


3934 recvfrom(7, "L7OK,200,0,0,0,0,0,0,00,0,0,-1,,\"agent warns : 
Backend is using a static LB algorithm and only accepts weights '0%' and ", 
128, 0, NULL, NULL) = 128
3935 recvfrom(7, "'100%'.\n\",0,0,0,0,CHECKED,,4,Layer7 check passed,No status 
change,2,3,4,1,1,1,172.16.27.21:80,,http0,0,0,,,0,,0,0,0,0,0,", 128, 0, 
NULL, NULL) = 128


On the second line you can see that an unescaped '\n' is returned between
quotes. Most CSV parsers work line by line, so this breaks them.

I'm not sure such an unescaped \n is authorized in CSV, even between quotes.
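For illustration, a parser has to track quote state across physical lines to
survive such output; here is a minimal sketch (not HAProxy code, the function
name is ours), relying on the fact that RFC 4180 does allow line breaks inside
quoted fields:

```c
#include <stdio.h>
#include <sys/types.h>

/* Reads one logical CSV record into buf, treating '\n' inside a quoted
 * field as data rather than a record separator. A doubled "" escape
 * toggles the state twice, so newline detection still works.
 * Returns the record length, or -1 on EOF with no data. */
static ssize_t read_csv_record(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int in_quotes = 0, c = EOF;

    while (len + 1 < size && (c = fgetc(fp)) != EOF) {
        if (c == '"')
            in_quotes = !in_quotes;
        if (c == '\n' && !in_quotes)
            break;                 /* record ends only outside quotes */
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return (len == 0 && c == EOF) ? -1 : (ssize_t)len;
}
```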

This issue was first observed on v2.8, but I'm not sure whether it affects previous versions.

Emeric 



Re: [PATCH] fine guard for ssl random extraction functions

2021-03-29 Thread Emeric Brun
On 3/26/21 3:10 PM, William Lallemand wrote:
> On Fri, Mar 26, 2021 at 03:02:27PM +0100, Willy Tarreau wrote:
>> On Fri, Mar 26, 2021 at 06:45:22PM +0500, Илья Шипицин wrote:
>>> Ping :)
>>
>> Ilya, please use the MAINTAINERS file to be sure to direct your messages
>> to the relevant maintainers, because each time you forward them to me, I
>> forward them in turn and the integration of your work gets needlessly
>> delayed.
> 
> I agree. You can also wait more than 1 day before doing a "ping" for a
> minor patch, it's more likely that we didn't read it yet than we missed
> it.
> 
>> @Emeric, @William, could one of you please have a look ?
>>
> 
> I'll take a look.
> 

All patches pushing SSL library feature detection into ssl-compat.h are
welcome. This patch sounds good to me.

Emeric



Re: [PR] Skip unsupported ciphers for ecdsa cert

2020-12-03 Thread Emeric Brun
Hi Marcoen,


Before resubmitting, please remember to use more explicit variable names to
distinguish the server-side and client-side cipher lists.

R,
Emeric

On 12/1/20 10:26 AM, Marcoen Hirschberg wrote:
> Thanks, they are now enabled.
> 
> I've fixed boringssl builds and tested it with libressl locally as well.
> 
> I will work within my fork until I'm happy with the code before I submit it 
> again. I made the code more efficient but want to do a bit more testing and 
> need to clean up the commit messages.
> 
> https://github.com/markun/haproxy/pull/1
> 
> On Tue, Dec 1, 2020 at 10:05 AM Илья Шипицин wrote:
> 
> You can enable github actions on your fork (by default actions are 
> disabled)
> 
> It should start several builds including libressl and boringssl
> 
> On Tue, Dec 1, 2020, 1:14 AM Marcoen Hirschberg wrote:
> 
> Good point, I just tried with boringssl and compilation failed. 
> Thanks for pointing that out.
> 
> On Mon, Nov 30, 2020 at 8:28 PM Илья Шипицин wrote:
> 
> will it run on LibreSSL, BoringSSL ?
> 
> On Tue, 1 Dec 2020 at 00:26, PR Bot
> haproxy-pr-bot-no-re...@ltri.eu wrote:
> 
> Dear list!
> 
> Author: Marcoen Hirschberg
> Number of patches: 3
> 
> This is an automated relay of the Github pull request:
>    Skip unsupported ciphers for ecdsa cert
> 
> Patch title(s):
>    MINOR: ssl: variable renames for clarity
>    MINOR: ssl: skip unknown client cipher
>    BUG/MINOR: ssl: only choose ECDSA cert if server and 
> client have common ECDSA ciphers
> 
> Link:
>    https://github.com/haproxy/haproxy/pull/983
> 
> Edit locally:
>    wget https://github.com/haproxy/haproxy/pull/983.patch && 
> vi 983.patch
> 
> Apply locally:
>    curl https://github.com/haproxy/haproxy/pull/983.patch | 
> git am -
> 
> Description:
> 
> 
> Instructions:
>    This github pull request will be closed automatically; 
> patch should be
>    reviewed on the haproxy mailing list (haproxy@formilux.org 
> ). Everyone is
>    invited to comment, even the patch's author. Please keep 
> the author and
>    list CCed in replies. Please note that in absence of any 
> response this
>    pull request will be lost.
> 




Re: [PR] Add srvkey option to stick-table

2020-12-01 Thread Emeric Brun
Hi,
On 11/30/20 10:23 AM, PR Bot wrote:
> Dear list!
> 
> Author: Thayne McCombs 
> Number of patches: 2
> 
> This is an automated relay of the Github pull request:
>Add srvkey option to stick-table
> 
> Patch title(s): 
>Add srvkey option to stick-table
>Harden sa2str agains 107-byte-long abstract unix domain path
> 
> Link:
>https://github.com/haproxy/haproxy/pull/979
> 
> Edit locally:
>wget https://github.com/haproxy/haproxy/pull/979.patch && vi 979.patch
> 
> Apply locally:
>curl https://github.com/haproxy/haproxy/pull/979.patch | git am -
> 
> Description:
>This allows using the address of the server rather than the name of
>the
>server for keeping track of servers in a backend for
>stickiness.
>
>Fixes #814
>
>I haven't tested this at all
>yet, and it still needs some polish, but here is a draft of how to fix
>#814. 
>
>This is my first significant contribution to haproxy,
>so I would not be surprised if I'm doing something terribly wrong, and
>I'm sure there are at least some small mistakes in it. Initial
>feedback would be very welcome.
> 
> Instructions:
>This github pull request will be closed automatically; patch should be
>reviewed on the haproxy mailing list (haproxy@formilux.org). Everyone is
>invited to comment, even the patch's author. Please keep the author and
>list CCed in replies. Please note that in absence of any response this
>pull request will be lost.
> 


I'm CCing Fred.

This pull request seems to be the extension for sticking on server addresses
that we discussed a long time ago when Fred implemented sticking on server
names.

Willy explained what to do here (using the dictionary, etc.):
https://github.com/haproxy/haproxy/issues/814

I think Fred is better placed on the subject and implementation to perform the
pull request review.

R,
Emeric



Re: Question: How to not reset the TTL on a stick table entry?

2020-11-04 Thread Emeric Brun
Hi Nick,

On 11/2/20 10:26 PM, Nick Ramirez wrote:
> Hello,
> 
> In my HAProxy config, I would like to ban people for a certain amount of time 
> by setting a general-purpose counter from 0 to 1, where 1 = banned, in a 
> stick table. When the stick table entry expires, the counter is reset to 0 
> and the person is un-banned. This works fine. However, I would like to ignore 
> this person's requests while they're banned. That way, as they make requests, 
> they are not continuously banning themselves.
> 
> Consider if I use this ACL and "track" line:
> 
> ```
> acl is_banned sc_get_gpc1(0) gt 0
> http-request track-sc0 be_name unless is_banned
> ```
> 
> This ACL uses `sc_get_gpc1(0)` to read the value of the general-purpose 
> counter. When this ACL is used by the `track-sc0` line, it *resets the TTL* 
> on the stick table entry, which means that a person will be banned forever 
> unless they stop making requests. I don't want this.  I want to ban them for 
> only 10 seconds. So, instead, I use this ACL:
> 
> ```
> acl is_banned be_name,table_gpc1 gt 0
> http-request track-sc0 be_name unless is_banned
> ```
> 
> By using the `table_gpc1` converter, the TTL is *not* reset when the ACL is 
> used, which is good.
> 
> My question is, is this an undocumented feature? A bug that may one day be 
> "fixed"? Why is there a difference between `sc_get_gpc1(0)` and 
> `table_gpc1`, where the former resets the TTL on the stick table 
> entry, but the latter does not? 
> 
> Also, if this is a bug, would it be helpful to have a parameter on the 
> track-sc0 line that allows me to opt in to not resetting the TTL?
> 
> Thank you,
> Nick Ramirez
> 


Reading the code, I didn't see either the sc_get_gpc1 or the table_gpc1 fetch
updating the entry expiration. Only the evaluation of 'http-request track-sc0'
does.
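For reference, an untested configuration sketch assembling the pieces from the
thread above (the backend name and the "abuse" ACL are placeholders, not a
recommended setup):

```
backend be_name
    # ban flag lives in gpc1; entries expire 10s after their last update
    stick-table type string size 100k expire 10s store gpc1

    acl is_banned be_name,table_gpc1 gt 0
    # banned clients are not tracked, so their requests stop refreshing the TTL
    http-request track-sc0 be_name unless is_banned
    http-request sc-inc-gpc1(0) if abuse
    http-request deny if is_banned
```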

R,
Emeric




Re: DNS Load balancing needs feedback and advice.

2020-11-04 Thread Emeric Brun
Hi Dinko,

> Sadly I haven't had Kube-DNS anywhere, and I think that CoreDNS is supposed
> to be the way to go after Kube-DNS. Hope this helps.
It does.

Really appreciated!

R,
Emeric



Re: DNS Load balancing needs feedback and advice.

2020-11-03 Thread Emeric Brun
Hi Dinko,

On 11/3/20 11:52 AM, Dinko Korunic wrote:
> On 3 Nov 2020, at 10:51, Emeric Brun  wrote:
>>
>>> […]
>>>
>>> We are requesting the community and experienced users of DNS servers to 
>>> share their thoughts about this.
>>
>> sub-questions are about modern DNS servers:
>> - do they support DNS over TCP?
>> - do they support persistent connections with pipelined requests?
>>
> 
> a) Yes, DNS over TCP is in fact pretty much mandatory nowadays and every 
> modern DNS server should support it. Some DNS servers also support DNS over 
> TLS. In fact, some queries (AXFR/IXFR) are always TCP.
Great!
> b) Yes, but that’s recent addition as per RFC 7766 and AFAIK only Bind 9, 
> PowerDNS and Unbound support it but I am honestly not sure if there are 
> others supporting that feature. Historically there were also some security 
> issues considering concurrent tcp clients limits like CVE-2019-6477 in early 
> implementations.

I had already noticed this for BIND 9 and Unbound, and I know this is also the
case for NSD. About the CVE, NSD uses the tcp-query-count parameter to limit
the number of pending queries served per connection.

But the question also targets DNS servers found in cloud environments such as
kube-dns, CoreDNS or Consul.

They seem to support TCP, but I'm not sure about pipelined queries.

> My apologies if I have missed to mention anything, I am not up to date with 
> current DNS changes as I used to be.

Your help is really appreciated.

R,
Emeric



Re: DNS Load balancing needs feedback and advice.

2020-11-03 Thread Emeric Brun
Hi All,

On 11/2/20 3:41 PM, Emeric Brun wrote:
> Hi All,
> 
> We are currently studying the development of a DNS message load balancer
> (inside the haproxy core).
> 
> After a global pass on the RFCs (DNS, DNS over TCP, eDNS, DNSsec ...) we
> noticed that DNS practices have evolved considerably since the stone age.
> 
> During the last brainstorming meeting I had with Baptiste Assmann and Willy
> Tarreau, we were led to make some assumptions and choices, and we want to
> submit them to the community for your thoughts.
> 
> Reading the RFCs, I noticed multiple fallback cases (if a server does not
> support eDNS we should retry the request without eDNS, or if a response is
> truncated we should retry over TCP) which could clearly make the project
> really difficult to implement and suboptimal from a performance point of
> view.
> 
> So we decided to assume that nowadays all modern DNS servers support both
> TCP (with pipelined requests as defined in RFC 7766) and eDNS. In this case
> the DNS load balancer will forward messages received from clients over UDP
> or TCP (supporting eDNS or not) to servers over a pipelined TCP connection.
> 
> We are requesting the community and experienced users of DNS servers to
> share their thoughts about this.

sub-questions are about modern DNS servers:
- do they support DNS over TCP?
- do they support persistent connections with pipelined requests?

R,
Emeric



Re: DNS Load balancing needs feedback and advice.

2020-11-02 Thread Emeric Brun
Hi Lukas,
> I find this a little surprising given that there already is a great
> DNS load-balancer out there (dnsdist) from the folks at powerdns and
> when I look at the status of the haproxy resolver, I don't feel like
> DNS sparks a huge amount of developer interest. Load balancing DNS
> will certainly require more attention and enthusiasm than what the
> resolver code gets today, and even more important: long term
> maintenance.

Thanks for this comment :)

>> Reading the RFCs, I noticed multiple fallback cases (if a server does not
>> support eDNS we should retry the request without eDNS
> 
> The edns fallback should be obsolete and has been disabled on the
> large public resolver on flagday 2019.
> 
> https://dnsflagday.net/2019/
> 
Good to know. It confirms basic DNS is from the stone age.

 
>> or if response is truncated we should retry over TCP
> 
> This is and always will be very necessary. Deploying the haproxy
> resolver feature would be a lot less dangerous if we would support
> this (or make all requests over TCP in the first place).
> 
> 
>> So we decided to assume that nowadays all modern DNS servers support both
>> TCP (with pipelined requests as defined in RFC 7766) and eDNS. In this case
>> the DNS load balancer will forward messages received from clients over UDP
>> or TCP (supporting eDNS or not) to servers over a pipelined TCP connection.
> 
> That's probably a good idea. You still have to handle all the UDP pain
> on the frontend though.

Exactly, though not all the UDP pain: fallbacks will be handled by clients :)
(handling this on the backend side would be the worst pain)
 
> 
>> In addition, I had a more technical question: eDNS's first purpose is
>> clearly to bypass the 512-byte limitation of standard DNS over UDP, but I
>> didn't find details about the usage of eDNS over TCP, which seems mandatory
>> if we want to perform DNSsec (since DNSsec exploits some eDNS pseudo-header
>> fields). The main question is how to handle the payload size field of the
>> eDNS pseudo-header if messages are exchanged over TCP.
> 
> I'm not sure what the RFC specifically says, but I'd say don't send
> the "UDP payload size" field if the transport is TCP and ignore/filter
> it when received over TCP.


The payload size field is part of the pseudo-header definition, so we can't
wipe it, but I suppose it would act as a kind of "maximum message size" in the
case of TCP.
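For context, a sketch of where that field sits in the OPT pseudo-RR as I read
RFC 6891 (the struct and field names are ours; presumably over TCP the 2-byte
message length prefix governs sizing instead):

```c
#include <stdint.h>

/* EDNS(0) OPT pseudo-RR layout (RFC 6891); all multi-byte fields are
 * big-endian on the wire. The "UDP payload size" discussed above is the
 * repurposed CLASS field. Packed to mirror the wire format (GCC/Clang). */
struct edns0_opt_rr {
    uint8_t  name;            /* root domain: a single 0x00 byte */
    uint16_t type;            /* OPT = 41 */
    uint16_t udp_payload;     /* CLASS field, reused as UDP payload size */
    uint32_t ext_rcode_flags; /* TTL field: extended RCODE, version, DO bit */
    uint16_t rdlen;           /* length of the option data that follows */
} __attribute__((packed));
```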

 
> not a dns expert here though,
Really appreciated.

R,
Emeric



DNS Load balancing needs feedback and advice.

2020-11-02 Thread Emeric Brun
Hi All,

We are currently studying the development of a DNS message load balancer
(inside the haproxy core).

After a global pass on the RFCs (DNS, DNS over TCP, eDNS, DNSsec ...) we
noticed that DNS practices have evolved considerably since the stone age.

During the last brainstorming meeting I had with Baptiste Assmann and Willy
Tarreau, we were led to make some assumptions and choices, and we want to
submit them to the community for your thoughts.

Reading the RFCs, I noticed multiple fallback cases (if a server does not
support eDNS we should retry the request without eDNS, or if a response is
truncated we should retry over TCP) which could clearly make the project
really difficult to implement and suboptimal from a performance point of view.

So we decided to assume that nowadays all modern DNS servers support both TCP
(with pipelined requests as defined in RFC 7766) and eDNS. In this case the
DNS load balancer will forward messages received from clients over UDP or TCP
(supporting eDNS or not) to servers over a pipelined TCP connection, as
sketched below.
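As an illustration of that forwarding step, DNS over TCP frames each message
with a 2-byte length prefix (RFC 1035, section 4.2.2), which is what lets
several pipelined queries share one connection; a minimal sketch (not haproxy
code, names are ours):

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Wraps a raw DNS message (e.g. one received over UDP) for TCP transport
 * by prepending the 2-byte big-endian length prefix. Returns the frame
 * length, or 0 if the output buffer is too small. */
static size_t dns_tcp_frame(const uint8_t *msg, uint16_t msg_len,
                            uint8_t *out, size_t out_size)
{
    uint16_t be_len = htons(msg_len);   /* length prefix, network order */

    if (out_size < (size_t)msg_len + 2)
        return 0;
    memcpy(out, &be_len, 2);
    memcpy(out + 2, msg, msg_len);
    return (size_t)msg_len + 2;
}
```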

We are requesting the community and experienced users of DNS servers to share 
their thoughts about this.

In addition, I have a more technical question: eDNS's first purpose is clearly
to bypass the 512-byte limitation of standard DNS over UDP, but I didn't find
details about the usage of eDNS over TCP, which seems mandatory if we want to
perform DNSsec (since DNSsec exploits some eDNS pseudo-header fields). The
main question is how to handle the payload size field of the eDNS
pseudo-header if messages are exchanged over TCP.

Finally, any other advice or thoughts about DNS load balancing in HAProxy are
also welcome.

R,
Emeric 



Re: [PATCH 2/2] MINOR: ssl: add ssl_c_chain_der fetch method

2020-08-05 Thread Emeric Brun
Hi William,

> +/* binary, returns a chain certificate in a binary chunk (der/raw).
> + * The 5th keyword char is used to support only peer cert
> + */
> +static int
> +smp_fetch_ssl_x_chain_der(const struct arg *args, struct sample *smp, const 
> char *kw, void *private)
> +{
> + struct buffer *smp_trash;
> + struct buffer *tmp_trash = NULL;
> + struct connection *conn;
> + STACK_OF(X509) *certs = NULL;
> + X509 *crt = NULL;
> + SSL *ssl;
> + int ret = 0;
> + int num_certs;
> + int i;
> +
> + // only peer cert chain is supported
> + if (kw[4] != 'c')
> + return 0;
> +
> + conn = objt_conn(smp->sess->origin);
> + if (!conn)
> + return 0;
> +
> + ssl = ssl_sock_get_ssl_object(conn);
> + if (!ssl)
> + return 0;
> +

I think this code could also be used to declare an "ssl_s_chain_der" with
minor changes, as is done for ssl_c_serial:

int conn_server = (kw[4] == 's') ? 1 : 0;
...

/* server-side connection for ssl_s_*, client-side session
 * origin for ssl_c_* */
if (conn_server)
        conn = cs_conn(objt_cs(smp->strm->si[1].end));
else
        conn = objt_conn(smp->sess->origin);


Emeric



Re: [PATCH] BUG/MEDIUM: sink: fix crash when null sink is used in __do_send_log

2020-06-22 Thread Emeric Brun
Hi Daniel, Willy,

On 6/19/20 9:22 PM, Willy Tarreau wrote:
> Hi Daniel,
> 
> On Thu, Jun 18, 2020 at 12:35:29AM -0400, Daniel Corbett wrote:
>> Hello,
>>
>>
>> When using a ring log in combination with proto fcgi, it was possible
>> to cause a crash by sending a request for a non-existent fastcgi file
>> to php-fpm, causing it to produce the error "Primary script unknown".
>> When php-fpm produces this error for some reason a sink is not able to be
>> established and __do_send_log gets passed a null sink.
>>
>> This commit verifies that the sink exists in __do_send_log before attempting
>> to use it.
> 
> Thanks for the fix and the detailed report, that's very useful! However
> the problem is somewhere else, and I suspect is slightly harder to solve.
> It's normally not possible to have a null sink on a log server if its
> type is a buffer, so we have an inconsistency somewhere that we must
> address.
> 
> What I'm suspecting (but I could be wrong) is that the fcgi duplicates
> the global log settings before they are resolved, then the global ones
> are resolved, and the copy in the fcgi part keeps an incompletely
> initialized log server. I haven't looked at this area yet since we
> support post-check resolution of sink names, but I suspect that it's
> applied to frontends only, while as can be seen in your examples there
> seem to be other areas requiring log resolution (I wasn't even aware
> that we had other log references). So this means that right now we
> have to make sure they're all properly addressed, and that in the
> future it might be worth keeping a pointer to the global log servers
> instead of copying them.

Indeed, we currently resolve ring names by looping on proxies to initialize
their server lists, but I had missed that in the case of fastcgi-app there is
a log-server list not related to a proxy. I've just talked to Christopher and
he told me he already planned to submit a fix for this bug, correctly
initializing those server lists during fastcgi-app configuration post-parsing.

R,
Emeric



Re: Peers Protocol "Table Type"

2020-06-02 Thread Emeric Brun
Hi All,

On 6/2/20 1:10 PM, Tim Düsterhus wrote:
> Emeric,
> 
> On 02.06.20 at 11:29, Emeric Brun wrote:
>> In attachement a proposed patch for this issue.
>>
> 
> Thanks. The changes to the doc look good to me.
> 
> Regarding peers.c:
> 
>> +/* network key types;
>> + * network types were directly and mistakenly
>> + * mapped on sample types, to keep backward
>> + * compatiblitiy we keep those values but
>> + * we now use a internal/network mapping
>> + * to avoid further mistakes adding or
>> + * modifying internals types
>> + */
>> +enum {
>> +PEER_KT_ANY = 0,  /* any type */
>> +PEER_KT_RESV1,/* UNUSED */
>> +PEER_KT_SINT, /* signed 64bits integer type */
>> +PEER_KT_RESV2,/* UNUSED */
> 
> Maybe call this RESV3 to make it clear that the numeric value is '3'.
> 
>> +PEER_KT_IPV4, /* ipv4 type */
>> +PEER_KT_IPV6, /* ipv6 type */
>> +PEER_KT_STR,  /* char string type */
>> +PEER_KT_BIN,  /* buffer type */
>> +PEER_KT_TYPES /* number of types, must always be last */
>> +};
> 
> -
> 
>> + * Note: Undeclared mapping maps entry to SMP_ST_ANY == 0
> 
> This should read SMP_T_ANY.
> 
> Also backporting instructions are missing from the commit message.
> 
> Other than that the patch looks good to me, but I didn't actually test a
> binary compiled from it.
> 
> Best regards
> Tim Düsterhus
> 

Thank you Tim!

Here the updated patch.

R,
Emeric
>From c58eed60557fd20e8b972cc3ab4b45e4e598e558 Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Tue, 2 Jun 2020 11:17:42 +0200
Subject: [PATCH] BUG/MINOR: peers: fix internal/network key type mapping.

Network types were directly and mistakenly mapped on sample types:

This patch fix the doc with values effectively used to keep backward
compatiblitiy on existing implementations.

In addition it adds an internal/network mapping for key types to avoid
further mistakes adding or modifying internals types.

This patch should be backported on all maintained branches,
particularly until v1.8 included for documentation part.
---
 doc/peers-v2.0.txt | 10 +-
 src/peers.c| 46 --
 2 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/doc/peers-v2.0.txt b/doc/peers-v2.0.txt
index 477e7bb84..344cb5609 100644
--- a/doc/peers-v2.0.txt
+++ b/doc/peers-v2.0.txt
@@ -191,11 +191,11 @@ between the "Sender Table ID" to identify it directly in case of "Table Switch M
 
 Table Type present the numeric type of key used to store stick table entries:
 integer
- 0: signed integer
- 1: IPv4 address
- 2: IPv6 address
- 3: string
- 4: binary
+ 2: signed integer
+ 4: IPv4 address
+ 5: IPv6 address
+ 6: string
+ 7: binary
 
 Table Keylen present the key length or max length in case of strings or binary (padded with 0).
 
diff --git a/src/peers.c b/src/peers.c
index 2e54ab94d..9782ff3c1 100644
--- a/src/peers.c
+++ b/src/peers.c
@@ -125,6 +125,48 @@ enum {
 	PEER_MSG_ERR_SIZELIMIT,
 };
 
+/* network key types;
+ * network types were directly and mistakenly
+ * mapped on sample types, to keep backward
+ * compatiblitiy we keep those values but
+ * we now use a internal/network mapping
+ * to avoid further mistakes adding or
+ * modifying internals types
+ */
+enum {
+PEER_KT_ANY = 0,  /* any type */
+PEER_KT_RESV1,/* UNUSED */
+PEER_KT_SINT, /* signed 64bits integer type */
+PEER_KT_RESV3,/* UNUSED */
+PEER_KT_IPV4, /* ipv4 type */
+PEER_KT_IPV6, /* ipv6 type */
+PEER_KT_STR,  /* char string type */
+PEER_KT_BIN,  /* buffer type */
+PEER_KT_TYPES /* number of types, must always be last */
+};
+
+/* Map used to retrieve network type from internal type
+ * Note: Undeclared mapping maps entry to PEER_KT_ANY == 0
+ */
+static int peer_net_key_type[SMP_TYPES] = {
+	[SMP_T_SINT] = PEER_KT_SINT,
+	[SMP_T_IPV4] = PEER_KT_IPV4,
+	[SMP_T_IPV6] = PEER_KT_IPV6,
+	[SMP_T_STR]  = PEER_KT_STR,
+	[SMP_T_BIN]  = PEER_KT_BIN,
+};
+
+/* Map used to retrieve internal type from external type
+ * Note: Undeclared mapping maps entry to SMP_T_ANY == 0
+ */
+static int peer_int_key_type[PEER_KT_TYPES] = {
+	[PEER_KT_SINT] = SMP_T_SINT,
+	[PEER_KT_IPV4] = SMP_T_IPV4,
+	[PEER_KT_IPV6] = SMP_T_IPV6,
+	[PEER_KT_STR]  = SMP_T_STR,
+	[PEER_KT_BIN]  = SMP_T_BIN,
+};
+
 /*
  * Parameters used by functions to build peer protocol messages. */
 struct peer_prep_params {
@@ -620,7 +662,7 @@ static int peer_prepare_switchmsg(char *msg, size_t size, struct peer_prep_param
 
 	/* encode table type */
 
-	intencode(st->table->type, );
+	intencode(peer_net_key_type[st->table->type], );
 
 	/* encode table key size */
 	intencode(st->

Re: Peers Protocol "Table Type"

2020-06-02 Thread Emeric Brun
Hi Tim, Willy,
On 3/20/20 3:01 PM, Tim Düsterhus wrote:
> Emeric,
> 
> On 20.03.20 at 14:29, Emeric Brun wrote:
>> So I understand that since 1.6 the SMP_T values are directly announced on
>> the wire for key types; this breaks the documented values, and it is
>> hazardous to rely on internal enum values.
>>
>> So we must re-introduce a mapping between internal and on-wire types.
>>
>> Some questions about choices:
>>
>> - Re-map the types to the documented values, or update the doc to match the
>> currently used values?
> 
> There's really only one sane choice after several years of not following
> the documentation:
> 
> Update the documentation to match the currently used values. The peers
> protocol is HAProxy specific, so in practice the correct values are
> "what HAProxy does" (i.e. the protocol is defined by the reference
> implementation). The custom implementation during which I stumbled upon
> this issue is brand new and I needed to look into the code anyway,
> because the docs are incomplete (as I outlined before in this thread).
> 
> Changing the code will cause larger breakage during a HAProxy bugfix
> upgrade if not all machines in a cluster are upgraded simultaneously.
> 
> Best regards
> Tim Düsterhus
> 

In attachement a proposed patch for this issue.

R,
Emeric
>From 3792e5fb69a10daf03f65d142fa308c8f2704588 Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Tue, 2 Jun 2020 11:17:42 +0200
Subject: [PATCH] BUG/MINOR: peers: fix internal/network key type mapping.

Network types were directly and mistakenly mapped on sample types:

This patch fix the doc with values effectively used to keep backward
compatiblitiy on existing implementations.

In addition it adds an internal/network mapping for key types to avoid
further mistakes adding or modifying internals types.
---
 doc/peers-v2.0.txt | 10 +-
 src/peers.c| 46 --
 2 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/doc/peers-v2.0.txt b/doc/peers-v2.0.txt
index 477e7bb84..344cb5609 100644
--- a/doc/peers-v2.0.txt
+++ b/doc/peers-v2.0.txt
@@ -191,11 +191,11 @@ between the "Sender Table ID" to identify it directly in case of "Table Switch M
 
 Table Type present the numeric type of key used to store stick table entries:
 integer
- 0: signed integer
- 1: IPv4 address
- 2: IPv6 address
- 3: string
- 4: binary
+ 2: signed integer
+ 4: IPv4 address
+ 5: IPv6 address
+ 6: string
+ 7: binary
 
 Table Keylen present the key length or max length in case of strings or binary (padded with 0).
 
diff --git a/src/peers.c b/src/peers.c
index 2e54ab94d..91f2b3ad8 100644
--- a/src/peers.c
+++ b/src/peers.c
@@ -125,6 +125,48 @@ enum {
 	PEER_MSG_ERR_SIZELIMIT,
 };
 
+/* network key types;
+ * network types were directly and mistakenly
+ * mapped on sample types, to keep backward
+ * compatiblitiy we keep those values but
+ * we now use a internal/network mapping
+ * to avoid further mistakes adding or
+ * modifying internals types
+ */
+enum {
+PEER_KT_ANY = 0,  /* any type */
+PEER_KT_RESV1,/* UNUSED */
+PEER_KT_SINT, /* signed 64bits integer type */
+PEER_KT_RESV2,/* UNUSED */
+PEER_KT_IPV4, /* ipv4 type */
+PEER_KT_IPV6, /* ipv6 type */
+PEER_KT_STR,  /* char string type */
+PEER_KT_BIN,  /* buffer type */
+PEER_KT_TYPES /* number of types, must always be last */
+};
+
+/* Map used to retrieve network type from internal type
+ * Note: Undeclared mapping maps entry to PEER_KT_ANY == 0
+ */
+static int peer_net_key_type[SMP_TYPES] = {
+	[SMP_T_SINT] = PEER_KT_SINT,
+	[SMP_T_IPV4] = PEER_KT_IPV4,
+	[SMP_T_IPV6] = PEER_KT_IPV6,
+	[SMP_T_STR]  = PEER_KT_STR,
+	[SMP_T_BIN]  = PEER_KT_BIN,
+};
+
+/* Map used to retrieve internal type from external type
+ * Note: Undeclared mapping maps entry to SMP_ST_ANY == 0
+ */
+static int peer_int_key_type[PEER_KT_TYPES] = {
+	[PEER_KT_SINT] = SMP_T_SINT,
+	[PEER_KT_IPV4] = SMP_T_IPV4,
+	[PEER_KT_IPV6] = SMP_T_IPV6,
+	[PEER_KT_STR]  = SMP_T_STR,
+	[PEER_KT_BIN]  = SMP_T_BIN,
+};
+
 /*
  * Parameters used by functions to build peer protocol messages. */
 struct peer_prep_params {
@@ -620,7 +662,7 @@ static int peer_prepare_switchmsg(char *msg, size_t size, struct peer_prep_param
 
 	/* encode table type */
 
-	intencode(st->table->type, );
+	intencode(peer_net_key_type[st->table->type], );
 
 	/* encode table key size */
 	intencode(st->table->key_size, );
@@ -1655,7 +1697,7 @@ static inline int peer_treat_definemsg(struct appctx *appctx, struct peer *p,
 	if (!*msg_cur)
 		goto malformed_exit;
 
-	if (p->remote_table->table->type != table_type
+	if (p->remote_table->table->type != peer_int_key_type[table_type]
 		|| p->remote_table->table->key_size != table_keylen) {
 		p->remote_table = NULL;
 		goto ignore_msg;
-- 
2.17.1



Re: [PR] Add verfied chain

2020-05-18 Thread Emeric Brun
Hi All,

On 5/18/20 4:32 PM, William Dauchy wrote:
> On Mon, May 18, 2020 at 3:58 PM William Lallemand
>  wrote:
>> I suppose it was put in a PKCS7 container to be able to distinguish each
>> DER part of the chain easily? So It can be used by an external tool. I'm not
>> sure of what is done with the result of this.
>>
>> The two patches seem to have different approaches, Arjen's one is
>> using a SSL_get0_verified_chain() and Mathild's one is using
>> SSL_get_peer_cert_chain(). I'm not sure what approach is the best, I
>> suppose that SSL_get_peer_cert_chain() is better if we want to have the
>> chain even if it wasn't verified and it could be completed with the
>> ssl_c_verify sample fetch if we need this information!
>>
>> I will be grateful if a .vtc test file is also provided with sample
>> fetches patches, it's difficult to test every sample fetches nowadays.
>>
>> There is already a vtc for client auth which is available here:
>> https://git.haproxy.org/?p=haproxy.git;a=blob;f=reg-tests/ssl/ssl_client_auth.vtc
> 
> Thanks for the feedbacks. I believe we will send our proposition soon.
> 

According to openssl's doc about SSL_get_peer_cert_chain, 
SSL_get0_verified_chain:

NOTES
If the session is resumed peers do not send certificates so a NULL pointer is 
returned by these functions.

It would be great if the ssl_c_* fetches' documentation stated whether this
information returns nothing on resumed sessions.
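As a hedged sketch (not the submitted patch; the helper name is ours),
SSL_session_reused() would let a fetch tell resumption apart from a genuinely
absent chain:

```c
#include <openssl/ssl.h>

/* Returns the peer chain or NULL. On a resumed session the peer did not
 * re-send its certificates, so NULL means "not available here" rather
 * than "no chain was ever presented". */
static STACK_OF(X509) *get_peer_chain(SSL *ssl, int *resumed)
{
    STACK_OF(X509) *certs = SSL_get_peer_cert_chain(ssl);

    *resumed = SSL_session_reused(ssl);
    /* a sample fetch would typically just report "not available"
     * when certs == NULL, whatever the reason */
    return certs;
}
```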

R,
Emeric



Re: Peers Protocol "Table Type"

2020-03-23 Thread Emeric Brun
Hi Willy,

> 
>> The documented values are not used by any still-supported haproxy version.
>> So I think it would be better to update the doc with the new ones
>> and add a mapping to avoid further changes.
> 
> Yep definitely.

I'm trying to finish the scripts for SSD management, and I'm not working on
haproxy at all at the moment, so I hope there is no urgency on this point; it
would suit me to take care of it only afterwards.

Emeric



Re: Peers Protocol "Table Type"

2020-03-20 Thread Emeric Brun
Hi Tim,

On 3/20/20 3:01 PM, Tim Düsterhus wrote:
> Emeric,
> 
> On 20.03.20 at 14:29, Emeric Brun wrote:
>> So I understand that since 1.6 the SMP_T values are directly announced on
>> the wire for key types; this breaks the documented values, and it is
>> hazardous to rely on internal enum values.
>>
>> So we must re-introduce a mapping between internal and on-wire types.
>>
>> Some questions about choices:
>>
>> - Re-map the types to the documented values, or update the doc to match the
>> currently used values?
> 
> There's really only one sane choice after several years of not following
> the documentation:
> 
> Update the documentation to match the currently used values. The peers
> protocol is HAProxy specific, so in practice the correct values are
> "what HAProxy does" (i.e. the protocol is defined by the reference
> implementation). The custom implementation during which I stumbled upon
> this issue is brand new and I needed to look into the code anyway,
> because the docs are incomplete (as I outlined before in this thread).
> 
> Changing the code will cause larger breakage during a HAProxy bugfix
> upgrade if not all machines in a cluster are upgraded simultaneously.
> 
> Best regards
> Tim Düsterhus
> 

So we all agree :)

Emeric



Re: Peers Protocol "Table Type"

2020-03-20 Thread Emeric Brun
Hi Willy,

On 3/20/20 2:53 PM, Willy Tarreau wrote:
> Hi Emeric,
> 
> On Fri, Mar 20, 2020 at 02:29:48PM +0100, Emeric Brun wrote:
>> So I understand that since 1.6 the SMP_T values are directly announced on
>> the wire for key types; this breaks the documented values, and it is
>> hazardous to rely on internal enum values.
> 
> Yes that's the issue Tim spotted.
> 
>> So we must re-introduce a mapping between internal and on-wire types.
> 
> Indeed.
> 
>> Some questions about choices:
>>
>> - Re-map types to documented values or Update the doc to match currently 
>> used values? 
> 
> I'd say that what's currently on the wire seems to more or less match the
> doc. Given that there are very few peers implementations, what's working
> right now seems the most important to me. So I'd suggest that we change
> the code to perform the mapping again and that we possibly update the doc
> in case it's wrong on some of them.

I understood that documented values are:
 0: signed integer
 1: IPv4 address
 2: IPv6 address
 3: string
 4: binary

and currently (since 1.6):
   2 = signed int
   4 = IPv4
   5 = IPv6
   6 = string
   7 = binary

I know another piece of software speaking the peers protocol which is
compatible with all haproxy versions and which uses the new set of values.
Re-mapping to the documented IDs would break that compatibility. That's why I
asked.

The documented values are not used by any still-supported haproxy version. So
I think it would be better to update the doc with the new ones and add a
mapping to avoid further changes.

> 
>> - About re-introduce the mapping, I'm not sure we should re-introduce the
>> mapping between fetches and tables types. Table types could stay internal,
>> using SMP_T... and we could map to protocol's values only when emitting the
>> message.
> 
> I didn't think about this possibility but I actually think it might be
> the best one. The reason the types were changed was to simplify the
> resolution of casts in "stick" rules, because if you remember we used
> to have two types of casts in the past, sample-to-sample and
> sample-to-table. But with your suggestion we'd keep the internals clean
> and simple to maintain and only the protocol would be adjusted on the
> wire, which is in fact what any agent should do. I really like this
> approach! You'll probably just need to define some PEERS_TABLE_TYPE_XXX
> for example and map the table->type from SMP_T_* to those types and
> conversely, that sounds really clean.
Great!

 
> Thanks,
> Willy
> 

Emeric



Re: Peers Protocol "Table Type"

2020-03-20 Thread Emeric Brun
On 3/14/20 12:47 PM, Willy Tarreau wrote:
> On Sat, Mar 14, 2020 at 12:20:00PM +0100, Tim Düsterhus wrote:
>> Willy,
>>
>> Am 14.03.20 um 12:13 schrieb Willy Tarreau:
>>> Yes, feel free to do so, this will definitely help get it eventually done.
>>
>> Here it is: https://github.com/haproxy/haproxy/issues/548
>>
>> I labeled it "bug", because it can be considered a bug that the
>> on-the-wire format relies on some internal enum.
> 
> Agreed, thanks!
> Willy
> 

So I understand that since 1.6 the SMP_T values are directly announced on the
wire for key types; this breaks the documented values, and it is hazardous to
rely on internal enum values.

So we must re-introduce a mapping between internal and on-wire types.

Some questions about choices:

- Re-map the types to the documented values, or update the doc to match the
currently used values?

- About re-introducing the mapping: I'm not sure we should re-introduce the
mapping between fetches and table types. Table types could stay internal,
using SMP_T..., and we could map to the protocol's values only when emitting
the message.

Your opinion?

R,
Emeric



Re: [PATCH] BUG/MINOR: ssl: fix crt-list neg filter for openssl < 1.1.1

2019-11-18 Thread Emeric Brun
On 11/18/19 2:40 PM, William Lallemand wrote:
> On Fri, Nov 15, 2019 at 06:49:10PM +0100, Willy Tarreau wrote:
>> On Wed, Nov 06, 2019 at 06:47:50PM +0100, Emmanuel Hocdet wrote:
>>> Hi,
>>>
>>> Very difficult to trigger the bug, except with specific test configuration 
>>> like:
>>> crt-list:
>>> cert.pem !www.dom.tld
>>> cert.pem *.dom.tld
>>>
>>> If you can consider the patch.
>>
>> Guys, I know that everyone has been very busy lately but at least giving
>> me indications like "yes", "no", "let me check", "do as you want" or
>> whatever could help. Letting candidate fixes rot for 9 days with no
>> response is not cool, and while it will always happen once in a while
>> anywhere, it systematically happens in the SSL subsystem. We definitely
>> need to improve this situation :-(
>>
>> Now it's too late for 2.0.9 and 2.1-dev5 anyway.
>>
> 
> Hi,
> 
> I did not see this patch, you should probably ask directly the SSL maintainer
> before issuing a release if there is a pending patch.
> 
> In my opinion we can integrate it.
> 

This seems to be a bug fix done by the initiator of the feature, so I think
there is no reason to dig further and my response will be "do as you want".

I recently spent several days fixing SSL/peers bugs or helping co-workers find
them, and my time window allocated for this has been largely exceeded, since I
didn't find time to work on what I'm supposed to do, which also has deadlines
and requires focus (but that does not concern the open-source mailing list).

R,
Emeric




Re: [External] Re: QAT intermittent healthcheck errors

2019-05-13 Thread Emeric Brun
Hi Marcin,

> 
> Thank you Marcin. It shows that haproxy is waiting for an event on all those
> fds because crypto jobs were launched on the engine,
> and we can't free the session until the end of those jobs (it would result
> in a segfault).
> 
> So the processes are stuck, unable to free the session because the engine
> doesn't signal the end of those jobs via the async fd.
> 
> I didn't reproduce this issue on QAT 1.5, so I will try to discuss it with
> the intel guys to find out why this behavior changed in v1.7
> and what we can do.
> 
> R,
> Emeric
> 

Just to let you know that I'm waiting for feedback from Intel's team, and I
will receive QAT 1.7 compliant hardware soon to run some tests here.

R,
Emeric




Re: [External] Re: QAT intermittent healthcheck errors

2019-05-07 Thread Emeric Brun
On 5/7/19 3:35 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 5/7/19 1:53 PM, Emeric Brun wrote:
>> On 5/7/19 1:24 PM, Marcin Deranek wrote:
>>> Hi Emeric,
>>>
>>> On 5/7/19 11:44 AM, Emeric Brun wrote:
>>>> Hi Marcin,
>>>>
>>>> As I use HAProxy 1.8 I had to adjust the patch (see 
>>>> attachment for end result). Unfortunately after applying the patch there 
>>>> is no change in behavior: we still leak /dev/usdm_drv descriptors and have 
>>>> "stuck" HAProxy instances after reload..
>>>>>>> Regards,
>>>>>>
>>>>>>
>>>>
>>>> Could you perform a test recompiling the usdm_drv and the engine with this 
>>>> patch, it applies on QAT 1.7 but I've no hardware to test this version 
>>>> here.
>>>>
>>>> It should fix the fd leak.
>>>
>>> It did fix fd leak:
>>>
>>> # ls -al /proc/2565/fd|fgrep dev
>>> lr-x-- 1 root root 64 May  7 13:15 0 -> /dev/null
>>> lrwx-- 1 root root 64 May  7 13:15 7 -> /dev/usdm_drv
>>>
>>> # systemctl reload haproxy.service
>>> # ls -al /proc/2565/fd|fgrep dev
>>> lr-x-- 1 root root 64 May  7 13:15 0 -> /dev/null
>>> lrwx-- 1 root root 64 May  7 13:15 8 -> /dev/usdm_drv
>>>
>>> # systemctl reload haproxy.service
>>> # ls -al /proc/2565/fd|fgrep dev
>>> lr-x-- 1 root root 64 May  7 13:15 0 -> /dev/null
>>> lrwx-- 1 root root 64 May  7 13:15 9 -> /dev/usdm_drv
>>>
>>> But there are still stuck processes :-( This is with both patches included: 
>>> for QAT and HAProxy.
>>> Regards,
>>>
>>> Marcin Deranek
>>
>> Thank you Marcin! Anyway, it was also a bug.
>>
>> Could you run a 'show fd' command on a stuck process after adding the patch
>> in attachment.
> 
> I did apply this patch and all previous patches (QAT + HAProxy 
> ssl_free_engine). This is what I got after 1st reload:
> 
> show proc
> #<PID>  <type>  <relative PID>  <reloads>  <uptime>
> 8025    master  0   1   0d 00h03m25s
> # workers
> 31269   worker  1   0   0d 00h00m39s
> 31270   worker  2   0   0d 00h00m39s
> 31271   worker  3   0   0d 00h00m39s
> 31272   worker  4   0   0d 00h00m39s
> # old workers
> 9286    worker  [was: 1]    1   0d 00h03m25s
> 9287    worker  [was: 2]    1   0d 00h03m25s
> 9288    worker  [was: 3]    1   0d 00h03m25s
> 9289    worker  [was: 4]    1   0d 00h03m25s
> 
> @!9286 show fd
>  13 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc] cache=0 owner=0x23eaae0 
> iocb=0x4877c0(mworker_accept_wrapper) tmask=0x1 umask=0x0
>  16 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x4e1ab0 
> iocb=0x4e1ab0(thread_sync_io_handler) tmask=0x umask=0x0
>  20 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1601b840 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>  21 : st=0x22(R:pRa W:pRa) ev=0x00(heopi) [lc] cache=0 owner=0x1f0ec4f0 
> iocb=0x4ce6e0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x00241300 fe=GLOBAL 
> mux=PASS mux_ctx=0x22ad8630
>    1412 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bab1f30 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>    1413 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x247e5bc0 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>    1414 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x18883650 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>    1415 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x14476c10 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>    1416 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11a27850 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>    1418 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x12008230 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>    1419 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1bb0a570 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>    1420 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x11c94790 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0
>    1421 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x1449e050 
> iocb=0x4f4d50(ssl_async_fd_free) tmask=0x1 umask=0x0

Re: [External] Re: QAT intermittent healthcheck errors

2019-05-07 Thread Emeric Brun
On 5/7/19 1:24 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 5/7/19 11:44 AM, Emeric Brun wrote:
>> Hi Marcin,
>>
>> As I use HAProxy 1.8 I had to adjust the patch (see 
>> attachment for end result). Unfortunately after applying the patch there is 
>> no change in behavior: we still leak /dev/usdm_drv descriptors and have 
>> "stuck" HAProxy instances after reload..
>>>>> Regards,
>>>>
>>>>
>>
>> Could you perform a test recompiling the usdm_drv and the engine with this 
>> patch, it applies on QAT 1.7 but I've no hardware to test this version here.
>>
>> It should fix the fd leak.
> 
> It did fix fd leak:
> 
> # ls -al /proc/2565/fd|fgrep dev
> lr-x-- 1 root root 64 May  7 13:15 0 -> /dev/null
> lrwx-- 1 root root 64 May  7 13:15 7 -> /dev/usdm_drv
> 
> # systemctl reload haproxy.service
> # ls -al /proc/2565/fd|fgrep dev
> lr-x-- 1 root root 64 May  7 13:15 0 -> /dev/null
> lrwx-- 1 root root 64 May  7 13:15 8 -> /dev/usdm_drv
> 
> # systemctl reload haproxy.service
> # ls -al /proc/2565/fd|fgrep dev
> lr-x-- 1 root root 64 May  7 13:15 0 -> /dev/null
> lrwx-- 1 root root 64 May  7 13:15 9 -> /dev/usdm_drv
> 
> But there are still stuck processes :-( This is with both patches included: 
> for QAT and HAProxy.
> Regards,
> 
> Marcin Deranek

Thank you Marcin! Anyway, it was also a bug.

Could you run a 'show fd' command on a stuck process after adding the patch in
attachment?

R,
Emeric

>From d0e095c2aa54f020de8fc50db867eff1ef73350e Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Fri, 19 Apr 2019 17:15:28 +0200
Subject: [PATCH] MINOR: ssl/cli: async fd io-handlers printable on show fd

This patch exports the async fd iohandlers and make them printable
doing a 'show fd' on cli.
---
 include/proto/ssl_sock.h | 4 
 src/cli.c| 9 +
 src/ssl_sock.c   | 4 ++--
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/proto/ssl_sock.h b/include/proto/ssl_sock.h
index 62ebcb87..ce52fb74 100644
--- a/include/proto/ssl_sock.h
+++ b/include/proto/ssl_sock.h
@@ -85,6 +85,10 @@ SSL_CTX *ssl_sock_get_generated_cert(unsigned int key, struct bind_conf *bind_co
 int ssl_sock_set_generated_cert(SSL_CTX *ctx, unsigned int key, struct bind_conf *bind_conf);
 unsigned int ssl_sock_generated_cert_key(const void *data, size_t len);
 
+#if (OPENSSL_VERSION_NUMBER >= 0x1010000fL) && !defined(OPENSSL_NO_ASYNC)
+void ssl_async_fd_handler(int fd);
+void ssl_async_fd_free(int fd);
+#endif
 
 /* ssl shctx macro */
 
diff --git a/src/cli.c b/src/cli.c
index 568ceba2..843c3d04 100644
--- a/src/cli.c
+++ b/src/cli.c
@@ -69,6 +69,9 @@
 #include 
 #include 
 #include 
+#ifdef USE_OPENSSL
+#include 
+#endif
 
 #define PAYLOAD_PATTERN "<<"
 
@@ -998,6 +1001,12 @@ static int cli_io_handler_show_fd(struct appctx *appctx)
 			 (fdt.iocb == listener_accept)  ? "listener_accept" :
 			 (fdt.iocb == poller_pipe_io_handler) ? "poller_pipe_io_handler" :
 			 (fdt.iocb == mworker_accept_wrapper) ? "mworker_accept_wrapper" :
+#ifdef USE_OPENSSL
#if (OPENSSL_VERSION_NUMBER >= 0x1010000fL) && !defined(OPENSSL_NO_ASYNC)
+			 (fdt.iocb == ssl_async_fd_free) ? "ssl_async_fd_free" :
+			 (fdt.iocb == ssl_async_fd_handler) ? "ssl_async_fd_handler" :
+#endif
+#endif
 			 "unknown");
 
 		if (fdt.iocb == conn_fd_handler) {
diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 112520c8..58ae8a26 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -573,7 +573,7 @@ fail_get:
 /*
  * openssl async fd handler
  */
-static void ssl_async_fd_handler(int fd)
+void ssl_async_fd_handler(int fd)
 {
 	struct connection *conn = fdtab[fd].owner;
 
@@ -594,7 +594,7 @@ static void ssl_async_fd_handler(int fd)
 /*
  * openssl async delayed SSL_free handler
  */
-static void ssl_async_fd_free(int fd)
+void ssl_async_fd_free(int fd)
 {
 	SSL *ssl = fdtab[fd].owner;
 	OSSL_ASYNC_FD all_fd[32];
-- 
2.17.1



Re: QAT intermittent healthcheck errors

2019-05-07 Thread Emeric Brun
Hi Marcin,

>> As I use HAProxy 1.8 I had to adjust the patch (see attachment
>> for end result). Unfortunately after applying the patch there is no change
>> in behavior: we still leak /dev/usdm_drv descriptors and have "stuck"
>> HAProxy instances after reload..
>>> Regards,
>>
>>

Could you run a test recompiling usdm_drv and the engine with this patch? It
applies on QAT 1.7, but I have no hardware to test this version here.

It should fix the fd leak.

R,
Emeric
diff -urN quickassist.old/utilities/libusdm_drv/linux/user_space/qae_mem_hugepage_utils.c quickassist/utilities/libusdm_drv/linux/user_space/qae_mem_hugepage_utils.c
--- quickassist.old/utilities/libusdm_drv/linux/user_space/qae_mem_hugepage_utils.c	2019-05-07 11:35:15.654202291 +0200
+++ quickassist/utilities/libusdm_drv/linux/user_space/qae_mem_hugepage_utils.c	2019-05-07 11:35:44.302292417 +0200
@@ -104,7 +104,7 @@
 /* standard page size */
 page_size = getpagesize();
 
-fd = qae_open("/proc/self/pagemap", O_RDONLY);
+fd = qae_open("/proc/self/pagemap", O_RDONLY|O_CLOEXEC);
 if (fd < 0)
 {
 return 0;
diff -urN quickassist.old/utilities/libusdm_drv/linux/user_space/qae_mem_utils.c quickassist/utilities/libusdm_drv/linux/user_space/qae_mem_utils.c
--- quickassist.old/utilities/libusdm_drv/linux/user_space/qae_mem_utils.c	2019-03-15 15:23:43.0 +0100
+++ quickassist/utilities/libusdm_drv/linux/user_space/qae_mem_utils.c	2019-05-07 11:24:08.755921241 +0200
@@ -745,7 +745,7 @@
 
 if (fd > 0)
 close(fd);
-fd = qae_open(QAE_MEM, O_RDWR);
+fd = qae_open(QAE_MEM, O_RDWR|O_CLOEXEC);
 if (fd < 0)
 {
 CMD_ERROR("%s:%d Unable to initialize memory file handle %s \n",


Re: QAT intermittent healthcheck errors

2019-05-06 Thread Emeric Brun
Hi Marcin,

On 5/6/19 3:31 PM, Emeric Brun wrote:
> Hi Marcin,
> 
> On 5/6/19 3:15 PM, Marcin Deranek wrote:
>> Hi Emeric,
>>
>> On 5/3/19 5:54 PM, Emeric Brun wrote:
>>> Hi Marcin,
>>>
>>> On 5/3/19 4:56 PM, Marcin Deranek wrote:
>>>> Hi Emeric,
>>>>
>>>> On 5/3/19 4:50 PM, Emeric Brun wrote:
>>>>
>>>>> I've a testing platform here but I don't use the usdm_drv but the 
>>>>> qat_contig_mem and I don't reproduce this issue (I'm using QAT 1.5, as 
>>>>> the doc says to use with my chip) .
>>>>
>>>> I see. I use qat 1.7 and qat-engine 0.5.40.
>>>>
>>>>> Anyway, could you re-compile a haproxy's binary if I provide you a 
>>>>> testing patch?
>>>>
>>>> Sure, that should not be a problem.
>>>
>>> The patch in attachment.
>>
>> As I use HAProxy 1.8 I had to adjust the patch (see attachment for end 
>> result). Unfortunately after applying the patch there is no change in 
>> behavior: we still leak /dev/usdm_drv descriptors and have "stuck" HAProxy 
>> instances after reload..
>> Regards,
> 
> 
> Ok, the patch adds a ENGINE_finish() call before the reload. I was supposing 
> that the ENGINE_finish would perform the close of the fd because on the 
> application side there is no different way to interact with the engine.
> 
> Unfortunately, this is not the case. So I will ask to the intel guys what we 
> supposed to do to close this fd.


I've just written to my contact at intel.

Just for the record: I'm using hardware supported by QAT 1.5; in this version
usdm_drv was not present, and I use the other option, qat_contig_mem, which
does not seem to cause such an fd leak.

Perhaps switching to it would be a workaround if you want to continue testing
while waiting for the Intel guys' reply.

R,
Emeric



Re: QAT intermittent healthcheck errors

2019-05-06 Thread Emeric Brun
Hi Marcin,

On 5/6/19 3:15 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 5/3/19 5:54 PM, Emeric Brun wrote:
>> Hi Marcin,
>>
>> On 5/3/19 4:56 PM, Marcin Deranek wrote:
>>> Hi Emeric,
>>>
>>> On 5/3/19 4:50 PM, Emeric Brun wrote:
>>>
>>>> I've a testing platform here but I don't use the usdm_drv but the 
>>>> qat_contig_mem and I don't reproduce this issue (I'm using QAT 1.5, as the 
>>>> doc says to use with my chip) .
>>>
>>> I see. I use qat 1.7 and qat-engine 0.5.40.
>>>
>>>> Anyway, could you re-compile a haproxy's binary if I provide you a testing 
>>>> patch?
>>>
>>> Sure, that should not be a problem.
>>
>> The patch in attachment.
> 
> As I use HAProxy 1.8 I had to adjust the patch (see attachment for end 
> result). Unfortunately after applying the patch there is no change in 
> behavior: we still leak /dev/usdm_drv descriptors and have "stuck" HAProxy 
> instances after reload..
> Regards,


OK, the patch adds an ENGINE_finish() call before the reload. I assumed that
ENGINE_finish() would close the fd, because on the application side there is
no other way to interact with the engine.

Unfortunately, this is not the case, so I will ask the Intel guys what we are
supposed to do to close this fd.

R,
Emeric 




Re: [External] Re: QAT intermittent healthcheck errors

2019-05-03 Thread Emeric Brun
Hi Marcin,

On 5/3/19 4:56 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 5/3/19 4:50 PM, Emeric Brun wrote:
> 
>> I've a testing platform here but I don't use the usdm_drv but the 
>> qat_contig_mem and I don't reproduce this issue (I'm using QAT 1.5, as the 
>> doc says to use with my chip) .
> 
> I see. I use qat 1.7 and qat-engine 0.5.40.
> 
>> Anyway, could you re-compile a haproxy's binary if I provide you a testing 
>> patch?
> 
> Sure, that should not be a problem.

The patch in attachment.
> 
>> The idea is to perform a deinit in the master to force a close of those 
>> '/dev's at each reload. Perhaps It won't fix our issue but this leak of fd 
>> should not be.
> 
> Hope this will give us at least some more insight..
> Regards,
> 
> Marcin Deranek

R,
Emeric
>From ca57857a492e898759ef211a8fd9714d0f7dd7fa Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Fri, 3 May 2019 17:06:59 +0200
Subject: [PATCH] BUG/MEDIUM: ssl: fix ssl engine's open fds are leaking.

The master didn't call the engine deinit, resulting
in a leak of fd opened by the engine during init. The
workers inherit of these accumulated fds at each reload.

This patch add a call to engine deinit on the master just
before reloading with an exec.
---
 src/haproxy.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/haproxy.c b/src/haproxy.c
index 603f084c..f77eb1b4 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -588,6 +588,13 @@ void mworker_reload()
 	if (fdtab)
 		deinit_pollers();
 
+#if defined(USE_OPENSSL)
+#ifndef OPENSSL_NO_ENGINE
+	/* Engines may have opened fds and we must close them */
+	ssl_free_engines();
+#endif
+#endif
+
 	/* restore the initial FD limits */
 	limit.rlim_cur = rlim_fd_cur_at_boot;
 	limit.rlim_max = rlim_fd_max_at_boot;
-- 
2.17.1



Re: [External] Re: QAT intermittent healthcheck errors

2019-05-03 Thread Emeric Brun
Hi Marcin,

Good, so we're making progress!

I have a testing platform here, but I use qat_contig_mem rather than usdm_drv,
and I don't reproduce this issue (I'm using QAT 1.5, as the doc says to use
with my chip).

Anyway, could you recompile a haproxy binary if I provide you with a testing
patch?

The idea is to perform a deinit in the master to force those '/dev' fds closed
at each reload. Perhaps it won't fix our issue, but this fd leak should not
exist anyway.

R,
Emeric

On 5/3/19 4:21 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> It looks like on every reload master leaks /dev/usdm_drv device:
> 
> # systemctl restart haproxy.service
> # ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
> lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
> lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
> 
> # systemctl reload haproxy.service
> # ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
> lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
> lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
> lrwx-- 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv
> 
> # systemctl reload haproxy.service
> # ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
> lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
> lrwx-- 1 root root 64 May  3 15:40 10 -> /dev/usdm_drv
> lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
> lrwx-- 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv
> 
> Obviously workers do inherit this from the master. Looking at workers I see 
> the following:
> 
> * 1st gen:
> 
> # ls -al /proc/36083/fd|awk '/dev/ {print $NF}'|sort
> /dev/null
> /dev/null
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_dev_processes
> /dev/uio19
> /dev/uio3
> /dev/uio35
> /dev/usdm_drv
> 
> * 2nd gen:
> 
> # ls -al /proc/41637/fd|awk '/dev/ {print $NF}'|sort
> /dev/null
> /dev/null
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_dev_processes
> /dev/uio23
> /dev/uio39
> /dev/uio7
> /dev/usdm_drv
> /dev/usdm_drv
> 
> Looks like only /dev/usdm_drv is leaked.
> 
> Cheers,
> 
> Marcin Deranek
> 
> On 5/3/19 2:22 PM, Emeric Brun wrote:
>> Hi Marcin,
>>
>> On 4/29/19 6:41 PM, Marcin Deranek wrote:
>>> Hi Emeric,
>>>
>>> On 4/29/19 3:42 PM, Emeric Brun wrote:
>>>> Hi Marcin,
>>>>
>>>>>
>>>>>> I've also a contact at intel who told me to try this option on the qat 
>>>>>> engine:
>>>>>>
>>>>>>> --disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
>>>>>>>    Disable/Enable the engine from being initialized automatically 
>>>>>>> following a
>>>>>>>    fork operation. This is useful in a situation where you want to 
>>>>>>> tightly
>>>>>>>    control how many instances are being used for processes. For 
>>>>>>> instance if an
>>>>>>>    application forks to start a process that does not utilize QAT 
>>>>>>> currently
>>>>>>>    the default behaviour is for the engine to still automatically 
>>>>>>> get started
>>>>>>>    in the child using up an engine instance. After using this flag 
>>>>>>> either the
>>>>>>>    engine needs to be initialized manually using the engine message:
>>>>>>>    INIT_ENGINE or will automatically get initialized on the first 
>>>>>>> QAT crypto
>>>>>>>    operation. The initialization on fork is enabled by default.
>>>>>
>>>>> I tried to build QAT Engine with disabled auto init, but that did not 
>>>>> help. Now I get the following during startup:
>>>>>
>>>>> 2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753 
>>>>> Unable to initialize memory file handle /dev/usdm_drv
>>>>> 2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512 
>>>>> [29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure
>>>>
>>>> " INIT_ENGINE or will automatically get initialized on the first QAT 
>>>> crypto operation"
>>>>
>>>> Perhaps the init appears "with first qat crypto operation" and is delayed 
>>>> after the fork so if a chroot is configured, it doesn't allow some accesses
>>>> to /dev. Could you perform a test in that case without chroot enabled in 
>>>> the haproxy config ?
>>>
>>> Removed chroot and now it initializes properly. Unfortunately reload still 
>>> causes "stuck" HAProxy process :-(
>>>
>>> Marcin Deranek
>>
>> Could you check with "ls -l /proc/<pid>/fd" if the "/dev/<device>"
>> is open multiple times after a reload?
>>
>> Emeric
>>




Re: [External] Re: QAT intermittent healthcheck errors

2019-05-03 Thread Emeric Brun
Hi Marcin,

On 4/29/19 6:41 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 4/29/19 3:42 PM, Emeric Brun wrote:
>> Hi Marcin,
>>
>>>
>>>> I've also a contact at intel who told me to try this option on the qat 
>>>> engine:
>>>>
>>>>> --disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
>>>>>   Disable/Enable the engine from being initialized automatically 
>>>>> following a
>>>>>   fork operation. This is useful in a situation where you want to 
>>>>> tightly
>>>>>   control how many instances are being used for processes. For 
>>>>> instance if an
>>>>>   application forks to start a process that does not utilize QAT 
>>>>> currently
>>>>>   the default behaviour is for the engine to still automatically get 
>>>>> started
>>>>>   in the child using up an engine instance. After using this flag 
>>>>> either the
>>>>>   engine needs to be initialized manually using the engine message:
>>>>>   INIT_ENGINE or will automatically get initialized on the first QAT 
>>>>> crypto
>>>>>   operation. The initialization on fork is enabled by default.
>>>
>>> I tried to build QAT Engine with disabled auto init, but that did not help. 
>>> Now I get the following during startup:
>>>
>>> 2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753 
>>> Unable to initialize memory file handle /dev/usdm_drv
>>> 2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512 
>>> [29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure
>>
>> " INIT_ENGINE or will automatically get initialized on the first QAT crypto 
>> operation"
>>
>> Perhaps the init appears "with first qat crypto operation" and is delayed 
>> after the fork so if a chroot is configured, it doesn't allow some accesses
>> to /dev. Could you perform a test in that case without chroot enabled in the 
>> haproxy config ?
> 
> Removed chroot and now it initializes properly. Unfortunately reload still 
> causes "stuck" HAProxy process :-(
> 
> Marcin Deranek

Could you check with "ls -l /proc//fd" if the "/dev/" is 
open multiple times after a reload?

Emeric



Re: leak of handle to /dev/urandom since 1.8?

2019-05-03 Thread Emeric Brun
Hi Lukas,

On 5/3/19 1:49 PM, William Lallemand wrote:
> On Fri, May 03, 2019 at 01:38:00PM +0200, Lukas Tribus wrote:
>> Hello everyone,
>>
>>
>> On Fri, 3 May 2019 at 12:50, Robert Allen1  wrote:
>>> +#if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >= 0x10101000L)
>>> +   if (global.ssl_used_frontend || global.ssl_used_backend)
>>> +   /* close random device FDs */
>>> +   RAND_keep_random_devices_open(0);
>>> +#endif
>>>
>>> and requests a backport to 1.8 and 1.9 where we noticed this issue (and
>>> which
>>> include the re-exec for reload code, if I followed its history
>>> thoroughly).
>>
>> Please do not commit this yet.
>>
>> We need those random devices open in openssl 1.1.1. We specifically
>> pushed for this and had very long conversations with openssl folks.
>>
>> I don't have time to dig up the entire history right now, will do that
>> later for context, however, please do not commit this yet.
>>
>>
> 
> Lukas,
> 
> This is the code of deinitilisation of the master, which is launched before
> the re-execution of the master, it does not impact the workers.
> 

Indeed, if the workers keep the fd open it should work; the master is outside 
the chroot and doesn't need to keep the fd open.

Emeric
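
A minimal sketch of the distinction discussed above, assuming OpenSSL >= 1.1.1
(where RAND_keep_random_devices_open() was introduced) and a platform that
seeds from device files rather than getrandom(): a worker that will chroot
wants the cached fd kept, while the master, outside the chroot, can drop it as
in the patch quoted earlier.

#include <openssl/rand.h>
#include <unistd.h>

static int enter_chroot(const char *root)
{
    unsigned char seed[16];

    /* force the random device to be opened (and cached) before the chroot */
    if (RAND_bytes(seed, sizeof(seed)) != 1)
        return -1;
    RAND_keep_random_devices_open(1); /* workers: keep fds across the chroot */

    if (chroot(root) != 0 || chdir("/") != 0)
        return -1;
    return 0;
}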



Re: QAT intermittent healthcheck errors

2019-04-29 Thread Emeric Brun
Hi Marcin,

> 
>> I've also a contact at intel who told me to try this option on the qat 
>> engine:
>>
>>> --disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
>>>  Disable/Enable the engine from being initialized automatically 
>>> following a
>>>  fork operation. This is useful in a situation where you want to tightly
>>>  control how many instances are being used for processes. For instance 
>>> if an
>>>  application forks to start a process that does not utilize QAT 
>>> currently
>>>  the default behaviour is for the engine to still automatically get 
>>> started
>>>  in the child using up an engine instance. After using this flag either 
>>> the
>>>  engine needs to be initialized manually using the engine message:
>>>  INIT_ENGINE or will automatically get initialized on the first QAT 
>>> crypto
>>>  operation. The initialization on fork is enabled by default.
> 
> I tried to build QAT Engine with disabled auto init, but that did not help. 
> Now I get the following during startup:
> 
> 2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753 Unable 
> to initialize memory file handle /dev/usdm_drv
> 2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512 
> [29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure

" INIT_ENGINE or will automatically get initialized on the first QAT crypto 
operation"

Perhaps the init appears "with first qat crypto operation" and is delayed after 
the fork so if a chroot is configured, it doesn't allow some accesses
to /dev. Could you perform a test in that case without chroot enabled in the 
haproxy config ?

> 
> Probably engine is not manually initialized after forking.
> Regards,
> 
> Marcin Deranek

Emeric



Re: [External] Re: QAT intermittent healthcheck errors

2019-04-29 Thread Emeric Brun
Hi Marcin,

On 4/19/19 3:26 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 4/18/19 4:35 PM, Emeric Brun wrote:
>>> An other interesting trace would be to perform a "show sess" command on a 
>>> stucked process through the master cli.
>>
>> And also the "show fd"
> 
> Here it is:
> 
> show proc
> #         
> 13409   master  0   1   0d 00h03m30s
> # workers
> 15084   worker  1   0   0d 00h03m20s
> 15085   worker  2   0   0d 00h03m20s
> 15086   worker  3   0   0d 00h03m20s
> 15087   worker  4   0   0d 00h03m20s
> # old workers
> 13415   worker  [was: 1]    1   0d 00h03m30s
> 13416   worker  [was: 2]    1   0d 00h03m30s
> 13417   worker  [was: 3]    1   0d 00h03m30s
> 13418   worker  [was: 4]    1   0d 00h03m30s
> 
> @!13415 show sess
> 0x4eee9c0: proto=sockpair ts=0a age=0s calls=1 
> rq[f=40c0c220h,i=0,an=00h,rx=,wx=,ax=] rp[f=80008000h,i=0,an=00h,rx=,wx=,ax=] 
> s0=[7,8h,fd=20,ex=] s1=[7,4018h,fd=-1,ex=] exp=
> 
> @!13415 show fd
>  13 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc] cache=0 owner=0x1a74ae0 
> iocb=0x487760(mworker_accept_wrapper) tmask=0x1 umask=0x0
>  16 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x4e19f0 
> iocb=0x4e19f0(thread_sync_io_handler) tmask=0x umask=0x0
>  20 : st=0x22(R:pRa W:pRa) ev=0x00(heopi) [lc] cache=0 owner=0x4fe1860 
> iocb=0x4ce620(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x00241300 fe=GLOBAL 
> mux=PASS mux_ctx=0x47dfd50
>  87 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x3ec1150 
> iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
>  88 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x3c237d0 
> iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
> 
> @!13416 show sess
> 0x48f2990: proto=sockpair ts=0a age=0s calls=1 
> rq[f=40c0c220h,i=0,an=00h,rx=,wx=,ax=] rp[f=80008000h,i=0,an=00h,rx=,wx=,ax=] 
> s0=[7,8h,fd=20,ex=] s1=[7,4018h,fd=-1,ex=] exp=
> 
> @!13416 show fd
>  15 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc] cache=0 owner=0x34c1540 
> iocb=0x487760(mworker_accept_wrapper) tmask=0x1 umask=0x0
>  16 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x4e19f0 
> iocb=0x4e19f0(thread_sync_io_handler) tmask=0x umask=0x0
>  20 : st=0x22(R:pRa W:pRa) ev=0x00(heopi) [lc] cache=0 owner=0x4b3cff0 
> iocb=0x4ce620(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x00241300 fe=GLOBAL 
> mux=PASS mux_ctx=0x4f0e510
>  75 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x3a6b2f0 
> iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
>  76 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] cache=0 owner=0x43a34e0 
> iocb=0x4f5d30(unknown) tmask=0x1 umask=0x0
> 
> Marcin Deranek

87, 88, 75 and 76 appear to be async engine FDs and should have been cleaned 
up. I will dig into that.

I've also a contact at intel who told me to try this option on the qat engine:

> --disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
> Disable/Enable the engine from being initialized automatically following a
> fork operation. This is useful in a situation where you want to tightly
> control how many instances are being used for processes. For instance if 
> an
> application forks to start a process that does not utilize QAT currently
> the default behaviour is for the engine to still automatically get started
> in the child using up an engine instance. After using this flag either the
> engine needs to be initialized manually using the engine message:
> INIT_ENGINE or will automatically get initialized on the first QAT crypto
> operation. The initialization on fork is enabled by default.


R,
Emeric
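
A hedged sketch of the manual initialization mentioned in the option text
above, assuming the engine exposes the INIT_ENGINE ctrl message its docs
describe (the command name is taken from the quoted documentation, not
verified here):

#include <openssl/engine.h>

static int qat_init_after_fork(void)
{
    ENGINE *e = ENGINE_by_id("qat");

    if (!e)
        return -1;
    /* ask the engine to (re)initialize itself in this child process */
    if (!ENGINE_ctrl_cmd_string(e, "INIT_ENGINE", NULL, 0)) {
        ENGINE_free(e);
        return -1;
    }
    ENGINE_free(e);
    return 0;
}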



Re: QAT intermittent healthcheck errors

2019-04-18 Thread Emeric Brun
On 4/18/19 11:06 AM, Emeric Brun wrote:
> Hi Marcin,
> 
> On 4/12/19 6:10 PM, Marcin Deranek wrote:
>> Hi Emeric,
>>
>> On 4/12/19 5:26 PM, Emeric Brun wrote:
>>
>>> Do you have ssl enabled on the server side?
>>
>> Yes, ssl is on frontend and backend with ssl checks enabled.
>>
>>>> If that is the case, could you replace the health check with a simple tcp 
>>>> check (without ssl)?
>>
>>> What I noticed before is that if I (re)start HAProxy and reload immediately, 
>>> no stuck processes are present. If I wait before reloading, stuck processes 
>>> show up.
>>> After disabling checks (I still keep ssl enabled for normal traffic) reloads 
>>> work just fine (tried many times). Do you know how to enable TCP healthchecks 
>>> while keeping SSL for non-healthcheck requests?
> 
> I think you can do that this way:
> 
> Remove the option httpchk (or prefix it with "no": "no option httpchk" if it 
> is configured in the defaults section)
> 
> and add the following 2 lines:
> 
> option tcp-check
> tcp-check connect
> 
> This shouldn't perform the handshake but just validate that the port is open. 
> The regular traffic will continue to use ssl
> on the server side.
> 
>  
>>> Regarding the show info/lsof, it seems there are no more sessions on the 
>>> client side but remaining ssl jobs (CurrSslConns), and I suspect the health 
>>> checks miss a cleanup of their ssl sessions using the QAT (this is just an 
>>> assumption).
>>
>> In general instance where I test QAT does not have any "real" client traffic 
>> except small amount of healtcheck requests per frontend which are internally 
>> handled by HAProxy itself. Still TLS handshake still needs to take place. 
>> There are many more backend healthchecks. Looks like your assumption was 
>> correct..
> 
> Good! We'll continue to dig in that direction.
> 
> Another interesting trace would be to perform a "show sess" command on a 
> stuck process through the master CLI.

And also the "show fd" 

R,
Emeric



Re: QAT intermittent healthcheck errors

2019-04-18 Thread Emeric Brun
Hi Marcin,

On 4/12/19 6:10 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 4/12/19 5:26 PM, Emeric Brun wrote:
> 
>> Do you have ssl enabled on the server side?
> 
> Yes, ssl is on frontend and backend with ssl checks enabled.
> 
>> If that is the case, could you replace the health check with a simple tcp 
>> check (without ssl)?
> 
> What I noticed before is that if I (re)start HAProxy and reload immediately, 
> no stuck processes are present. If I wait before reloading, stuck processes 
> show up.
> After disabling checks (I still keep ssl enabled for normal traffic) reloads 
> work just fine (tried many times). Do you know how to enable TCP healthchecks 
> while keeping SSL for non-healthcheck requests?

I think you can do that this way:

Remove the option httpchk (or prefix it with "no": "no option httpchk" if it 
is configured in the defaults section)

and add the following 2 lines:

option tcp-check
tcp-check connect

This shouldn't perform the handshake but just validate that the port is open. 
The regular traffic will continue to use ssl
on the server side.

 
>> Regarding the show info/lsof, it seems there are no more sessions on the 
>> client side but remaining ssl jobs (CurrSslConns), and I suspect the health 
>> checks miss a cleanup of their ssl sessions using the QAT (this is just an 
>> assumption).
> 
> In general instance where I test QAT does not have any "real" client traffic 
> except small amount of healtcheck requests per frontend which are internally 
> handled by HAProxy itself. Still TLS handshake still needs to take place. 
> There are many more backend healthchecks. Looks like your assumption was 
> correct..

Good! We'll continue to dig in that direction.

Another interesting trace would be to perform a "show sess" command on a 
stuck process through the master CLI.

R,
Emeric



Re: [External] Re: QAT intermittent healthcheck errors

2019-04-12 Thread Emeric Brun
Hi Marcin,

Do you have ssl enabled on the server side? If that is the case, could you 
replace the health check with a simple tcp check (without ssl)?

Regarding the show info/lsof, it seems there are no more sessions on the 
client side but remaining ssl jobs (CurrSslConns), and I suspect the health 
checks miss a cleanup of their ssl sessions using the QAT (this is just an 
assumption).

R,
Emeric

On 4/12/19 4:43 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 4/10/19 2:20 PM, Emeric Brun wrote:
> 
>> On 4/10/19 1:02 PM, Marcin Deranek wrote:
>>> Hi Emeric,
>>>
>>> Our process limit in QAT configuration is quite high (128) and I was able 
>>> to run 100+ openssl processes without a problem. According to Joel from 
>>> Intel problem is in cleanup code - presumably when HAProxy exits and frees 
>>> up QAT resources. Will try to see if I can get more debug information.
>>
>> I've just taken a look.
>>
>> Engine deinit functions are called:
>>
>> haproxy/src/ssl_sock.c
>> #ifndef OPENSSL_NO_ENGINE
>> void ssl_free_engines(void) {
>>  struct ssl_engine_list *wl, *wlb;
>>  /* free up engine list */
>>  list_for_each_entry_safe(wl, wlb, &openssl_engines, list) {
>>  ENGINE_finish(wl->e);
>>  ENGINE_free(wl->e);
>>  LIST_DEL(&wl->list);
>>  free(wl);
>>  }
>> }
>> #endif
>> ...
>> #ifndef OPENSSL_NO_ENGINE
>>  hap_register_post_deinit(ssl_free_engines);
>> #endif
>>
>> I don't know how many haproxy processes you are running but if I describe 
>> the complete scenario of processes you may note that we reach a limit:
> 
> It's very unlikely it's the limit, as I lowered the number of HAProxy 
> processes (from 10 to 4) while keeping QAT NumProcesses equal to 32. HAProxy 
> would have a problem with this limit while spawning new instances, not while 
> tearing down old ones. In such a case QAT would not be initialized for some 
> HAProxy instances (you would see 1 thread vs 2 threads). About threads, read 
> below.
> 
>> - the master sends a signal to older processes; those processes will unbind 
>> and stop accepting new conns but continue to serve remaining sessions until 
>> the end.
>> - new processes are started and immediately init the engine and accept new 
>> conns.
>> - When no more sessions remain on an old process, it calls the deinit 
>> function of the engine before exiting.
> 
> What I noticed is that each HAProxy with QAT enabled has 2 threads (LWP) - 
> it looks like QAT adds an extra thread to the process itself. Would adding 
> an extra thread possibly mess up the HAProxy termination sequence?
> Our setup is to run HAProxy in multi process mode - no threads (or 1 thread 
> per process if you wish).
> 
>> I also suppose that old processes are stuck because some sessions never 
>> ended; perhaps I'm wrong, but a strace on an old process could be 
>> interesting to know why those processes are stuck.
> 
> strace only shows these:
> 
> [pid 11392] 23:24:43.164619 epoll_wait(4,  
> [pid 11392] 23:24:43.164687 <... epoll_wait resumed> [], 200, 0) = 0
> [pid 11392] 23:24:43.164761 epoll_wait(4,  
> [pid 11392] 23:24:43.953203 <... epoll_wait resumed> [], 200, 788) = 0
> [pid 11392] 23:24:43.953286 epoll_wait(4,  
> [pid 11392] 23:24:43.953355 <... epoll_wait resumed> [], 200, 0) = 0
> [pid 11392] 23:24:43.953419 epoll_wait(4,  
> [pid 11392] 23:24:44.010508 <... epoll_wait resumed> [], 200, 57) = 0
> [pid 11392] 23:24:44.010589 epoll_wait(4,  
> 
> There are no connections: the stuck process only has a UDP socket on a random port:
> 
> [root@externallb-124 ~]# lsof -p 6307|fgrep IPv4
> hapee-lb 6307 lbengine   83u IPv4 3598779351  0t0 UDP *:19573
> 
> 
>> You can also use the 'master CLI' using '-S', and you can check whether 
>> sessions remain on those older processes (the doc is available in 
>> management.txt)
> 
> Before reload
> * systemd
>  Main PID: 33515 (hapee-lb)
>    Memory: 1.6G
>    CGroup: /system.slice/hapee-1.8-lb.service
>    ├─33515 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>    ├─34858 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>    ├─34859 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>    ├─34860 /opt/hapee-1.8/sbin/hapee-lb -Ws -f 
> /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
>    └─34861 /opt/hapee-1.8/

Re: [External] Re: QAT intermittent healthcheck errors

2019-04-10 Thread Emeric Brun
Hi Marcin,

> You can also use the 'master CLI' using '-S', and you can check whether 
> sessions remain on those older processes (doc is available in management.txt)
Here is the doc:

https://cbonte.github.io/haproxy-dconv/1.9/management.html#9.4

Emeric



Re: [External] Re: QAT intermittent healthcheck errors

2019-04-10 Thread Emeric Brun
Hi Marcin,

On 4/10/19 1:02 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> Our process limit in QAT configuration is quite high (128) and I was able to 
> run 100+ openssl processes without a problem. According to Joel from Intel 
> problem is in cleanup code - presumably when HAProxy exits and frees up QAT 
> resources. Will try to see if I can get more debug information.

I've just taken a look.

Engine deinit functions are called:

haproxy/src/ssl_sock.c
#ifndef OPENSSL_NO_ENGINE
void ssl_free_engines(void) {
	struct ssl_engine_list *wl, *wlb;
	/* free up engine list */
	list_for_each_entry_safe(wl, wlb, &openssl_engines, list) {
		ENGINE_finish(wl->e);
		ENGINE_free(wl->e);
		LIST_DEL(&wl->list);
		free(wl);
	}
}
#endif
...
#ifndef OPENSSL_NO_ENGINE
hap_register_post_deinit(ssl_free_engines);
#endif

I don't know how many haproxy processes you are running but if I describe the 
complete scenario of processes you may note that we reach a limit:

- the master sends a signal to older processes; those processes will unbind and 
stop accepting new conns but continue to serve remaining sessions until the end.
- new processes are started and immediately init the engine and accept new 
conns.
- When no more sessions remain on an old process, it calls the deinit function 
of the engine before exiting.

So there is a time window where you have 2x the number of processes configured 
in haproxy using the engine.

I also suppose that old processes are stuck because some sessions never ended; 
perhaps I'm wrong, but a strace on an old process could be interesting to know 
why those processes are stuck.

You can also use the 'master CLI' using '-S', and you can check whether 
sessions remain on those older processes (doc is available in management.txt)


Emeric



Re: QAT intermittent healthcheck errors

2019-04-09 Thread Emeric Brun
Hi Marcin,

On 4/9/19 3:07 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> I have followed all instructions and I got to the point where HAProxy starts 
> and does the job using QAT (backend healthchecks work and the frontend can 
> provide content over HTTPS). The problems starts when HAProxy gets reloaded. 
> With our current configuration on reload old HAProxy processes do not exit, 
> so after reload you end up with 2 generations of HAProxy processes: before 
> reload and after reload. I tried to find out what are conditions in which 
> HAProxy processes get "stuck" and I was not able to replicate it 
> consistently. In one case it was related to amount of backend servers with 
> 'ssl' on their line, but trying to add 'ssl' to some other servers in other 
> place had no effect. Interestingly in some cases for example with simple 
> configuration (1 frontend + 1 backend) HAProxy produced errors on reload (see 
> attachment) - in those cases processes rarely got "stuck" even though errors 
> were present.
> /dev/qat_adf_ctl is group writable for the group HAProxy runs on. Any help to 
> get this fixed / resolved would be welcome.
> Regards,
> 
> Marcin Deranek

I've checked the errors.txt and all the messages were written by the engine 
and are not part of the haproxy code. I can only speculate for now, but I 
think we face a first error due to a limit on the number of processes trying 
to access the engine: the reload will double the number of processes trying 
to attach to the engine. Perhaps this issue can be bypassed by tweaking the 
QAT configuration file (some advice from Intel would be welcome).

For the old stuck processes: I think the growth in processes also triggers 
errors on already attached ones in the QAT engine, but currently I don't know 
how these errors are/should be raised to the application; it appears that 
they are currently not handled, and that's why processes would be stuck 
(sessions may still appear valid to haproxy, so the old process continues to 
wait for their end). We expected them to be raised by the OpenSSL API, but it 
appears not to be the case. We have to check if we missed handling an error 
when polling events on the file descriptor used to communicate with the engine.


So we have to dig deeper, and any help from Intel folks or QAT-aware devs will 
be appreciated.

Emeric



Re: [PATCH] MINOR: ssl: Add aes_gcm_dec converter

2019-03-22 Thread Emeric Brun
On 3/22/19 12:04 PM, Nenad Merdanovic wrote:
> I've just renamed the converter based on Emeric's suggestion. And fixed a 
> typo in the doc of course.
> 
> Regards,
> Nenad
> 

Thanks Nenad, well done!

R,
Emeric



Re: [PATCH] ssl: ability to set TLS 1.3 ciphers using ssl-default-server-ciphersuites

2019-03-22 Thread Emeric Brun
Hi Pierre,

On 3/21/19 5:15 PM, Pierre Cheynier wrote:
> Any attempt to put TLS 1.3 ciphers on servers failed with output 'unable
> to set TLS 1.3 cipher suites'.
> 
> This was due to usage of SSL_CTX_set_cipher_list instead of
> SSL_CTX_set_ciphersuites in the TLS 1.3 block (protected by
> OPENSSL_VERSION_NUMBER >= 0x10101000L & so).
> 
> Signed-off-by: Pierre Cheynier 
> Reported-by: Damien Claisse 
> ---
>  src/ssl_sock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/ssl_sock.c b/src/ssl_sock.c
> index 138b1c58c..47548edc1 100644
> --- a/src/ssl_sock.c
> +++ b/src/ssl_sock.c
> @@ -4785,7 +4785,7 @@ int ssl_sock_prepare_srv_ctx(struct server *srv)
>  
>  #if (OPENSSL_VERSION_NUMBER >= 0x10101000L && !defined OPENSSL_IS_BORINGSSL 
> && !defined LIBRESSL_VERSION_NUMBER)
>   if (srv->ssl_ctx.ciphersuites &&
> - !SSL_CTX_set_cipher_list(srv->ssl_ctx.ctx, 
> srv->ssl_ctx.ciphersuites)) {
> + !SSL_CTX_set_ciphersuites(srv->ssl_ctx.ctx, 
> srv->ssl_ctx.ciphersuites)) {
>   ha_alert("Proxy '%s', server '%s' [%s:%d] : unable to set TLS 
> 1.3 cipher suites to '%s'.\n",
>curproxy->id, srv->id,
>srv->conf.file, srv->conf.line, 
> srv->ssl_ctx.ciphersuites);
> 

Indeed, there is a copy-paste mistake here from "ciphers".

We must merge this patch, and a backport is needed on both 1.9 and 1.8.

R,
Emeric
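
For context, a sketch of the API split this patch fixes (OpenSSL >= 1.1.1):
SSL_CTX_set_cipher_list() only affects TLS <= 1.2 suites, while TLS 1.3 suites
go through SSL_CTX_set_ciphersuites(); passing a TLS 1.3 list to the former is
what produced the error above. The suite names below are illustrative.

#include <openssl/ssl.h>

static int set_srv_ciphers(SSL_CTX *ctx)
{
    /* TLS <= 1.2 cipher suites */
    if (!SSL_CTX_set_cipher_list(ctx, "ECDHE-RSA-AES256-GCM-SHA384"))
        return -1;
    /* TLS 1.3 cipher suites use the dedicated setter */
    if (!SSL_CTX_set_ciphersuites(ctx, "TLS_AES_256_GCM_SHA384"))
        return -1;
    return 0;
}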



Re: [External] Re: QAT intermittent healthcheck errors

2019-03-13 Thread Emeric Brun
Hi Marcin,

On 3/11/19 4:27 PM, Marcin Deranek wrote:
> On 3/11/19 11:51 AM, Emeric Brun wrote:
> 
>> Mode async is enabled on both sides, server and frontend.
>>
>> But on the server side, haproxy uses session resumption, so there is a new 
>> key computation (full handshake with RSA/DSA computation) only every 5 
>> minutes (openssl default value).
>>
>> You can force a recompute each time by setting "no-ssl-reuse" on the server 
>> line, but it will add a heavy ssl computation load on the server.
> 
> Indeed, setting no-ssl-reuse makes use of QAT for healthchecks.
> Looks like finally we are ready for QAT testing.
> Thank you Emeric.
> Regards,
> 
> Marcin Deranek
> 


I've just re-checked and I think you should also enable the 'PKEY_CRYPTO' algo 
on the engine:

ssl-engine qat algo RSA,DSA,EC,DH,PKEY_CRYPTO

It will enable the offloading of the TLS1-PRF you can see here:

# /opt/booking-openssl/bin/openssl engine -c qat
(qat) Reference implementation of QAT crypto engine
 [RSA, DSA, DH, AES-128-CBC-HMAC-SHA1, AES-128-CBC-HMAC-SHA256, 
AES-256-CBC-HMAC-SHA1, AES-256-CBC-HMAC-SHA256, TLS1-PRF]

R,
Emeric
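
For reference, a rough equivalent of what the "ssl-engine qat algo ..."
directive asks OpenSSL to do. This is a sketch under the assumption that
haproxy forwards the comma-separated algo list to ENGINE_set_default_string(),
which accepts the same tokens (RSA, DSA, EC, DH, PKEY_CRYPTO, ...):

#include <openssl/engine.h>

static int use_engine_for(const char *id, const char *algos)
{
    ENGINE *e = ENGINE_by_id(id); /* e.g. "qat" */

    if (!e)
        return -1;
    if (!ENGINE_init(e)) {
        ENGINE_free(e);
        return -1;
    }
    /* e.g. algos = "RSA,DSA,EC,DH,PKEY_CRYPTO" */
    if (!ENGINE_set_default_string(e, algos)) {
        ENGINE_finish(e);
        ENGINE_free(e);
        return -1;
    }
    return 0;
}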



Re: QAT intermittent healthcheck errors

2019-03-11 Thread Emeric Brun
On 3/11/19 11:51 AM, Emeric Brun wrote:
> On 3/11/19 11:06 AM, Marcin Deranek wrote:
>> Hi Emeric,
>>
>> On 3/8/19 11:24 AM, Emeric Brun wrote:
>>> Are you sure that servers won't use ECDSA certificates? Did you check that 
>>> conns are successful when forcing 'ECDHE-RSA-AES256-GCM-SHA384'?
>>
>> Backend servers only support TLS 1.2 and RSA certificates.
>>
>>> Could you check algo supported by QAT doing this ?:
>>> openssl  engine -c qat
>>
>> # /opt/booking-openssl/bin/openssl engine -c qat
>> (qat) Reference implementation of QAT crypto engine
>>  [RSA, DSA, DH, AES-128-CBC-HMAC-SHA1, AES-128-CBC-HMAC-SHA256, 
>> AES-256-CBC-HMAC-SHA1, AES-256-CBC-HMAC-SHA256, TLS1-PRF]
>>
>>> Could you retry with this config:
>>> ssl-engine qat algo RSA,DSA,EC,DH
>>
>> Just did that and experienced the very same effect: no QAT activity for 
>> backend server healthchecks :-( When I add frontend eg.
>>
>> frontend frontend1
>>     bind 127.0.0.1:8443 ssl crt 
>> /etc/lb_engine/data/generated/ssl/10.252.24.7:443
>>     default_backend pool_all
>>
>> and make some connections/requests (TLS1.2 and/or TLS/1.3) to the frontend I 
>> see QAT activity, but *NO* activity when HAProxy is "idle" (only doing 
>> healthchecks to backend servers: TLS 1.2 only).
>> This feels like healthchecks are not passing through QAT engine for whatever 
>> reason :-( Even enabling HTTP check for the backend (option httpchk) does 
>> not make any difference.
>> The question: Is SSL Async Mode actually supported on the backend side 
>> (either healthchecks and/or normal traffic) ?
>> Regards,
> 
> Mode async is enabled on both sides, server and frontend.
> 
> But on the server side, haproxy uses session resumption, so there is a new 
> key computation (full handshake with RSA/DSA computation) only every 5 
> minutes (openssl default value).
> 
> You can force a recompute each time by setting "no-ssl-reuse" on the server 
> line, but it will add a heavy ssl computation load on the server.
> 
> 
> R,
> Emeric
> 

I've just realized that what you observe is the expected behavior: QAT 
offloads on the frontend side, and this is what we want: to offload to QAT the 
heavy load of key computation on the frontend side (the support of async 
engines in haproxy was added for this reason).

On the backend side, haproxy acts as a client, re-using sessions, and even if 
a key is re-computed by the server, the cost of processing on haproxy's 
backend side is much lower compared to the frontend side; perhaps it is not 
even implemented in QAT.

Once again, you could add the "no-ssl-reuse" statement if you want to check 
whether QAT offloads the backend side, but it is clearly not an optimal option 
for production because it will generate a heavy load on your servers and force 
them to recompute keys for each connection.

R,
Emeric
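
A minimal sketch of the client-side session reuse described above (plain
OpenSSL, not haproxy internals): once a session has been cached, the next
connection resumes it and no new asymmetric key computation is offloaded.

#include <openssl/ssl.h>

static SSL_SESSION *saved;

static void remember_session(SSL *ssl)
{
    SSL_SESSION_free(saved);        /* drop the previous reference, if any */
    saved = SSL_get1_session(ssl);  /* take a reference after the handshake */
}

static void resume_session(SSL *ssl)
{
    if (saved)
        SSL_set_session(ssl, saved); /* must be set before SSL_connect() */
}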



Re: QAT intermittent healthcheck errors

2019-03-11 Thread Emeric Brun
On 3/11/19 11:06 AM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 3/8/19 11:24 AM, Emeric Brun wrote:
>> Are you sure that servers won't use ECDSA certificates? Did you check that 
>> conns are successful when forcing 'ECDHE-RSA-AES256-GCM-SHA384'?
> 
> Backend servers only support TLS 1.2 and RSA certificates.
> 
>> Could you check algo supported by QAT doing this ?:
>> openssl  engine -c qat
> 
> # /opt/booking-openssl/bin/openssl engine -c qat
> (qat) Reference implementation of QAT crypto engine
>  [RSA, DSA, DH, AES-128-CBC-HMAC-SHA1, AES-128-CBC-HMAC-SHA256, 
> AES-256-CBC-HMAC-SHA1, AES-256-CBC-HMAC-SHA256, TLS1-PRF]
> 
>> Could you retry with this config:
>> ssl-engine qat algo RSA,DSA,EC,DH
> 
> Just did that and experienced the very same effect: no QAT activity for 
> backend server healthchecks :-( When I add frontend eg.
> 
> frontend frontend1
>     bind 127.0.0.1:8443 ssl crt 
> /etc/lb_engine/data/generated/ssl/10.252.24.7:443
>     default_backend pool_all
> 
> and make some connections/requests (TLS1.2 and/or TLS/1.3) to the frontend I 
> see QAT activity, but *NO* activity when HAProxy is "idle" (only doing 
> healthchecks to backend servers: TLS 1.2 only).
> This feels like healthchecks are not passing through QAT engine for whatever 
> reason :-( Even enabling HTTP check for the backend (option httpchk) does not 
> make any difference.
> The question: Is SSL Async Mode actually supported on the backend side 
> (either healthchecks and/or normal traffic) ?
> Regards,

Mode async is enabled on both sides, server and frontend.

But on the server side, haproxy uses session resumption, so there is a new key 
computation (full handshake with RSA/DSA computation) only every 5 minutes 
(openssl default value).

You can force a recompute each time by setting "no-ssl-reuse" on the server 
line, but it will add a heavy ssl computation load on the server.


R,
Emeric



Re: QAT intermittent healthcheck errors

2019-03-08 Thread Emeric Brun
Hi Marcin,

On 3/7/19 6:43 PM, Marcin Deranek wrote:
> Hi,
> 
> On 3/6/19 6:36 PM, Emeric Brun wrote:
>> According to the documentation:
>>
>> ssl-mode-async
>>    Adds SSL_MODE_ASYNC mode to the SSL context. This enables asynchronous TLS
>>    I/O operations if asynchronous capable SSL engines are used. The current
>>    implementation supports a maximum of 32 engines. The Openssl ASYNC API
>>    doesn't support moving read/write buffers and is not compliant with
>>    haproxy's buffer management. So the asynchronous mode is disabled on
>>    read/write  operations (it is only enabled during initial and reneg
>>    handshakes).
>>
>> Asynchronous mode is disabled on the read/write operation and is only 
>> enabled during handshake.
>>
>> It means that for the ciphering process the engine will be used in blocking 
>> mode (not async), which could result in unpredictable behavior on timers 
>> because the haproxy process will sporadically be fully blocked waiting for 
>> the engine.
>>
>> To avoid this issue, you should ensure QAT is used only for the asymmetric 
>> algorithms (such as RSA, DSA, ECDSA) and not for ciphering ones (AES and 
>> everything else ...).
> 
> I did explicitly enable RSA algos:
> 
> ssl-engine qat algo RSA
> 
> and errors were gone at that point. Unfortunately all QAT activity too as
> 
> /sys/kernel/debug/qat_c6xx_\:0*/fw_counters
> 
> were reporting identical values (previously they were incrementing).
> 
> I did explicitly enforce RSA:
> 
> ssl-default-server-ciphers ECDHE-RSA-AES256-GCM-SHA384

I've just realized that if your servers are TLSv1.3, ssl-default-server-ciphers
won't force anything (see the ssl-default-server-ciphersuites documentation).

R,
Emeric



Re: QAT intermittent healthcheck errors

2019-03-08 Thread Emeric Brun


Hi Marcin,

On 3/7/19 6:43 PM, Marcin Deranek wrote:
> Hi,
> 
> On 3/6/19 6:36 PM, Emeric Brun wrote:
>> According to the documentation:
>>
>> ssl-mode-async
>>    Adds SSL_MODE_ASYNC mode to the SSL context. This enables asynchronous TLS
>>    I/O operations if asynchronous capable SSL engines are used. The current
>>    implementation supports a maximum of 32 engines. The Openssl ASYNC API
>>    doesn't support moving read/write buffers and is not compliant with
>>    haproxy's buffer management. So the asynchronous mode is disabled on
>>    read/write  operations (it is only enabled during initial and reneg
>>    handshakes).
>>
>> Asynchronous mode is disabled on the read/write operation and is only 
>> enabled during handshake.
>>
>> It means that for the ciphering process the engine will be used in blocking 
>> mode (not async), which could result in unpredictable behavior on timers 
>> because the haproxy process will sporadically be fully blocked waiting for 
>> the engine.
>>
>> To avoid this issue, you should ensure QAT is used only for the asymmetric 
>> algorithms (such as RSA, DSA, ECDSA) and not for ciphering ones (AES and 
>> everything else ...).
> 
> I did explicitly enable RSA algos:
> 
> ssl-engine qat algo RSA
> 
> and errors were gone at that point. Unfortunately all QAT activity too as
> 
> /sys/kernel/debug/qat_c6xx_\:0*/fw_counters
> 
> were reporting identical values (previously they were incrementing).
> 
> I did explicitly enforce RSA:
> 
> ssl-default-server-ciphers ECDHE-RSA-AES256-GCM-SHA384
> 
> but that did not help. Do I miss something ?
> Regards,
> 
> Marcin Deranek
> 

Are you sure that servers won't use ECDSA certificates? Did you check that 
conns are successful when forcing 'ECDHE-RSA-AES256-GCM-SHA384'?

Could you check algo supported by QAT doing this ?:
openssl  engine -c qat

Could you retry with this config:
ssl-engine qat algo RSA,DSA,EC,DH


R,
Emeric





Re: QAT intermittent healthcheck errors

2019-03-06 Thread Emeric Brun
Hi Marcin,

On 3/6/19 3:23 PM, Marcin Deranek wrote:
> Hi,
> 
> In the process of evaluating the performance of Intel QuickAssist Technology 
> in conjunction with HAProxy, I acquired an Intel C62x chipset card for 
> testing. I configured the QAT engine in the following manner:
> 
> * /etc/qat/c6xx_dev[012].conf
> 
> [GENERAL]
> ServicesEnabled = cy
> ConfigVersion = 2
> CyNumConcurrentSymRequests = 512
> CyNumConcurrentAsymRequests = 64
> statsGeneral = 1
> statsDh = 1
> statsDrbg = 1
> statsDsa = 1
> statsEcc = 1
> statsKeyGen = 1
> statsDc = 1
> statsLn = 1
> statsPrime = 1
> statsRsa = 1
> statsSym = 1
> KptEnabled = 0
> StorageEnabled = 0
> PkeServiceDisabled = 0
> DcIntermediateBufferSizeInKB = 64
> 
> [KERNEL]
> NumberCyInstances = 0
> NumberDcInstances = 0
> 
> [SHIM]
> NumberCyInstances = 1
> NumberDcInstances = 0
> NumProcesses = 16
> LimitDevAccess = 0
> 
> Cy0Name = "UserCY0"
> Cy0IsPolled = 1
> Cy0CoreAffinity = 0
> 
> OpenSSL produces good results without warnings / errors:
> 
> * No QAT involved
> 
> $ openssl speed -elapsed rsa2048
> You have chosen to measure elapsed time instead of user CPU time.
> Doing 2048 bits private rsa's for 10s: 10858 2048 bits private RSA's in 10.00s
> Doing 2048 bits public rsa's for 10s: 361207 2048 bits public RSA's in 10.00s
> OpenSSL 1.1.1a FIPS  20 Nov 2018
> built on: Tue Jan 22 20:43:41 2019 UTC
> options:bn(64,64) md2(char) rc4(16x,int) des(int) aes(partial) idea(int) 
> blowfish(ptr)
> compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -g -pipe 
> -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic 
> -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC 
> -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT 
> -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM 
> -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM 
> -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM 
> -DPOLY1305_ASM -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" 
> -DSYSTEM_CIPHERS_FILE="/opt/openssl/etc/crypto-policies/back-ends/openssl.config"
>   sign    verify    sign/s verify/s
> rsa 2048 bits 0.000921s 0.28s   1085.8  36120.7
> 
> * QAT enabled
> 
> $ openssl speed -elapsed -engine qat -async_jobs 32 rsa2048
> engine "qat" set.
> You have chosen to measure elapsed time instead of user CPU time.
> Doing 2048 bits private rsa's for 10s: 205425 2048 bits private RSA's in 
> 10.00s
> Doing 2048 bits public rsa's for 10s: 2150270 2048 bits public RSA's in 10.00s
> OpenSSL 1.1.1a FIPS  20 Nov 2018
> built on: Tue Jan 22 20:43:41 2019 UTC
> options:bn(64,64) md2(char) rc4(16x,int) des(int) aes(partial) idea(int) 
> blowfish(ptr)
> compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -g -pipe 
> -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic 
> -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC 
> -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT 
> -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM 
> -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM 
> -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM 
> -DPOLY1305_ASM -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" 
> -DSYSTEM_CIPHERS_FILE="/opt/openssl/etc/crypto-policies/back-ends/openssl.config"
>   sign    verify    sign/s verify/s
> rsa 2048 bits 0.49s 0.05s  20542.5 215027.0
> 
> So far so good. Unfortunately HAProxy 1.8 with the QAT engine enabled 
> periodically fails SSL checks of backend servers. The simplest 
> configuration I could get to reproduce it:
> 
> * /etc/haproxy/haproxy.cfg
> 
> global
>     user lbengine
>     group lbengine
>     daemon
>     ssl-mode-async
>     ssl-engine qat
>     ssl-server-verify none
>     stats   socket /run/lb_engine/process-1.sock user lbengine group 
> lbengine mode 660 level admin expose-fd listeners process 1
> 
> defaults
>     mode http
>     timeout check 5s
>     timeout connect 4s
> 
> backend pool_all
>     default-server inter 5s
> 
>     server server1 ip1:443 check ssl
>     server server2 ip2:443 check ssl
>     ...
>     server serverN ipN:443 check ssl
> 
> Without QAT enabled everything works just fine - healthchecks do not flap. 
> With the QAT engine enabled, random server healthchecks flap: they fail and 
> then shortly after they recover, e.g.
> 
> 2019-03-06T15:06:22+01:00 localhost hapee-lb[1832]: Server pool_all/server1 
> is DOWN, reason: Layer6 timeout, check duration: 4000ms. 110 active and 0 
> backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> 2019-03-06T15:06:32+01:00 localhost hapee-lb[1832]: Server pool_all/server1 
> is UP, reason: Layer6 check passed, check duration: 13ms. 117 active and 0 
> backup servers 

Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.

2019-01-22 Thread Emeric Brun
Hi Willy,

On 1/21/19 6:38 PM, Dirkjan Bussink wrote:
> Hi Emeric,
> 
>> On 21 Jan 2019, at 08:06, Emeric Brun  wrote:
>>
>> Interesting, it would be good to skip the check using the same method.
>>
>> We must stay careful not to put the OP_NO_RENEG flag on the client part 
>> (when haproxy connects to a server), because reneg from the server is 
>> authorized, but I think infocbk is called only on the frontend/accept side.
>>
>> so a patch which do:
>>
>> #ifdef  SSL_OP_NO_RENEGOTIATION
>> SSL_set_options(ctx, SSL_OP_NO_RENEGOTIATION);
>> #endif
>>
>> without condition during init
>>
>> and adding #ifndef SSL_OP_NO_RENEGOTIATION around the CVE check should fix 
>> the issue mentioned about KeyUpdate and will fix the CVE the clean way if 
>> the OpenSSL version supports it.
> 
> I have implemented this and attached the patch for it. What do you think of 
> this approach? 
> 
> Cheers,
> 
> Dirkjan Bussink
> 
I think you can merge this.

Thx Dirkjan.

R,
Emeric



Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.

2019-01-21 Thread Emeric Brun
On 1/21/19 3:37 PM, Dirkjan Bussink wrote:
> Hi all,
> 
>> On 21 Jan 2019, at 02:01, Emeric Brun  wrote:
>>
>> Is there a way to check that it is a KeyUpdate message which triggered the 
>> callback (and not another)?
> 
> Sadly there is not. I had taken a look at the OpenSSL code and it triggers 
> the callback without any additional information available at 
> https://github.com/openssl/openssl/blob/OpenSSL_1_1_1a/ssl/statem/statem_srvr.c#L4059.
>  
> 
>> And what happens for normal I/O operations: do they return something when 
>> receiving the message, or is this completely hidden by OpenSSL (i.e. what 
>> does SSL_read/write return, for instance, when a KeyUpdate has just been 
>> received)?
> 
> This is completely hidden by OpenSSL and as a library user you don’t see this 
> message happened as it’s transparently updated. 
> 
> I think the other option to consider is that since this logic was added, 
> OpenSSL has gotten support for flags to control renegotiation and prevent 
> insecure renegotiation so the question is if we need this check still at all. 
> I think it can be removed (which is also what NGinx did in the commit that 
> Janusz mentioned in another email in this thread. From the commit message at 
> https://trac.nginx.org/nginx/changeset/e3ba4026c02d2c1810fd6f2cecf499fc39dde5ee/nginx/src/event/ngx_event_openssl.c:
> 
> --- 
> 
> Following 7319:dcab86115261, as long as SSL_OP_NO_RENEGOTIATION is
> defined, it is OpenSSL library responsibility to prevent renegotiation,
> so the checks are meaningless.
> 
> Additionally, with TLSv1.3 OpenSSL tends to report SSL_CB_HANDSHAKE_START
> at various unexpected moments - notably, on KeyUpdate? messages and
> when sending tickets. This change prevents unexpected connection
> close on KeyUpdate? messages and when finishing handshake with upcoming
> early data changes.
> 
> --- 

Interesting, it would be good to skip the check using the same method.

We must stay careful not to put the OP_NO_RENEG flag on the client part (when 
haproxy connects to a server), because reneg from the server is authorized, 
but I think infocbk is called only on the frontend/accept side.

so a patch which do:

#ifdef  SSL_OP_NO_RENEGOTIATION
SSL_set_options(ctx, SSL_OP_NO_RENEGOTIATION);
#endif

without condition during init

and adding #ifndef SSL_OP_NO_RENEGOTIATION around the CVE check should fix 
the issue mentioned about KeyUpdate and will fix the CVE the clean way if the 
OpenSSL version supports it.

R,
Emeric
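
A sketch of the resulting shape of the info callback (names are illustrative,
not the exact haproxy code): when SSL_OP_NO_RENEGOTIATION is available, the
library refuses renegotiation itself, so the manual handshake-start check only
remains for older OpenSSL versions.

#include <openssl/ssl.h>

static void infocbk(const SSL *ssl, int where, int ret)
{
#ifndef SSL_OP_NO_RENEGOTIATION
    if (where & SSL_CB_HANDSHAKE_START) {
        /* legacy path: flag the connection in error if a renegotiation
         * is attempted after the initial handshake */
    }
#endif
    (void)ssl; (void)where; (void)ret;
}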



Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.

2019-01-21 Thread Emeric Brun
Hi Adam,
On 1/20/19 10:12 PM, Adam Langley wrote:
> KeyUpdate messages are a feature of TLS 1.3 that allows the symmetric
> keys of a connection to be periodically rotated. It's
> mandatory-to-implement in TLS 1.3, but not mandatory to use. Google
> Chrome tried enabling KeyUpdate and promptly broke several sites, at
> least some of which are using HAProxy.
> 
> The cause is that HAProxy's code to disable TLS renegotiation[1] is
> triggering for TLS 1.3 post-handshake messages. But renegotiation has
> been removed in TLS 1.3 and post-handshake messages are no longer
> abnormal. Thus I'm attaching a patch to only enforce that check when
> the version of a TLS connection is <= 1.2.
> 
> Since sites that are using HAProxy with OpenSSL 1.1.1 will break when
> Chrome reenables KeyUpdate without this change, I'd like to suggest it
> as a candidate for backporting to stable branches.
> 
> [1] https://github.com/haproxy/haproxy/blob/master/src/ssl_sock.c#L1472
> 
> 
> Thank you
> 
> AGL

Is there a way to check that it is a KeyUpdate message which triggered the 
callback (and not another)?

And what happens for normal I/O operations: do they return something when 
receiving the message, or is this completely hidden by OpenSSL (i.e. what does 
SSL_read/write return, for instance, when a KeyUpdate has just been received)?

R,
Emeric
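
A hedged sketch for reproducing the breakage from the client side (OpenSSL >=
1.1.1): after a TLS 1.3 handshake completes, force a KeyUpdate and observe
whether the peer drops the connection.

#include <openssl/ssl.h>

static int send_key_update(SSL *ssl)
{
    if (SSL_version(ssl) != TLS1_3_VERSION)
        return 0;                               /* nothing to test */
    if (!SSL_key_update(ssl, SSL_KEY_UPDATE_REQUESTED))
        return -1;
    /* flush the pending KeyUpdate message to the peer */
    return SSL_do_handshake(ssl) > 0 ? 0 : -1;
}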



Re: [PATCH] ssl certificates load speedup and dedup (pem/ctx)

2019-01-07 Thread Emeric Brun
Hi Manu,

On 1/7/19 5:59 PM, Emmanuel Hocdet wrote:
> It's better with patches…
> 
>> On 7 Jan 2019, at 17:57, Emmanuel Hocdet wrote:
>>
>> Hi,
>>
>> Following the first patch series (included).
>> The goal is to deduplicate common certificates in memory and in shared pem 
>> files.
>>
>> PATCH 7/8 is only for boringssl (directive to dedup certificate in memory 
>> for ctx)
>> Last patch should be the more interesting:
>> [PATCH 8/8] MINOR: ssl: add "issuer-path" directive.
>>
>> Certificates loaded with "crt" and "crt-list" commonly share the same
>> intermediate certificate in PEM file. "issuer-path" is a global
>> directive to share intermediate certificate in a directory. If
>> certificate chain is not included in certificate PEM file, haproxy
>> will complete chain if issuer match the first certificate of the chain
>> stored via "issuer-path" directive. Such chains will be shared in ssl
>> shared memory.
>> . "issuer-path" directive can be set several times.
>> . only sha1 key identifier is supported (rfc5280 4.2.1.2. (1))
>>
>> If you want to test it, the patch series can be apply to haproxy-dev or 
>> haproxy-1.9.
>>
>> Feedbacks are welcome :)
>>
>> ++
>> Manu
>>
> 
> 

We have to double-check these proposed patches because we have a pending 
feature in the roadmap which could heavily collide: loading a certificate 
only once per fs entry.

For us it is a mandatory feature to allow a clean "hot" update of certificates 
(the key to identify a certificate to update will be the path on the fs, or at 
least the base path).

Emeric
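
A simplified sketch of the chain-completion idea behind "issuer-path" (the
actual patch matches on the SHA-1 key identifier; the generic
X509_check_issued() helper stands in here): given a leaf with no chain, look
up a candidate issuer among the certificates loaded from the directory.

#include <openssl/x509v3.h>

static X509 *find_issuer(X509 *leaf, X509 **candidates, int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (X509_check_issued(candidates[i], leaf) == X509_V_OK)
            return candidates[i];
    return NULL; /* chain left incomplete, as before */
}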



Re: sample/fetch support for TLS extensions

2018-10-19 Thread Emeric Brun
Hello Alexey,

On 10/18/18 11:17 PM, Lukas Tribus wrote:
> Hello Alexey,
> 
> 
> On Tue, 16 Oct 2018 at 14:18, Alexey Elymanov  wrote:
>>
>> I would like to propose a little patch, based on current ssl_capture 
>> (ssl_sock.c) scheme.
>> Purpose is to be able to sample/fetch TLS extensions, it could be useful for 
>> debugging or fingerprinting purposes (for example, cURL and Firefox provide 
>> different sets of extensions in ClientHello message).
>>
>> it provides two hooks, which should be enough for further Lua 
>> processing/request forwarding/analysis: smp_fetch_ssl_fc_exts_bin, 
>> smp_fetch_ssl_fc_exts_hex
> 
> Looping in Thierry (LUA), Emeric (SSL).
> 
> 
> lukas
> 

I disagree with the idea of having two fetches. All available fetches use only 
one type as output, and we use converters to change these.

So smp_fetch_ssl_fc_exts should output bin, and "smp_fetch_ssl_fc_exts hex" or 
"%[smp_fetch_ssl_fc_exts,hex]" would output hex.


R,
Emeric



Re: TLS 1.3 options available with OpenSSL 1.1.1

2018-09-24 Thread Emeric Brun
Hi Dirkjan,

On 09/24/2018 11:55 AM, Dirkjan Bussink wrote:
> Hi all,
> 
> Given all the critical security issue and that you all were busy with that, I 
> suspect this didn’t get much additional eyes. Now that that fix is out the 
> door, I’m wondering if there’s any feedback or further input for the OpenSSL 
> 1.1.1 patches I wrote? 
> 
> Cheers,
> 
> Dirkjan
> 
>> On 14 Sep 2018, at 14:28, Dirkjan Bussink  wrote:
>>
>> Hi all,
>>
>>> On 14 Sep 2018, at 14:15, Emmanuel Hocdet  wrote:
>>>
>>> It’s not necessary, BoringSSL and LibreSSL have, at best,  
>>> OPENSSL_VERSION_NUMBER  set to 1.1.0 for API compatibilité.
>>
>> Looking at LibreSSL, it’s defining this (in their latest Git code):
>>
>> src/lib/libcrypto/opensslv.h:#define OPENSSL_VERSION_NUMBER  0x2000L
>>
>> I also see this conditional used in other places to explicitly exclude 
>> BoringSSL and LibreSSL, so that’s why I thought it would be needed here as 
>> well. 
>>
>> -- 
>> Cheers,
>>
>> Dirkjan
> 

Seems good to me except for the documentation:

Could you clarify in the old "ciphers" description that it applies only to 
TLS <= 1.2 (and add a reference to the new keyword for TLSv1.3)?

R,
Emeric



Re: TLS 1.3 options available with OpenSSL 1.1.1

2018-09-14 Thread Emeric Brun
Hi Lukas, Dirkjan,

On 09/13/2018 10:17 PM, Lukas Tribus wrote:
> Hello Dirkjan,
> 
> 
> On Thu, 13 Sep 2018 at 16:44, Dirkjan Bussink  wrote:
>> So with a new API call, does that mean adding for example a `ciphersuites`
>> option that works similar to `ciphers` today that it accepts a string and 
>> then
>> calls `SSL_CTX_set_ciphersuites`?
> 
> Yes, that's what I'd have in mind. Sufficiently #ifdef that everything
> still builds the same and documented of course so that people
> understand what is what. Let's wait for Manu and Emeric's feedback
> though, before investing your time.

I think that if TLS <= 1.2 and TLSv1.3 ciphers are handled separately, that is 
a good reason to add a new keyword rather than manage both on the same line of 
the haproxy configuration file.

I've just realized that it may be OpenSSL's response to an issue we faced on 
an earlier version of the openssl-1.1.1 dev branch, where forcing the cipher 
suite on an SSL_CTX broke TLSv1.2 handshakes if no TLSv1.3 ciphers were 
specified in the list.

Managing TLS <= 1.2 and 1.3 ciphers differently also lets the user avoid 
regression issues when upgrading to v1.1.1 with suites forced in the 
configuration, because openssl-1.1.1 kept the default TLSv1.3 ciphers.

So I'm convinced we need to handle this new TLSv1.3 cipher suite with a new 
config keyword, but I don't know how we should name it. I think it would be a 
mistake to make 1.3 appear in the new name because there is no guarantee that 
future TLS versions will specify their own cipher lists. OpenSSL's API made 
the choice of "ciphersuites" ... perhaps the right choice.

Did any of you check how this is handled on the "openssl s_client" command line?

> 
> 
>> The curve functions are synonyms for the equivalently named group functions
>> and are identical in every respect.
> 
> So there is no technical reason to implement the new API call, it's
> just to go with the new naming convention. No need for any action
> then, we don't have to blow up our code with additional #ifdef mess
> for the sake of using better looking names in API calls.

Perfectly agreed.

> 
> Thanks for looking into this!
> 
> cheers,
> lukas
> 
R,
Emeric



Re: BUG/MEDIUM ssl: Fix loading of dhaparams in multicert setups.

2018-09-10 Thread Emeric Brun
Hi Francisco,

On 09/10/2018 02:43 PM, klondike wrote:
> On 10/09/18 at 11:25, Emeric Brun wrote:
>> Hi Fransisco,
> Hi Emeric!
> 
> First of all thanks for taking the time to review my patch. It is my
> first time contributing (and it was also my first time using) HAProxy so
> It took me quite a few hours to pinpoint and "fix" the issue I was
> experiencing.
>> On 09/09/2018 06:10 PM, Francisco Blas Izquierdo Riera (klondike) wrote:
>>> When using multicertificate bundles (i.e. .rsa, .ecdsa and .dsa files) 
>>> HAProxy
>>> fails to load certificates at random.
>>>
>>> This is caused by an attempt to load the DH parameters from the NULL pointer
>>> instead of the corresponding bundle which leaves an error in the queue.
>>>
>>> This patch makes ssl_sock_load_multi_cert use the correct bundle
>>> identifier which in turn prevents the error (after the BIO tries to
>>> open NULL in read only mode).
>>>
>>> For any legal matters, please consider this contribution on the public 
>>> domain.
>>>
>>> Please backport to 1.8 and 1.7 it will apply correctly at least on 1.8.
>>>
>>> --- src/ssl_sock.c
>>> +++ src/ssl_sock.c
>>> @@ -3131,11 +3131,11 @@ static int ssl_sock_load_multi_cert(const char 
>>> *path, struct bind_conf *bind_con
>>> if (ssl_dh_ptr_index >= 0)
>>> SSL_CTX_set_ex_data(cur_ctx, ssl_dh_ptr_index, 
>>> NULL);
>>>  
>>> -   rv = ssl_sock_load_dh_params(cur_ctx, NULL);
>>> +   rv = ssl_sock_load_dh_params(cur_ctx, cur_file);
>>> if (rv < 0) {
>>> if (err)
>>> memprintf(err, "%sunable to load DH 
>>> parameters from file '%s'.\n",
>>> -   *err ? *err : "", path);
>>> +   *err ? *err : "", 
>>> cur_file);
>>> rv = 1;
>>> goto end;
>>> }
>>>
>> I agree there is a bug but I think we should qualify the wanted behaviour 
>> and confirm doing some check.
>>
>>
>> I think the DH param is global per SSL_CTX and not per key type (please 
>> confirm, am I wrong?)
> Yes it is.
>> So what happens in this case:
>>
>> cert.pem.rsa firstly loaded and contain a DH param => 
>> ssl_sock_load_dh_params will load the param in to SSL_CTX.
>> in a second time cert.pem.ecdsa is loaded and does not contain any DH param. 
>> => What happens ? previous param is kept or ssl_sock_load_dh_params resets 
>> the DH param to default one?
> As it is now, SSL_CTX_set_tmp_dh is called only once because the call is
> done outside the loop and the value of cur_file points to the .rsa file.
> More concerning is the case of what happens if there is no .rsa file as
> it will provoke an error and we go back to the problematic behaviour.
> 
> As to what happens when we call SSL_CTX_set_tmp_dh you can see openssl's
> code itself:
> https://github.com/openssl/openssl/blob/HEAD/ssl/s3_lib.c#L3411 it will
> always take the last passed item.
> 
> Going over this code I have also noticed that we may end up with a
> memory leak as we will never free the DH parameters once we are done
> with them.
> 
>> I don't know what would be the wanted behavior but the last described (which 
>> I suppose is performed) doesn't seem to be the right choice.
> I can propose the following optionss:
> 1. Always load the global parameters (this is the previous behaviour). I
> just need to avoid opening the file when the filename is NULL to fix the
> bug.
> 2. Stop at the last hit (i.e. test dsa, ecdsa and rsa in that order, if
> they are supported by the specific context, and choose the last dh
> params struct) if none is found, fall-back to global.
> 3. Use a .dhparams file instead fallback to global if not found.
> 
> Personally I think number 1 is the easiest one to implement followed by
> number 2 which provides the most flexibility but is the hardest one to
> understand.
>> So in my opinion, the current patch just fixes a piece of the problem.
> Indeed, the patch only prevents the spurious errors and not even always.
> 
> PS: I noticed your answer was not sent to the list so I'm uncertain if
> that was intentional or not.

It was a mistake; we may need people's opinions.

R,
Emeric
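
A sketch of the SSL_CTX_set_tmp_dh() behavior underlying options 1 and 2
above: the context keeps its own copy of the parameters, so the last call wins
and the caller must free its DH to avoid the leak Francisco noticed.

#include <openssl/ssl.h>
#include <openssl/dh.h>

static int install_dh(SSL_CTX *ctx, DH *dh_from_pem)
{
    long rc = SSL_CTX_set_tmp_dh(ctx, dh_from_pem); /* ctx copies the params */

    DH_free(dh_from_pem); /* our reference is no longer needed */
    return rc == 1 ? 0 : -1;
}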




Re: BUG: ssl: regression with openssl 1.1.1 when using <= TLSv1.2

2018-09-03 Thread Emeric Brun
Hi Lukas,

On 09/02/2018 03:31 PM, Lukas Tribus wrote:
> Hello,
> 
> 
> On Sat, 1 Sep 2018 at 20:49, Lukas Tribus  wrote:
>>> I've confirmed the change in behavior only happens with an ECC
>>> certificate, an RSA certificate is not affected.
>>
>> Just to confirm that this is still an actual problem with current
>> haproxy and openssl 1.1.1pre9.
>>
>> You just have to use a ECC certificate instead of a RSA certificate,
>> and it will fail with TLSv1.1 when strict-sni is enabled.
> 
> Actually the problem is worse: SNI doesn't work *at all* with ECC
> certificates in TLSv1.1 and TLSv1.0. It simply falls back to a
> matching RSA certificate or the default-certificate. Of course, if
> only the ECC certificate is there, and strict-sni is set, the
> handshake is rejected.

Just to be sure, did you mean:


> only the ECC certificate is there, *OR* strict-sni is set, the
> handshake is rejected.


> Same exact behavior happens with boringssl as well (not only openssl 1.1.1).
> 
> 
> Any help with this would be much appreciated.
> 

We must dig deeper now that the bug is better qualified.

R,
Emeric



Re: [Patch] multiple key type bundles are not loaded correctly in certain cases

2018-08-16 Thread Emeric Brun
Hi Willy, Michael,

On 08/02/2018 06:03 PM, Willy Tarreau wrote:
> Hi Michael,
> 
> On Thu, Aug 02, 2018 at 03:48:13PM +0200, Michael Wimmesberger wrote:
>> Hi,
>>
>> while preparing to use multi-keytype bundles for my company's
>> domains, I found the following two issues:
> (...)
> 
> Thanks for reporting these issues. I'm CCing Emeric who's currently
> in vacation and will be back soon.  I prefer that he double-checks
> the implications of these modifications and/or proposes some extra
> solutions. While I'd suspect your first proposed change is right, he
> might have a use case in mind that we don't want to break (and this
> code is quite tricky).
> 
> Thanks!
> Willy
> 

Here are two patches which should fix the issues.


Thank you for the debug scripts and your information, Michael. They were very 
useful.


R,
Emeric
>From b7698752256a405ee32f0ac412eec7a25163c459 Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Thu, 16 Aug 2018 15:14:12 +0200
Subject: [PATCH 2/2] BUG/MEDIUM: ssl: loading dh param from certifile causes
 unpredictable error.

If the dh parameter is not found, the openssl global error
stack was not correctly cleared, causing unpredictable errors
during the subsequent parsing (chain cert parsing, for instance).

This patch should be backported in 1.8 (and perhaps 1.7)
---
 src/ssl_sock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index b9c1285..8d0b674 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -2607,6 +2607,8 @@ end:
 if (in)
 BIO_free(in);
 
+	ERR_clear_error();
+
 	return dh;
 }
 
-- 
2.7.4

>From 246bad0bee1e02088538d492f801c38e5f9ad535 Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Thu, 16 Aug 2018 15:11:12 +0200
Subject: [PATCH 1/2] BUG/MEDIUM: ssl: fix missing error loading a keytype cert
 from a bundle.

If there was an issue loading a keytype's part of a bundle, the bundle
was implicitly ignored without errors.

This patch should be backported in 1.8 (and perhaps 1.7)
---
 src/ssl_sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 7e8739a..b9c1285 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -3491,7 +3491,7 @@ int ssl_sock_load_cert(char *path, struct bind_conf *bind_conf, char **err)
 		}
 
 		snprintf(fp, sizeof(fp), "%s/%s", path, dp);
-		ssl_sock_load_multi_cert(fp, bind_conf, NULL, NULL, 0, err);
+		cfgerr += ssl_sock_load_multi_cert(fp, bind_conf, NULL, NULL, 0, err);
 
 		/* Successfully processed the bundle */
 		goto ignore_entry;
-- 
2.7.4



Re: Bug when passing variable to mapping function

2018-07-17 Thread Emeric Brun
Hi Jarno, and thanks Lukas

On 07/16/2018 07:27 AM, Lukas Tribus wrote:
> Hello,
> 
> 
> 
> On Fri, 29 Jun 2018 at 07:15, Jarno Huuskonen  wrote:
>>
>> Hi,
>>
>> On Thu, Jun 28, Jarno Huuskonen wrote:
>>> I think this is the commit that breaks map_regm in this case:
>>> b5997f740b21ebb197e10a0f2fe9dc13163e1772 (MAJOR: threads/map: Make
>>> acls/maps thread safe).
>>>
>>> If I revert this commit from pattern.c:pattern_exec_match
>>> then the map_regm \1 backref seems to work.
>>
>> I think I found what's replacing the \000 as first char:
>> in (map.c) sample_conv_map:
>> /* In the regm case, merge the sample with the input. */
>> if ((long)private == PAT_MATCH_REGM) {
>> str = get_trash_chunk();
>> str->len = exp_replace(str->str, str->size, 
>> smp->data.u.str.str,
>>pat->data->u.str.str,
>>(regmatch_t *)smp->ctx.a[0]);
>>
>> Before call to get_trash_chunk() smp->data.u.str.str is for example
>> 'distri.com' and after get_trash_chunk() smp->data.u.str.str
>> is '\000istri.com'.
>>
>> At the moment I don't have time to dig deeper, but hopefully this
>> helps a little bit.
> 
> Thanks for the detailed analysis, relaying to Emeric ...
> 
> 
> 
> Lukas
> 

Could you try the attached patch? I hope it will fix the issue.

R,
Emeric
>From b6d8df3387a7ff9fe781d0315b0ee1de627679cf Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Tue, 17 Jul 2018 09:47:07 -0400
Subject: [PATCH] BUG/MINOR: map: fix map_regm with backref

Due to a cascade of get_trash_chunk calls, the sample is
corrupted when we want to read it.

The fix consists in using a temporary chunk to copy the sample
value and work on that copy.
---
 src/map.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/map.c b/src/map.c
index 0f1b754..3d8ec20 100644
--- a/src/map.c
+++ b/src/map.c
@@ -184,10 +184,27 @@ static int sample_conv_map(const struct arg *arg_p, struct sample *smp, void *pr
 		if (pat->data) {
 			/* In the regm case, merge the sample with the input. */
 			if ((long)private == PAT_MATCH_REGM) {
+struct chunk *tmptrash;
+
+/* Copy the content of the sample because it could
+   be scratched by incoming get_trash_chunk */
+tmptrash = alloc_trash_chunk();
+if (!tmptrash)
+	return 0;
+
+tmptrash->len = smp->data.u.str.len;
+if (tmptrash->len > (tmptrash->size-1))
+	tmptrash->len = tmptrash->size-1;
+
+memcpy(tmptrash->str, smp->data.u.str.str, tmptrash->len);
+tmptrash->str[tmptrash->len] = 0;
+
 str = get_trash_chunk();
-str->len = exp_replace(str->str, str->size, smp->data.u.str.str,
+str->len = exp_replace(str->str, str->size, tmptrash->str,
    pat->data->u.str.str,
    (regmatch_t *)smp->ctx.a[0]);
+
+free_trash_chunk(tmptrash);
 if (str->len == -1)
 	return 0;
 smp->data.u.str = *str;
-- 
2.7.4



Re: Haproxy 1.8 with OpenSSL 1.1.1-pre4 stops working after 1 hour

2018-06-29 Thread Emeric Brun
Hi Lukas,

On 06/27/2018 04:48 AM, Willy Tarreau wrote:
> On Wed, Jun 27, 2018 at 01:44:08AM +0200, Lukas Tribus wrote:
>> Hey guys,
>>
>>
>> FYI after lots of discussions with openssl folks:
>>
>> https://github.com/openssl/openssl/issues/5330
>> https://github.com/openssl/openssl/pull/6388
>> https://github.com/openssl/openssl/pull/6432
>>
>>
>> OpenSSL 1.1.1 will now keep the FD open by default:
>>
>> https://github.com/openssl/openssl/commit/c7504aeb640a88949dfe3146f7e0f275f517464c
> 
> Wow good job Lukas! At least now we know that openssl 1.1.1 is not broken
> anymore! Thanks for taking care of explaining all these valid use cases
> there!
> 
> Willy
> 

I've noticed that. Thank you for your support in reporting this issue to the
openssl team.

R,
Emeric



Re: haproxy and solarflare onload

2018-06-12 Thread Emeric Brun
Hi Elias,

On 05/28/2018 04:08 PM, Elias Abacioglu wrote:
> Hi Willy and HAproxy folks!
> 
> Sorry for bumping this old thread. But Solarflare recently released a new 
> Onload version.
> http://www.openonload.org/download/openonload-201805-ReleaseNotes.txt 
> 
> 
> Here is a small excerpt from the Release Notes:
> "
> 
>  A major overhaul to clustering and scalable filters enables new data
>  center use cases.
> 
>  Scalable filters direct all traffic for a given VI to Onload so that
>  the filter table size is not a constraint on the number of concurrent
>  connections supported efficiently. This release allows scalable filters
>  to be combined for both passive and active open connections and with RSS,
>  enabling very high transaction rates for use cases such as a reverse web
>  proxy.
> 
> "
> 
> So this NIC with Onload features requires tuning and perhaps a better 
> understanding of the Linux network stack than what I got to get working in a 
> high volume/frequency setup.
> I am willing to gift one SFN8522(2x10Gbit/s SFP+) to the HAproxy team(within 
> EU there should be no toll) if you want to test the card and it's 
> capabilities.
> I don't have any hard requirements for gifting this card, just that you got 
> the will to give it a fair shot. The only thing I want in return is that you 
> share your insights, good or bad. Perhaps we can get a working Onload profile 
> for HAproxy. They ship a example profile for nginx in onload-201805. I am 
> still very curious if Onload actually can offload the CPU more than regular a 
> NIC.
> 
> I don't have an EnterpriseOnload license. But this card should get a free 
> ScaleoutOnload license(basically it's included in the card, but Dell forgot, 
> so I had to reach out to Solarflare support to get a free License). And 
> ScaleoutOnload is their OpenOnload.
> I could help out with that if needed.
> 
> So HAproxy team, you want this card to play with?
> 
> /Elias

Yes, we are always interested in testing hardware with our software, to advise
end users on whether they could get some benefit from it.
But we are currently very busy and I think we can't test your NIC before the
last quarter of the year.

R,
Emeric



Re: Haproxy 1.8 with OpenSSL 1.1.1-pre4 stops working after 1 hour

2018-05-24 Thread Emeric Brun
Hi Lukas,

On 05/24/2018 11:27 AM, Lukas Tribus wrote:
> Hi Emeric,
> 
> 
> On 24 May 2018 at 11:19, Emeric Brun <eb...@haproxy.com> wrote:
>> in pre6 there is a news wrapping function on getrandom which have different 
>> fallback way to use the syscall.
>>
>> Perhaps the openssl -r output depends of that (if getrandom was found from 
>> glibc or if a syscall loaded from a different way and considered 
>> os-specific).
> 
> No, openssl version -r output is a verbatim copy of what was passed to
> --with-rand-seed at configure time:
> https://github.com/openssl/openssl/pull/5910#issuecomment-391514494
> 
> 
>> @Lukas Which build-workarround did you use?
> 
> No workaround at all, getrandom works for me out of the box in -pre6
> (with libc2.23) and the output of "openssl version -r" is
> "os-specific", which also is expected behavior as per the github
> discussion above. The raw syscall as implemented in-pre6 works for me.

OK, I initially patched pre3 and ported my patch to pre6 in a different way,
but I didn't check whether it works out of the box in pre6 :)

In my case I'm cross-compiling and there is no access to the kernel includes,
only those of the libc built in the sysroot.

This way I don't know how the openssl build will be able to retrieve
SYS_getrandom nor the syscall definition, because they are not present in my
sysroot.

Anyway, it seems that there were two issues in that thread; do you still have
one, Lukas?


R,
Emeric



Re: Haproxy 1.8 with OpenSSL 1.1.1-pre4 stops working after 1 hour

2018-05-24 Thread Emeric Brun
Hi Lukas,

On 05/23/2018 09:48 PM, Lukas Tribus wrote:
> Hello,
> 
> 
> On 23 May 2018 at 18:29, Emeric Brun <eb...@haproxy.com> wrote:
>> This issue was due to openssl-1.1.1, which re-seeds after an elapsed time or
>> number of requests.
>>
>> If /dev/urandom is used as the seeding source when haproxy is chrooted, it
>> fails to re-open /dev/urandom.
>>
>> By default the openssl-1.1.1 configure script uses the getrandom syscall as
>> the seeding source and falls back on /dev/urandom if it is not available.
>>
>> So you don't face the issue if your openssl-1.1.1 is compiled to use
>> getrandom.
>>
>> But the getrandom syscall is available only since kernel > 3.17 and, the
>> main point, for glibc > 2.25.
>>
>> With openssl-1.1.1 you can check it this way:
>> # ./openssl-1.1.1/openssl version -r
>> Seeding source: getrandom-syscall
> 
> I have glibc 2.23 (Ubuntu 16.04) and openssl shows "os-specific", even
> if kernel headers are installed while compiling, yet -pre6 does not
> hang for me in chroot (-pre4 did):
> 
> lukas@dev:~/libsslbuild/bin$ uname -r
> 4.4.0-109-generic
> lukas@dev:~/libsslbuild/bin$ ./openssl version
> OpenSSL 1.1.1-pre6 (beta) 1 May 2018
> lukas@dev:~/libsslbuild/bin$ ./openssl version -r
> Seeding source: os-specific
> lukas@dev:~/libsslbuild/bin$
> 
> 
> But, stracing haproxy shows that the library IS ACTUALLY using
> getrandom(). So the "Seeding source" output of the executable is
> wrong. Gonna dig into this as well, but seeing how my haproxy
> executable uses getrandom() calls, this perfectly explains why I did
> not see this in -pre6 (which has the build-workaround for < libc 2.25,
> while pre4 did not, so it did not use the getrandom() call).

In pre6 there is a new wrapper function for getrandom which has different
fallback ways to use the syscall.

Perhaps the "openssl version -r" output depends on that (whether getrandom was
found in glibc, or the syscall was loaded a different way and is considered
os-specific).

@Lukas Which build workaround did you use?

In my case I've patched openssl to define SYS_getrandom (depending on the arch)
and I set --with-rand-seed=getrandom

You may have a more elegant way to do that without a patch.

/*
 * syscall_random(): Try to get random data using a system call
 * returns the number of bytes returned in buf, or <= 0 on error.
 */
int syscall_random(void *buf, size_t buflen)
{
#  if defined(OPENSSL_HAVE_GETRANDOM)
return (int)getrandom(buf, buflen, 0);
#  endif

#  if defined(__linux) && defined(SYS_getrandom)
return (int)syscall(SYS_getrandom, buf, buflen, 0);
#  endif

#  if defined(__FreeBSD__) && defined(KERN_ARND)
return (int)sysctl_random(buf, buflen);
#  endif

   /* Supported since OpenBSD 5.6 */
#  if defined(__OpenBSD__) && OpenBSD >= 201411
return getentropy(buf, buflen);
#  endif

return -1;
}

My patch:
--- openssl-1.1.1-pre6/crypto/rand/rand_unix.c.ori  2018-05-22 
14:06:03.490771549 +0200
+++ openssl-1.1.1-pre6/crypto/rand/rand_unix.c  2018-05-22 14:14:33.133237079 
+0200
@@ -173,6 +173,26 @@
 #   define OPENSSL_HAVE_GETRANDOM
 #  endif

+# if defined(__linux)
+#   include <sys/syscall.h>
+#if !defined(SYS_getrandom)
+#if !defined(__NR_getrandom)
+#if defined(__powerpc__) || defined(__powerpc64__)
+#define __NR_getrandom 236
+#elif defined(__sparc__) || defined(__sparc64__)
+#define __NR_getrandom 347 
+#elif defined(__x86_64__)
+#define __NR_getrandom 318
+#elif defined (__i386__)
+#define __NR_getrandom 355
+#elif defined (__s390__) || defined(__s390x__)
+#define __NR_getrandom 249
+#endif /* $arch */
+#endif /* __NR_getrandom */
+#   define SYS_getrandom __NR_getrandom
+#endif
+#endif
+
 #  if defined(OPENSSL_HAVE_GETRANDOM)
 #   include <sys/random.h>
 #  endif


> 
> 
> @Sander it looks like openssl folks won't change their mind about
> this. You have to either upgrade to a kernel more recent than 3.17 so
> that getrandom() can be used, or make /dev/xrandom available within
> your chroot.
> 
> 
> 
> Lukas
> 

Emeric



Re: Haproxy 1.8 with OpenSSL 1.1.1-pre4 stops working after 1 hour

2018-05-23 Thread Emeric Brun
Hi Sander, Lukas,

On 05/23/2018 02:32 PM, Lukas Tribus wrote:
> Hello,
> 
> On 23 May 2018 at 13:10, Sander Hoentjen  wrote:
>> I can confirm the issue is gone when I don't use chroot. I will try to
>> see if I can get more info like a strace soon. I won't be able to today
>> though. Thanks Lucas and Emeric!
> 
> 1.8.9 with 1.1.1-pre6 chrooted is now running for me for more than 22
> hours. I don't see the same issue.
> 
> So are probably we are facing a different issue.
> 
> 
> 
> Lukas
> 

This issue was due to openssl-1.1.1, which re-seeds after an elapsed time or
number of requests.

If /dev/urandom is used as the seeding source when haproxy is chrooted, it
fails to re-open /dev/urandom.

By default the openssl-1.1.1 configure script uses the getrandom syscall as
the seeding source and falls back on /dev/urandom if it is not available.

So you don't face the issue if your openssl-1.1.1 is compiled to use getrandom.

But the getrandom syscall is available only since kernel > 3.17 and, the main
point, for glibc > 2.25.

With openssl-1.1.1 you can check it this way:
# ./openssl-1.1.1/openssl version -r
Seeding source: getrandom-syscall
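
If your build falls back on /dev/urandom instead, a possible workaround (just
a sketch, assuming the usual chroot directory /var/lib/haproxy from the
examples; adjust the path to your setup) is to expose the random devices
inside the chroot so that openssl can re-open them after startup:

  # device nodes inside the chroot (char devices, major/minor 1:8 and 1:9)
  mkdir -p /var/lib/haproxy/dev
  mknod /var/lib/haproxy/dev/random  c 1 8
  mknod /var/lib/haproxy/dev/urandom c 1 9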


R,
Emeric



Re: Dynamically adding/deleting SSL certificates

2018-05-22 Thread Emeric Brun
Hi Aurélien,

On 05/18/2018 11:07 AM, Aurélien Nephtali wrote:
> Hello,
> 
> On Wed, Apr 18, 2018 at 9:34 PM, Aurélien Nephtali
>  wrote:
>> Hello,
>>
>> I have some patches to support dynamically loading and unloading PEM
>> certificates through the CLI. It is mainly a big refactoring of some
>> part of the SSL code (thanks Thierry for your patches, we came to the
>> same conclusion :) !).
>>
> 
> Here is an updated version of this feature. The changes are:
> - Use a payload in the CLI to pass the certificate
> - Change the way to specify on which listener the certificate is
> to be added/removed: using the "bind name".
>   If the listeners are not named, the only way to update their
> certificates is to do a global operation (using just the frontend name
> in the command).
> 
> One thing that should be discussed is what will be the command syntax
> when it will support more certificate options (OCSP, SCTL) ?
> I thought about sending something like an .ini file:
> 
> [certificate]
> a===
> 
> [ocsp]
> b===
> 
> etc...
> 
> but one needs to prepare these files: it may not be very handy for a
> one shot operation ?
> Plus, without streaming we're quickly limited by the payload size with
> the default value.
> 

I see that you're using the domain to identify the certificate to delete.

If you take a look at crt-list, you will see that the identifier of the
certificate is customizable and is not necessarily the domain.

I think that to perform the add/delete operations we should use the same
identifiers as the crt-list option.
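
For reference, here is a small crt-list sketch (file names and filters are
only illustrative); the first column is the identifier I mean, which the CLI
add/delete commands could reuse:

  # <crtfile> [ssl options] <filters...>
  site1.pem
  site2.pem [alpn h2,http/1.1]
  wildcard.pem *.example.org !www.example.org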

R,
Emeric



Re: [RFC PATCH] MINOR: ssl: set SSL_OP_PRIORITIZE_CHACHA

2018-05-22 Thread Emeric Brun
Hi Lukas, Willy,

On 05/18/2018 05:55 PM, Lukas Tribus wrote:
> Sets OpenSSL 1.1.1's SSL_OP_PRIORITIZE_CHACHA unconditionally, as per [1]:
> 
> When SSL_OP_CIPHER_SERVER_PREFERENCE is set, temporarily reprioritize
> ChaCha20-Poly1305 ciphers to the top of the server cipher list if a
> ChaCha20-Poly1305 cipher is at the top of the client cipher list. This
> helps those clients (e.g. mobile) use ChaCha20-Poly1305 if that cipher
> is anywhere in the server cipher list; but still allows other clients to
> use AES and other ciphers. Requires SSL_OP_CIPHER_SERVER_PREFERENCE.
> 
> [1] https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_clear_options.html
> ---
> 
> RFC because we need to confirm we want to set this option unconditionally.
> 
> We could also add another configuration option, however I don't see the
> real benefit honestly:
> 
> Symmetric crypto performance of ChaCha20-Poly1305 should not be a real
> world concern at all as it performs very good. If it is a concern, it
> should not be in the allowed cipher list in the first place.
> 
> I'm not aware of any clients not supporting a recent AES-GCM (AEAD)
> cipher while they do support ChaCha20-Poly1305, it certainly doesn't
> make much sense.
> 
> That is why I'm proposing to set this option unconditionally.
> 
> ---
>  doc/configuration.txt | 3 +++
>  src/ssl_sock.c| 4 
>  2 files changed, 7 insertions(+)
> 
> diff --git a/doc/configuration.txt b/doc/configuration.txt
> index cbea330..d59a446 100644
> --- a/doc/configuration.txt
> +++ b/doc/configuration.txt
> @@ -10961,6 +10961,9 @@ prefer-client-ciphers
>Use the client's preference when selecting the cipher suite, by default
>the server's preference is enforced. This option is also available on
>global statement "ssl-default-bind-options".
> +  Note that with OpenSSL >= 1.1.1 ChaCha20-Poly1305 is reprioritized anyway
> +  (without setting this option), if a ChaCha20-Poly1305 cipher is at the top of
> +  the client cipher list.
>  
>  process <process-set>[/<thread-set>]
>This restricts the list of processes and/or threads on which this listener is
> diff --git a/src/ssl_sock.c b/src/ssl_sock.c
> index 7a602ad..5a003dc 100644
> --- a/src/ssl_sock.c
> +++ b/src/ssl_sock.c
> @@ -1957,6 +1957,9 @@ ssl_sock_generate_certificate_from_conn(struct bind_conf *bind_conf, SSL *ssl)
>  #ifndef SSL_MODE_SMALL_BUFFERS  /* needs small_records.patch */
>  #define SSL_MODE_SMALL_BUFFERS 0
>  #endif
> +#ifndef SSL_OP_PRIORITIZE_CHACHA/* needs OpenSSL >= 1.1.1 */
> +#define SSL_OP_PRIORITIZE_CHACHA 0
> +#endif
>  
>  #if (OPENSSL_VERSION_NUMBER < 0x1010000fL)
>  typedef enum { SET_CLIENT, SET_SERVER } set_context_func;
> @@ -3711,6 +3714,7 @@ ssl_sock_initial_ctx(struct bind_conf *bind_conf)
>   SSL_OP_SINGLE_DH_USE |
>   SSL_OP_SINGLE_ECDH_USE |
>   SSL_OP_NO_SESSION_RESUMPTION_ON_RENEGOTIATION |
> + SSL_OP_PRIORITIZE_CHACHA |
>   SSL_OP_CIPHER_SERVER_PREFERENCE;
>   long mode =
>   SSL_MODE_ENABLE_PARTIAL_WRITE |
> 

I agree, we could merge it as it is.

R,
Emeric



Re: Haproxy 1.8 with OpenSSL 1.1.1-pre4 stops working after 1 hour

2018-05-22 Thread Emeric Brun
Hi Sander,

On 05/22/2018 02:04 PM, Sander Hoentjen wrote:
> On 05/22/2018 12:04 PM, Lukas Tribus wrote:
>> Hello,
>>
>> On 22 May 2018 at 11:48, Sander Hoentjen  wrote:
>>> I did, but I still experience the same issues. What is your exact
>>> haproxy version you tested with? Mine is 1.8.8
>>> Built with OpenSSL version : OpenSSL 1.1.1-pre6 (beta) 1 May 2018
>>> Running on OpenSSL version : OpenSSL 1.1.1-pre6 (beta) 1 May 2018
>> I'm using the development tree. So if it doesn't depend on openssl, it
>> may be bug that has been fixed in haproxy itself.
>>
>> Can you try 1.8.9?
> I just tried HA-Proxy version 1.8.9-83616ec 2018/05/18, it has the same
> issue unfortunately.
> 

Do you use chroot?

I remember a behavior change between 1.1.0 and 1.1.1: after an elapsed time (or
a number of new sessions), openssl tries to re-open some /dev/*random sources,
and that can fail in a chroot.

https://github.com/openssl/openssl/issues/5330

 
Emeric



Re: BUG: ssl: regression with openssl 1.1.1 when using <= TLSv1.2

2018-05-22 Thread Emeric Brun
Hi Lukas,

I've just made some tests using openssl-1.1.1-pre6 and can't reproduce the 
issue.

here is my simple configuration:
frontend my
mode http
bind :443 ssl crt default strict-sni
redirect location /

(default certificate CN is aloha)

I've tested with openssl-1.1.1-pre6 as a client without issue (same thing with
1.0.2g).

openssl s_client -connect 10.0.3.168:443 -tls1_1 -servername aloha
HS => OK
openssl s_client -connect 10.0.3.168:443 -tls1 -servername aloha
HS => OK
openssl s_client -connect 10.0.3.168:443 -tls1 -servername foobar
HS => KO

My haproxy sources are latest 1.8 + some backports from dev branch

Do you have any specific parameter related to ssl in your global section?

R,
Emeric




Re: [PATCH 2/2] MINOR: ssl: add fetch 'ssl_fc_session_key' and 'ssl_bc_session_key'

2018-04-30 Thread Emeric Brun
Hi Patrick,

On 04/29/2018 01:15 AM, Patrick Hemmer wrote:
> 
> These fetches return the SSL master key of the front/back connection.
> This is useful to decrypt traffic encrypted with ephemeral ciphers.
> ---
>  doc/configuration.txt | 13 +
>  src/ssl_sock.c| 35 +++
>  2 files changed, 48 insertions(+)
> 
> 

No comments, except that I think the SMP_F_CONST flag is not necessary in this
case.


R,
Emeric



Re: [PATCH 1/2] MINOR: ssl: disable SSL sample fetches when unsupported

2018-04-30 Thread Emeric Brun
Hi Patrick,

On 04/29/2018 01:15 AM, Patrick Hemmer wrote:
> 
> Previously these fetches would return empty results when HAProxy was
> compiled
> without the requisite SSL support. This results in confusion and problem
> reports from people who unexpectedly encounter the behavior.
> ---
>  src/ssl_sock.c | 27 +++
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> 
Are you sure this patch will not cause a regression with boringssl or libressl?
I see some ifdefs based only on the openssl version.


R,
Emeric



Re: Fix building haproxy 1.8.5 with LibreSSL 2.6.4

2018-04-18 Thread Emeric Brun
On 04/16/2018 02:30 PM, Dmitry Sivachenko wrote:
> 
>> On 07 Apr 2018, at 17:38, Emmanuel Hocdet  wrote:
>>
>>
>> I Andy
>>
>>> Le 31 mars 2018 à 16:43, Andy Postnikov  a écrit :
>>>
>>> I used to rework previous patch from Alpinelinux to build with latest 
>>> stable libressl
>>> But found no way to run tests with openssl which is primary library as I see
>>> Is it possible to accept the patch upstream or get review on it? 
>>>
>>> 
>>
>>
>> @@ -2208,7 +2223,7 @@
>> #else
>>  cipher = SSL_CIPHER_find(ssl, cipher_suites);
>> #endif
>> -if (cipher && SSL_CIPHER_get_auth_nid(cipher) == 
>> NID_auth_ecdsa) {
>> +if (cipher && SSL_CIPHER_is_ECDSA(cipher)) {
>>  has_ecdsa = 1;
>>  break;
>>  }
>>
>> No, it’s a regression in lib compatibility.
>>
> 
> 
> Hello,
> 
> it would be nice if you come to an acceptable solution and finally merge 
> LibreSSL support.
> There were several attempts to propose LibreSSL support in the past and every 
> time discussion dies with no result.
> 
> Thanks :)
> 
> 
> 

What do you think, Manu?

R,
Emeric



Re: Warning: upgrading to openssl master+ enable_tls1_3 (coming v1.1.1) could break handshakes for all protocol versions .

2018-03-28 Thread Emeric Brun
Hi Lukas,

> 
> FYI OpenSSL did a 180 on this, they are implemented a new API call to
> set TLSv1.3 ciphers and enable them by default:
> 
> https://github.com/mattcaswell/openssl/commit/d93e832a82087a5f9bcf7d93ed7ae21bc6c1fed0
> 
> https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_ciphersuites.html
> 

Seems a good news. Thank you Lukas!
 
> 
> cheers,
> lukas
> 

Emeric



Warning: upgrading to openssl master+ enable_tls1_3 (coming v1.1.1) could break handshakes for all protocol versions .

2018-01-12 Thread Emeric Brun
Hi All,

FYI: upgrading to the next openssl-1.1.1 could break your production if you're
using a forced cipher list, because the handshake will fail regardless of the
TLS protocol version if you don't specify a cipher valid for TLSv1.3 in your
cipher list.
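
For illustration, a minimal sketch of an affected setup (certificate path and
cipher choice are only examples): with such a build, this bind line stops
handshaking entirely, even for TLSv1.2 clients, because no TLSv1.3 cipher is
listed.

  bind :443 ssl crt /etc/ssl/site.pem ciphers ECDHE-RSA-AES128-GCM-SHA256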

https://github.com/openssl/openssl/issues/5057

https://github.com/openssl/openssl/issues/5065

The openssl team doesn't seem to consider this an issue, and I'm just tired of
discussing it with them.

R,
Emeric



Re: haproxy+QAT memory usage very high under busy traffic

2018-01-09 Thread Emeric Brun
Hi Julian,

On 01/09/2018 03:28 PM, Willy Tarreau wrote:
> Hi Julian,
> 
> On Tue, Jan 09, 2018 at 08:50:48AM +, Julian Zhu wrote:
>> We are testing haproxy+QAT card(Intel QuickAssist-Technology) and find that 
>> the memory usage of haproxy+QAT is much higher than that of haproxy alone.
>> Under the same traffic (about 120K connections), haproxy alone only takes 6G 
>> memory while haproxy+QAT takes 36G.
>> The only difference in config is as below for haproxy+QAT:
>> ..
>> ssl-engine  qat
>> ssl-mode-async
>> ..
>> and the memory is not released even if we terminate all haproxy processes 
>> until we reboot the server.
> 
> Then this indicates that the leak (if any but it sounds like this) is in
> the engine. You'd need to unload and reload the modules to see if it's a
> real leak or just some extreme memory usage.
> 
>> OS:
>> Centos-7
>>
>> SW version:
>> haproxy-1.8.3
>> openssl-1.1.0e
>> QAT_Engine: 0.5.20
>> QAT1.7.Upstream.L.1.0.3_42
>>
>> Does anyone encounter the same issue? Is there any way to debug or fix it?
> 
> Unfortunately at this point it will be entirely dependant on QAT, and I
> don't really know what can be done there. Maybe there's a debug mode. You
> should possibly check with other versions of the engine (higher or lower).
> If you're still experiencing the leak with an up to date version, you
> should contact your intel support so that they can take a look at this.
> 
> It's possible that haproxy triggers the leak (ie does something not cool
> triggering an unusual code path leading to a leak in the engine), so they
> may not be aware of it yet.
> 
> Hoping this helps,
> Willy
> 

There is no specific memory allocation in haproxy's code to handle the
async engines.

In addition, we are also using another engine than QAT which uses the
"async engine mode", and we didn't notice any abnormal memory consumption.

So the issue seems related to the openssl ".so" used for the engine. As Willy 
told you, you should try a different QAT version.
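
To attribute the allocations, one possible approach (only a sketch; the config
path is illustrative and the QAT engine may need extra privileges to run under
instrumentation) is to run a single foreground haproxy process under a heap
profiler, with and without the engine enabled, and compare:

  valgrind --tool=massif --massif-out-file=qat.massif \
           ./haproxy -d -f /etc/haproxy/haproxy.cfg
  ms_print qat.massif | head -40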

R,
Emeric



Re: Traffic delivered to disabled server when cookie persistence is enabled after upgrading to 1.8.1

2017-12-21 Thread Emeric Brun
Hi All,

This bug should be fixed by this patch (patch is on dev and should be
backported to 1.8).

R,
Emeric

On 12/21/2017 10:42 AM, Greg Nolle wrote:
> Thanks guys! I should be able to test the new version this weekend if you are 
> able to issue it before then.
> 
> Best regards,
> Greg
> 
> On Thu, Dec 21, 2017 at 12:15 AM, Willy Tarreau <w...@1wt.eu 
> <mailto:w...@1wt.eu>> wrote:
> 
> On Thu, Dec 21, 2017 at 12:04:11AM +0100, Cyril Bonté wrote:
> > Hi Greg,
> >
> > Le 20/12/2017 à 22:42, Greg Nolle a écrit :
> > > Hi Andrew,
> > >
> > > Thanks for the info but I'm afraid I'm not seeing anything here that
> > > would affect the issue I'm seeing, and by the way the docs don't
> > > indicate that the cookie names have to match the server names.
> >
> > First, don't worry about the configuration, there is nothing wrong in 
> it ;-)
> >
> > > That being said, I tried using your settings and am still seeing the
> > > issue (see below for new full config). And like I say, this is only an
> > > issue with v1.8.1, it works as expected in v1.7.9.
> >
> > I won't be able to look further tonight, but at least I could identify 
> when
> > the regression occured : it's caused by the work done to prepare
> > multi-threading, more specifically by this commit :
> > http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=64cc49cf7 
> <http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=64cc49cf7>
> >
> > I add Emeric to the thread, maybe he'll be able to provide a fix faster 
> than
> > me (I'll won't be very available for the next days).
> 
> Thus I'll ping Emeric tomorrow as well so that we can issue 1.8.2 soon in
> case someone wants to play with it on friday afternoon jus before xmas :-)
> 
> Willy
> 
> 

>From db483435c294541cbab27babacb9daefc043fd32 Mon Sep 17 00:00:00 2001
From: Emeric Brun <eb...@haproxy.com>
Date: Thu, 21 Dec 2017 14:42:26 +0100
Subject: [PATCH] BUG/MEDIUM: checks: a server passed in maint state was not
 forced down.

Setting a server in maint mode, the required next_state was not set
before calling the 'lb_down' function and so the system state was never
committed.

This patch should be backported in 1.8
---
 src/server.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/server.c b/src/server.c
index 23e4cc9..a37e919 100644
--- a/src/server.c
+++ b/src/server.c
@@ -4630,10 +4630,11 @@ void srv_update_status(struct server *s)
 		else {	/* server was still running */
 			check->health = 0; /* failure */
 			s->last_change = now.tv_sec;
+
+			s->next_state = SRV_ST_STOPPED;
 			if (s->proxy->lbprm.set_server_status_down)
 s->proxy->lbprm.set_server_status_down(s);
 
-			s->next_state = SRV_ST_STOPPED;
 			if (s->onmarkeddown & HANA_ONMARKEDDOWN_SHUTDOWNSESSIONS)
 srv_shutdown_streams(s, SF_ERR_DOWN);
 
-- 
2.7.4



Re: Re: Haproxy SSl Termination performance issue

2017-12-21 Thread Emeric Brun
Hi Mike,

Thierry is right: 4096-bit RSA key computation clearly has a heavy CPU cost.
In our internal benchmarks we noticed that, using one haproxy process on one
core of an i7-4790K CPU @ 4.00GHz, we reach 170 conn/s (compared to 1350 conn/s
using a 2048-bit RSA key).
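You can get an idea of the expected ratio on your own CPU with openssl's
built-in benchmark; the private-key "sign" operations per second roughly bound
the full handshakes per second a single core can perform:

  openssl speed rsa2048 rsa4096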

Usually this CPU usage isn't so high, because software clients use "session ID"
or "TLS tickets", which avoid redoing the RSA computation for each new
connection from the same client.

I don't know if what you observe is due to production traffic or a benchmark
tool, but in any case your software clients don't seem to reuse sessions.

Anyway, Thierry has well described how to scale your architecture to use all
the available CPUs.
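
As a rough sketch of that setup (socket name, certificate path and process
count are only examples), something like this dedicates process 1 to HTTP and
the other processes to TLS offloading:

  global
      nbproc 4

  frontend tls-offload
      bind-process 2-4
      bind :443 ssl crt /etc/ssl/site.pem
      mode tcp
      default_backend pass-to-http

  backend pass-to-http
      bind-process 2-4
      mode tcp
      server http abns@http send-proxy

  frontend http-in
      bind-process 1
      bind abns@http accept-proxy
      mode http
      default_backend webservers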

R,
Emeric 
 


On 12/19/2017 04:14 PM, hongw...@163.com wrote:
> Hi, Thierry.
> 
> Thanks again.
> 
> One more question about what you described: can I think of it this way:
> assume we have an 8-core CPU, we use 7 of them for SSL termination and one
> for HTTP forwarding? If so, is there any document for this solution?
> 
> Thanks a lot
> 
> Mike
> 
> 
> 
>  Original Message 
> Subject: Re: Haproxy SSl Termination performance issue
> From: Thierry Fournier
> To: Mike G
> Cc: Haproxy
> 
> 
> OK, you're using HAProxy for SSL offloading. HAProxy is one of the
> right solutions for doing this. Your performance problem is not
> due to HAProxy; each component using OpenSSL will reach the same
> limits.
> 
> The classic setup is to configure many processes for the SSL offloading
> (proxy in TCP mode), and only one for the HTTP. This setup works
> fine because it allows many CPUs for the SSL, which requires heavy
> computing, and the HTTP processing is done by only one process which can
> perform accounting and apply limits (rate limit, connection limit,
> ...).
> 
> Thierry
> 
> 
> > On 19 Dec 2017, at 12:44, Mike G wrote:
> >
> >
> >
> > Hi, Thierry.
> >
> > Our case is like this: we put haproxy in front for SSL termination.
> > haproxy receives the HTTPS request, runs it through SSL termination, and
> > then forwards the request to the web server (over HTTP); it also takes
> > the HTTP response, encrypts it, and returns HTTPS to the client.
> >
> >
> > thanks
> >
> > Mike
> >
> >
> >
> >
> >
> > At 2017-12-19 19:25:09, "Thierry Fournier" wrote:
> > >Hi,
> > >
> > >What kind of job ?
> > >
> > >Thierry
> > >
> > >> On 19 Dec 2017, at 12:17, hongw...@163.com wrote:
> > >>
> > >> Hi,Thierry
> > >>
> > >> got it. Thanks!
> > >>
> > >> By the way, may I ask if SSL termination is the best solution for this
> > >> kind of job?
> > >>
> > >>
> > >> Many thanks
> > >>
> > >> Mike
> > >>
> > >>
> > >>
> > >>  Original Message 
> > >> Subject: Re: Haproxy SSl Termination performance issue
> > >> From: Thierry Fournier
> > >> To: Mike G
> > >> Cc: Haproxy
> > >>
> > >>
> > >> Hi,
> > >>
> > >> I guess that 130 means 130 SSL requests per second?
> > >>
> > >> SSL is very heavy processing. The 4096-bit certificates consume more
> > >> CPU than 2048-bit ones (thanks captain obvious). Your processing
> > >> capacity is capped by your CPU. You must check the CPU of your server
> > >> during your test. If the CPU consumption is 100%, you reached the
> > >> limit of your server.
> > >>
> > >> If you reach the limit of one CPU (nbproc), you can use more CPUs
> > >> and/or more servers.
> > >>
> > >> Thierry
> > >>
> > >>
> > >> > On 19 Dec 2017, at 08:36, Mike G wrote:
> > >> >
> > >> > Hi, everyone.
> > >> >
> > >> > I just got a problem about the haproxy ssl termination performance 
> issues.
> > >> > we have a case which want to use SSL Termination. so, we did some 
> testing before online, I know the virtual machine will not good choice, but 
> it make feel so supriose the cur link can be more than 130 when I use 4096 
> key.
> > >> > here's my configuration about the haproxy:
> > >> >
> > >> > haproxy as SSL termination layer before web server.
> > >> > the haproxy version is 1.8.1
> > >> > I compile it by myself:
> > >> > use the parameter:
> > >> > make TARGET=linux2628 USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
> > >> >
> > >> > also, I use download openssl 1.0.2n from openssl.org, and compile 
> by those parameters:
> > >> > ./config -d zlib
> > >> >
> > >> > after install openssl and haproxy.
> > >> > here's my configuration about the haproxy:
> > >> > global
> > >> > log 127.0.0.1 local0
> > >> >
> > >> > chroot /var/lib/haproxy
> > >> > pidfile /var/run/haproxy.pid
> > >> > maxconn 65535
> > >> > group haproxy
> > >> > user haproxy
> > >> > daemon
> > >> > nbproc 1
> > >> >
> > >> > stats socket /var/lib/haproxy/stats
> > >> > tune.ssl.default-dh-param 2048
> > >> >
> > >> > defaults
>

Re: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a email-alert task

2017-12-07 Thread Emeric Brun
Hi Pieter,

I'm CCing Christopher; he did some tests on your patch.

R,
Emeric

On 12/06/2017 07:06 AM, Willy Tarreau wrote:
> Hi Pieter,
> 
> CCing Emeric since these parts have changed a bit for threads and
> there may be some subtle things we oversee.
> 
> thanks for this!
> Willy
> 
> On Wed, Dec 06, 2017 at 02:11:53AM +0100, PiBa-NL wrote:
>> Hi List, Simon and Baptiste,
>>
>> Sending to both of you guys as its both tcp-check and email related and you
>> are the maintainers of those parts.
>> Patch subject+content basically says it all (i hope.).
>>
>> It is intended to fixes yesterdays report:
>> https://www.mail-archive.com/haproxy@formilux.org/msg28158.html
>>
>> Please let me know if it is OK, or should be done differently.
>>
>> Thanks in advance,
>> PiBa-NL / Pieter
> 
>> From bf80b0398c08f94bebec30feaaddda422cb87ba1 Mon Sep 17 00:00:00 2001
>> From: PiBa-NL 
>> Date: Wed, 6 Dec 2017 01:35:43 +0100
>> Subject: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from 
>> a
>>  email-alert task
>>
>> This avoids possible 100% cpu usage deadlock on a EMAIL_ALERTS_LOCK and 
>> avoids sending lots of emails when 'option log-health-checks' is used.
>> It is avoided to change the server state and possibly queue a new email 
>> while processing the email alert by checking if the check task is being 
>> processed for the process_email_alert struct.
>>
>> This needs to be backported to 1.8.
>> ---
>>  src/checks.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/src/checks.c b/src/checks.c
>> index eaf84a2..55bfde2 100644
>> --- a/src/checks.c
>> +++ b/src/checks.c
>> @@ -72,6 +72,7 @@ static int tcpcheck_main(struct check *);
>>  
>>  static struct pool_head *pool_head_email_alert   = NULL;
>>  static struct pool_head *pool_head_tcpcheck_rule = NULL;
>> +static struct task *process_email_alert(struct task *t);
>>  
>>  
>>  static const struct check_status check_statuses[HCHK_STATUS_SIZE] = {
>> @@ -198,6 +199,9 @@ const char *get_analyze_status(short analyze_status) {
>>   */
>>  static void set_server_check_status(struct check *check, short status, 
>> const char *desc)
>>  {
>> +if (check->task->process == process_email_alert)
>> +return; // email alerts should not change the status of the 
>> server
>> +
>>  struct server *s = check->server;
>>  short prev_status = check->status;
>>  int report = 0;
>> -- 
>> 2.10.1.windows.1
>>
> 




Re: server DOWN and can not auto restore to MAINT state when use DNS SRV resolvers

2017-12-06 Thread Emeric Brun
Hi All,


On 12/06/2017 04:00 AM, slene wrote:
>>1. running two container and publish port 80 but not really listening
>>to 80.
> I simulated a service fault (e.g. a misconfigured port).
> And in step 2 the health check has a layer4 failure: Connection refused.
> 
>> As you wrote in step 3, there is "DNS NX" (=> no DNS entry) for all 3 servers;
>> what do you expect when the resolver can't find an IP or SRV entry?
> 
> See the image in this comment:
> https://gist.github.com/slene/0feaa1e68adfaae1b7868871dd3c0196#gistcomment-2277122
> 
> Now the server enters DOWN and cannot return to MAINT. Although it says it
> enters maintenance, it does not. I think http1/2 should be in an enabled
> state like http3, waiting for SRV records to become available.
> 
> 
> 
> 
> 
Aleksandar Lazic wrote on Wed, Dec 6, 2017 at 7:44 AM:
> 
> Hi.
> 
> -- Originalnachricht --
> Von: "slene" >
> An: haproxy@formilux.org 
> Gesendet: 05.12.2017 11:36:27
> Betreff: server DOWN and can not auto restore to MAINT state when use
> DNS SRV resolvers
> 
>I'm a new user of haproxy. I have some questions when I use consul and
>registrator for service discovery, and use haproxy as the balancer.
> >
> >My steps:
> >
> >1. running two container and publish port 80 but not really listening
> >to 80.
> What does this mean?
> 
> 
> >2. haproxy found two server failed status: 0/9 DOWN. (not tried times,
> >only once)
> >3. remove two container and SRV records already empty.
> >4. haproxy found two server failed and enters maintenance.
>5. now the two servers are fully DOWN and cannot enter the MAINT state. They
>must be restored manually by myself (set state to MAINT -> health force up ->
>set state to READY)
>
>I think the servers should change state to MAINT and wait for DNS SRV
>records to become available, instead of just going DOWN and needing a manual
>restore. Am I right?
> >
> >Is it a bug? Or I miss something. Thanks.
> >
> >config and logs:
> >https://gist.github.com/slene/0feaa1e68adfaae1b7868871dd3c0196
> 
> Cool you use the new server-template feature ;-)
> 
> As you wrote in step 3, there is "DNS NX" (=> no DNS entry) for all 3
> servers; what do you expect when the resolver can't find an IP or SRV
> entry?
> 
> Regards
> aleks
> 

Latest commit "BUG/MEDIUM: checks: a down server going to maint remains 
definitely stucked on down.."
should fix this issue.
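
For reference, a minimal sketch of the kind of setup involved (resolver
address and service name are illustrative):

  resolvers consul
      nameserver c1 127.0.0.1:8600
      resolve_retries 3
      hold nx 30s

  backend app
      server-template srv 3 _http._tcp.app.service.consul resolvers consul check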

R,
Emeric




Re: Fix building haproxy with recent LibreSSL

2017-10-25 Thread Emeric Brun
On 10/24/2017 05:57 PM, Emmanuel Hocdet wrote:
> 
>> Le 3 août 2017 à 10:07, Willy Tarreau > a 
>> écrit :
>>
>> Hi Bernard,
>>
>> I'm CCing Emeric since this affects SSL. I have some comments below.
>>
>> On Tue, Jul 25, 2017 at 05:03:10PM +0200, Bernard Spil wrote:
>>
>>> --- src/ssl_sock.c.orig2017-06-02 13:59:51 UTC
>>> +++ src/ssl_sock.c
>>> @@ -56,7 +56,7 @@
>>> #include <openssl/dh.h>
>>> #endif
>>>
>>> -#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
>>> +#if (OPENSSL_VERSION_NUMBER >= 0x1010000fL) && !defined(LIBRESSL_VERSION_NUMBER)
>>> #include <openssl/async.h>
>>> #endif
>> (...)
>>
>> OK so all the ones related to LibreSSL not supporting async should go
>> into one patch.
> 
> 
> "[PATCH] MINOR: ssl: build with recent BoringSSL library » used  
> OPENSSL_NO_ASYNC
> 
> With this patch, a generic way could be to have (if LibreSSL have no  
> OPENSSL_NO_ASYNC defined):
> 
> #ifdef LIBRESSL_VERSION_NUMBER
> 
> #ifndef OPENSSL_NO_ASYNC
> #define OPENSSL_NO_ASYNC
> #endif
> 
> #endif
> 
> 
> 
I think you mean:

#if (OPENSSL_VERSION_NUMBER < 0x1010000fL || defined LIBRESSL_VERSION_NUMBER)

#ifndef OPENSSL_NO_ASYNC
#define OPENSSL_NO_ASYNC
#endif

#endif

and review all the related version tests to also check OPENSSL_NO_ASYNC.

I think it is a good idea.

R,
Emeric



Re: Possible bug in task_wakeup() impacts Lua tasks

2017-10-17 Thread Emeric Brun
Hi Adis,

On 10/17/2017 05:48 PM, Emeric Brun wrote:
> Hi Adis,
> 
> On 10/17/2017 05:41 PM, Adis Nezirovic wrote:
>> Hello guys,
>>
>> After this commit:
>>
>>   commit 0194897e540cec67d7d1e9281648b70efe403f08
>>   Author: Emeric Brun <eb...@haproxy.com>
>>   Date:   Thu Mar 30 15:37:25 2017 +0200
>>
>>   MAJOR: task: task scheduler rework.
>>
>> basic Lua tasks don't work anymore.
>> e.g. this only gets called once:
>>
>>   function cron()
>>   while true do
>>   core.Debug("Hello from Cron")
>>   core.sleep(1)
>>   end
>>   end
>>   core.register_task(cron)
>>
>> 
>>
>> The current code in task_wakeup() checks for TASK_RUNNING and decides
>> that it won't call __task_wakeup(), but when Lua task wakes up, it has
>> both, TASK_WOKEN_TIMER and TASK_RUNNING set.
>>
>> My quick fix/workaround was to add an additional check:
>>
>>   if (unlikely(!(t->state & TASK_WOKEN_TIMER) &&
>>(t->state & TASK_RUNNING)))
>>
>> But I might be missing something more fundamental (i.e. this is really
>> necessary for multithreaded stuff), maybe we need additional flags when
>> running task_wakeup from task handlers or threads.
>>
>>
>> Best regards,
>> Adis
>>
> 
> I'm adding the haproxy LUA engine maintainer, Thierry, in CC.
> 
> He should be able to help.
> 
> R,
> Emeric
> 

This patch should fix the issue more consistently.

Could you confirm?

R,
Emeric
>From 03e97e72ce2736db7ec04ee76583e480fc85be75 Mon Sep 17 00:00:00 2001
From: Emeric Brun <eb...@haproxy.com>
Date: Tue, 17 Oct 2017 18:58:40 +0200
Subject: [PATCH] BUG/MAJOR: lua: scheduled task is freezing.

Since commit 'MAJOR: task: task scheduler rework'
0194897e540cec67d7d1e9281648b70efe403f08, LUA's
scheduled tasks are freezing.

A running task should not handle the scheduling itself
but let the task scheduler handle it based on the
'expire' field.
---
 src/hlua.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/hlua.c b/src/hlua.c
index c68495b..7eddda8 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -5360,12 +5360,6 @@ static struct task *hlua_process_task(struct task *task)
 	struct hlua *hlua = task->context;
 	enum hlua_exec status;
 
-	/* We need to remove the task from the wait queue before executing
-	 * the Lua code because we don't know if it needs to wait for
-	 * another timer or not in the case of E_AGAIN.
-	 */
-	task_delete(task);
-
 	/* If it is the first call to the task, we must initialize the
 	 * execution timeouts.
 	 */
@@ -5385,7 +5379,7 @@ static struct task *hlua_process_task(struct task *task)
 
 	case HLUA_E_AGAIN: /* co process or timeout wake me later. */
 		if (hlua->wake_time != TICK_ETERNITY)
-			task_schedule(task, hlua->wake_time);
+			task->expire = hlua->wake_time;
 		break;
 
 	/* finished with error. */
@@ -5394,6 +5388,7 @@ static struct task *hlua_process_task(struct task *task)
 		hlua_ctx_destroy(hlua);
 		task_delete(task);
 		task_free(task);
+		task = NULL;
 		break;
 
 	case HLUA_E_ERR:
@@ -5402,9 +5397,10 @@ static struct task *hlua_process_task(struct task *task)
 		hlua_ctx_destroy(hlua);
 		task_delete(task);
 		task_free(task);
+		task = NULL;
 		break;
 	}
-	return NULL;
+	return task;
 }
 
 /* This function is an LUA binding that register LUA function to be
-- 
2.7.4



Re: Possible bug in task_wakeup() impacts Lua tasks

2017-10-17 Thread Emeric Brun
Hi Adis,

On 10/17/2017 05:41 PM, Adis Nezirovic wrote:
> Hello guys,
> 
> After this commit:
> 
>   commit 0194897e540cec67d7d1e9281648b70efe403f08
>   Author: Emeric Brun <eb...@haproxy.com>
>   Date:   Thu Mar 30 15:37:25 2017 +0200
> 
>   MAJOR: task: task scheduler rework.
> 
> basic Lua tasks don't work anymore.
> e.g. this only gets called once:
> 
>   function cron()
>   while true do
>   core.Debug("Hello from Cron")
>   core.sleep(1)
>   end
>   end
>   core.register_task(cron)
> 
> 
> 
> The current code in task_wakeup() checks for TASK_RUNNING and decides
> that it won't call __task_wakeup(), but when Lua task wakes up, it has
> both, TASK_WOKEN_TIMER and TASK_RUNNING set.
> 
> My quick fix/workaround was to add an additional check:
> 
>   if (unlikely(!(t->state & TASK_WOKEN_TIMER) &&
>(t->state & TASK_RUNNING)))
> 
> But I might be missing something more fundamental (i.e. this is really
> necessary for multithreaded stuff), maybe we need additional flags when
> running task_wakeup from task handlers or threads.
> 
> 
> Best regards,
> Adis
> 

I'm adding the haproxy LUA engine maintainer, Thierry, in CC.

He should be able to help.

R,
Emeric



Re: [PATCH] MINOR: ssl: rework smp_fetch_ssl_fc_cl_str without internal ssl use

2017-09-04 Thread Emeric Brun
Hi Thierry,

On 09/01/2017 06:07 PM, Emmanuel Hocdet wrote:
> Hi Thierry,
> 
> This patch is related to the « Capturing browser TLS cipher suites » thread.
> I think it will match the initial need, but without internal ssl structure
> usage, and it works with openssl 1.0.2 to 1.1.1 and boringssl.
> 
> ++
> Manu
> 
> 
> 

Could you confirm?

R,
Emeric



Re: [PATCH] MINOR: ssl: remove duplicate ssl_methods in struct bind_conf

2017-09-04 Thread Emeric Brun
On 08/09/2017 07:07 PM, Emmanuel Hocdet wrote:
> Hi Willy,
> 
> Patch is not related to openssl version x. It’s a internal structure cleanup.
> I don’t label it as CLEANUP because it remove a potential source of errors 
> (this is debatable).
> If you can consider it.
> 
> Thanks.
> Manu
> 
> 
> 

It should be OK.

Willy, you have my go!

R,
Emeric



Re: [PATCH] MINOR: ssl: remove duplicate ssl_methods in struct bind_conf

2017-09-01 Thread Emeric Brun
Hi Manu,

On 09/01/2017 05:56 PM, Emmanuel Hocdet wrote:
> 
> Hi Willy, Emeric
> 
> Can you consider it?
> 
> ++
> Manu
> 
>> Le 9 août 2017 à 19:07, Emmanuel Hocdet  a écrit :
>>
>> Hi Willy,
>>
>> Patch is not related to openssl version x. It’s a internal structure cleanup.
>> I don’t label it as CLEANUP because it remove a potential source of errors 
>> (this is debatable).
>> If you can consider it.
>>
>> Thanks.
>> Manu
>>
>>
>> <0001-MINOR-ssl-remove-duplicate-ssl_methods-in-struct-bin.patch>
>>
> 

I will take a look on Monday.


R,
Emeric



Re: Feature request: disable CA/distinguished names.

2017-07-27 Thread Emeric Brun
Hi Manu,


Could you add a '{ }' block or move the comment in the following
lines:

+   if (!((ssl_conf && ssl_conf->no_ca_names) || 
bind_conf->ssl_conf.no_ca_names))
+   /* set CA names fo client cert request, 
function returns void */
+   SSL_CTX_set_client_CA_list(ctx, 
SSL_load_client_CA_file(ca_file));

It is quite confusing, and we want to avoid further mistakes.
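
For instance (just a sketch of the same lines), braces would remove the
ambiguity:

	if (!((ssl_conf && ssl_conf->no_ca_names) || bind_conf->ssl_conf.no_ca_names)) {
		/* set CA names for client cert request, function returns void */
		SSL_CTX_set_client_CA_list(ctx, SSL_load_client_CA_file(ca_file));
	}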


A second point: I don't know what the current policy is about the "no-"
keyword prefix in configuration statements, but we usually take care when
using this word.

Willy, would you clarify that point?

R,
Emeric

On 07/10/2017 05:45 PM, Emmanuel Hocdet wrote:
> 
> Hi Bas,
> 
>> Le 10 juil. 2017 à 17:05, Wolvers, Bas  a écrit :
>>
>> Hi Emmanuel,
>>
>> I finally found time to test your patch.
>>
>> It works, but you can't seem to turn it off.
>> no-ca-names seems to be active regardless of the option in the config file.
>>
> 
> oops i fail the double negation.
> fix patch include.
> 
>> I think I'll find time tomorrow to find out if it’s the global option or 
>> not, but my time is a bit limited unfortunately.
>>
>> Best regards,
>>
>> Bas
> 
> Thanks for testing!
> 
> Manu
> 
> 
> 
> 
> 
> 




Re: [PATCH] BUG/MINOR: ssl: Be sure that SSLv3 connection methods exist for openssl < 1.1.0

2017-07-12 Thread Emeric Brun
Hi Manu,

On 07/12/2017 03:23 PM, Willy Tarreau wrote:
> Hi Manu!
> 
> Please don't forget to CC Emeric and keep in mind that I still don't
> understand anything about openssl, so for me it's always a huge pain
> each time to try to have an opinion on openssl related changes.
> 
> On Wed, Jul 12, 2017 at 02:54:16PM +0200, Emmanuel Hocdet wrote:
>>
>> Hi Willy,
>>
>> I would like you consider this patches because Christopher's patch is false 
>> and
>> doesn't support other ssl libs and openssl >= 1.1.0.
> 
> OK so I guess we need to take it. Are you confident that it doesn't break
> older versions ? I'm asking because since we started to add support for
> openssl derivatives, we've probably had as many patches to fix build with
> them as patches needed to fix the build with openssl due to these patches,
> to the point that sometimes I'm wondering why we still make so many efforts
> supporting these libs given the amount of incompatibilities they cause :-(
> 
>> I sent my original patch with more comments and another with a little 
>> cleanup:

Same worries: openssl 0.9.8 is still maintained in RedHat 5, so we should
still be able to compile with this version.


> This one will definitely break :
> 
> Subject: [PATCH 2/2] MINOR: ssl: remove an unecessary SSL_OP_NO_* dependancy
> 
> Use methodVersions table to display "OpenSSL library supports".
> (...)
> - memprintf(&ptr, "%s\nOpenSSL library supports : "
> -#if SSL_OP_NO_SSLv3
> -   "SSLv3 "
> -#endif
> -#if SSL_OP_NO_TLSv1
> -   "TLSv1.0 "
> -#endif
> -#if SSL_OP_NO_TLSv1_1
> -   "TLSv1.1 "
> -#endif
> -#if SSL_OP_NO_TLSv1_2
> -   "TLSv1.2 "
> -#endif
> -#if SSL_OP_NO_TLSv1_3
> -   "TLSv1.3"
> -#endif
> -"", ptr);
> + memprintf(&ptr, "%s\nOpenSSL library supports :", ptr);
> + for (i = CONF_TLSV_MIN; i <= CONF_TLSV_MAX; i++)
> + if (methodVersions[i].option)
> + memprintf(&ptr, "%s %s", ptr, methodVersions[i].name);
> 
> $ grep -rF methodVersions openssl-1.0.2k/
> $ echo $?
> 1
> 
> Thanks,
> Willy
> 

R,
Emeric



Re: ssl: crashing since 8d85aa (BUG/MAJOR: map: fix segfault ...)

2017-07-05 Thread Emeric Brun
On 07/05/2017 12:25 AM, Lukas Tribus wrote:
> 
> Am 04.07.2017 um 23:18 schrieb Willy Tarreau:
>> On Tue, Jul 04, 2017 at 10:57:08PM +0200, Lukas Tribus wrote:
>>> The call trace doesn't really look different when I used -dM or 
>>> -DDEBUG_MEMORY.
>>>
>>> I was able to get a different trace by actually connecting to a backend 
>>> however,
>>> (instead of showing an haproxy internal 403 error):
>> (...)
>>
>> Thank you Lukas, let's hope this will help.
>>
>>
> 
> Another bisect (this time with -dM or -DDEBUG_MEMORY), another commit...
> Now it points to 23e9e931 (MINOR: log: Add logurilen tunable).
> 
> 

Hi Lukas,

Indeed this commit introduced a regression.

The attached patch should fix the issue.

R,
Emeric
>From 595396561c380aa100e2c1f80299e5fadd18e663 Mon Sep 17 00:00:00 2001
From: Emeric Brun <eb...@haproxy.com>
Date: Wed, 5 Jul 2017 13:33:16 +0200
Subject: [PATCH] BUG/MAJOR: http: fix buffer overflow on loguri buffer.

The pool used to log the uri was created with a size of 0 because it was
created before the configuration and 'tune.http.logurilen' were parsed.

The fix consists in postponing the pool_create, as is done for
cookie captures.

Regression introduced with 'MINOR: log: Add logurilen tunable'
---
 src/cfgparse.c   | 2 ++
 src/proto_http.c | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/cfgparse.c b/src/cfgparse.c
index 3706bca..600f273 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -7404,6 +7404,8 @@ int check_config_validity()
 	if (!global.tune.requri_len)
 		global.tune.requri_len = REQURI_LEN;
 
+	pool2_requri = create_pool("requri", global.tune.requri_len , MEM_F_SHARED);
+
 	pool2_capture = create_pool("capture", global.tune.cookie_len, MEM_F_SHARED);
 
 	/* allocate pool of resolution per resolvers */
diff --git a/src/proto_http.c b/src/proto_http.c
index 46cb6ff..7141833 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -459,7 +459,6 @@ void init_proto_http()
 
 	/* memory allocations */
 	pool2_http_txn = create_pool("http_txn", sizeof(struct http_txn), MEM_F_SHARED);
-	pool2_requri = create_pool("requri", global.tune.requri_len , MEM_F_SHARED);
 	pool2_uniqueid = create_pool("uniqueid", UNIQUEID_LEN, MEM_F_SHARED);
 }
 
-- 
2.7.4



Re: OpenSSL engine and async support

2017-06-06 Thread Emeric Brun
Hi Grant, Willy,
On 05/27/2017 09:03 PM, Grant Zhang wrote:
> 
>> On May 26, 2017, at 22:21, Willy Tarreau  wrote:
>>
>> Hi Emeric, Grant,
>>
>> patch set now merged! Thank you both for this great work!
>>
>> Willy
> 
> Bravo! Really appreciate your and Emeric's help in this effort.
> 
> Grant
> 

Here are 3 new fixes:

I noticed an occasional segfault at connection closure (first patch).

I noticed buffer overflows when using cipher offloading in async mode:

The moving or reuse of buffer addresses passed to SSL_read/write in haproxy
is not compliant with the ASYNC API. I had a discussion about that on the
openssl-dev mailing list with Matt Caswell.

So the second patch disables the async mode for the symmetric stuff.

The last one avoids calling conn_fd_handler directly from the async fd handler.

R,
Emeric


>From 535ef040f7c6ee31f8c8943d1db5236f66cb6e43 Mon Sep 17 00:00:00 2001
From: root 
Date: Fri, 2 Jun 2017 15:54:06 +
Subject: [PATCH 3/3] BUG/MINOR: ssl: do not call directly the conn_fd_handler
 from async_fd_handler

This patch modifies the way to re-enable the connection from the async fd
handler calling conn_update_sock_polling instead of the conn_fd_handler.

It also ensures that the polling is really stopped on the async fd.
---
 src/ssl_sock.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index a009803..04c4cbb 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -363,17 +363,19 @@ fail_get:
 static void ssl_async_fd_handler(int fd)
 {
 	struct connection *conn = fdtab[fd].owner;
-	int conn_fd = conn->t.sock.fd;
 
	/* fd is an async engine fd, we must stop
 	 * to poll this fd until it is requested
 	 */
+fd_stop_recv(fd);
 fd_cant_recv(fd);
 
 	/* crypto engine is available, let's notify the associated
 	 * connection that it can pursue its processing.
 	 */
-	conn_fd_handler(conn_fd);
+	__conn_sock_want_recv(conn);
+	__conn_sock_want_send(conn);
+	conn_update_sock_polling(conn);
 }
 
 /*
-- 
2.1.4

>From 79b70ede0baed17e52c17ae2f6c93437fa68b824 Mon Sep 17 00:00:00 2001
From: root 
Date: Tue, 6 Jun 2017 12:35:14 +
Subject: [PATCH 2/3] BUG/MAJOR: ssl: buffer overflow using offloaded ciphering
 on async engine

The OpenSSL ASYNC API doesn't support moving buffers on SSL_read/write.
This patch dynamically disables ASYNC mode when the handshake
ends and re-enables it on renegotiation.
---
 doc/configuration.txt |  6 +-
 src/ssl_sock.c| 39 ---
 2 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 75f9961..836c8b9 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -1285,7 +1285,11 @@ ssl-engine <name> [algo <comma-separated list of algorithms>]
 ssl-mode-async
   Adds SSL_MODE_ASYNC mode to the SSL context. This enables asynchronous TLS
   I/O operations if asynchronous capable SSL engines are used. The current
-  implementation supports a maximum of 32 engines.
+  implementation supports a maximum of 32 engines. The Openssl ASYNC API
+  doesn't support moving read/write buffers and is not compliant with
+  haproxy's buffer management. So the asynchronous mode is disabled on
+  read/write  operations (it is only enabled during initial and reneg
+  handshakes).
 
 tune.buffers.limit 
   Sets a hard limit on the number of buffers which may be allocated per process.
diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index a04deb6..a009803 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -4632,6 +4632,15 @@ int ssl_sock_handshake(struct connection *conn, unsigned int flag)
 	}
 
 reneg_ok:
+
+#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
+	/* ASYNC engine API doesn't support moving read/write
+	 * buffers. So we disable ASYNC mode right after
+	 * the handshake to avoid buffer overflows.
+	 */
+	if (global_ssl.async)
+		SSL_clear_mode(conn->xprt_ctx, SSL_MODE_ASYNC);
+#endif
 	/* Handshake succeeded */
 	if (!SSL_session_reused(conn->xprt_ctx)) {
 		if (objt_server(conn->target)) {
@@ -4750,6 +4759,11 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
 /* handshake is running, and it needs to enable write */
 conn->flags |= CO_FL_SSL_WAIT_HS;
 __conn_sock_want_send(conn);
+#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
+/* Async mode can be re-enabled, because we're leaving data state.*/
+if (global_ssl.async)
+	SSL_set_mode(conn->xprt_ctx, SSL_MODE_ASYNC);
+#endif
 break;
 			}
 			else if (ret == SSL_ERROR_WANT_READ) {
@@ -4757,18 +4771,17 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
 	/* handshake is running, and it may need to re-enable read */
 	conn->flags |= CO_FL_SSL_WAIT_HS;
 	__conn_sock_want_recv(conn);
+#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
+	/* Async mode can be re-enabled, because we're leaving data state.*/
+	if (global_ssl.async)
+		SSL_set_mode(conn->xprt_ctx, SSL_MODE_ASYNC);

Re: OpenSSL engine and async support

2017-05-22 Thread Emeric Brun
Hi Willy,

On 05/17/2017 10:10 PM, Willy Tarreau wrote:
> Hi Emeric,
> 
> On Wed, May 17, 2017 at 09:49:32PM +0200, Emeric Brun wrote:
>> More fixes, it appears stable now, even if session are closed during 
>> handshake.
>>
>> I also added the support of multiple async engines (latest patch: it is 
>> limited to 32 engines)
>>
>> I did some tests using my home-made engine (algo EC,DH) + QuickAssist (algo
>> RSA) and I checked it handles 2 async fds on the same session.
>>
>> Thanks for your job!
>>
>> Willy: could you merge them?
> 
> Good job! I'll integrate this ASAP (once I lift my nose out of the H2 spec).
> Just one thing, I'm seeing that your bug fixes are against Grant's initial
> patch so I'll merge them together trying to keep most of your comment, in
> order to avoid purposely integrating known bugs ; this would needlessly
> confuse any future backport and bisects.
> 
> Thanks!
> Willy
> 

Sure!

R,
Emeric



Re: [PATCH] MINOR: boringssl: basic support for OCSP Stapling

2017-05-22 Thread Emeric Brun
Hi Manu,

On 03/29/2017 04:46 PM, Emmanuel Hocdet wrote:
> 
> Use boringssl SSL_CTX_set_ocsp_response to set OCSP response from file with
> '.ocsp' extension. CLI update is not supported.
> 

Could you add this detail in the doc?

R,
Emeric




Re: OpenSSL engine and async support

2017-05-17 Thread Emeric Brun

Hi Grant,

On 05/16/2017 12:05 PM, Emeric Brun wrote:
> Hi Grant,
> 
> On 05/15/2017 08:11 PM, Grant Zhang wrote:
>>
>>> On May 15, 2017, at 03:14, Emeric Brun <eb...@haproxy.com> wrote:
>>>
>>> What does it look like?
>> New patches attached.
>>
>>>
>>> The issue is very similar: 
>>> https://mta.openssl.org/pipermail/openssl-dev/2016-March/005909.html
>> Interesting. yeah, it looks similar.
>>
>> Regards,
>>
>> Grant
>>
> 
> I've re-based your patches on latest master and also adds two fixes (in 
> attachment).
> 
> All my tests are good for now, so i think Willy could merge them.
> 
> R,
> Emeric
> 
> 

More fixes; it appears stable now, even if sessions are closed during the
handshake.

I also added support for multiple async engines (latest patch; it is limited
to 32 engines).

I did some tests using my home-made engine (algos EC, DH) + QuickAssist (algo
RSA) and I checked that it handles 2 async fds on the same session.

Thanks for your work!

Willy: could you merge them?

R,
Emeric
From 3d5a8bbc96e1e6d6f275c29f416cac9e356389fa Mon Sep 17 00:00:00 2001
From: Grant Zhang <gzh...@fastly.com>
Date: Sat, 21 Jan 2017 01:10:18 +0000
Subject: [PATCH 1/7] MEDIUM: ssl: add basic support for OpenSSL crypto engine

This patch adds the global 'ssl-engine' keyword. First arg is an engine
identifier followed by a list of default_algorithms the engine will
operate.

If the openssl version is too old, an error is reported when the option
is used.
---
 doc/configuration.txt|  16 ++
 include/proto/ssl_sock.h |   2 +
 src/haproxy.c|   4 ++
 src/ssl_sock.c   | 147 ---
 4 files changed, 148 insertions(+), 21 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index edd9e79..ffe9f76 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -591,6 +591,7 @@ The following keywords are supported in the "global" section :
- spread-checks
- server-state-base
- server-state-file
+   - ssl-engine
- tune.buffers.limit
- tune.buffers.reserve
- tune.bufsize
@@ -1265,6 +1266,21 @@ spread-checks <0..50, in percent>
   and +/- 50%. A value between 2 and 5 seems to show good results. The
   default value remains at 0.
 
+ssl-engine <name> [algo <ALGOS>]
+  Sets the OpenSSL engine to <name>. List of valid values for <name> may be
+  obtained using the command "openssl engine".  This statement may be used
+  multiple times, it will simply enable multiple crypto engines. Referencing an
+  unsupported engine will prevent haproxy from starting. Note that many engines
+  will lead to lower HTTPS performance than pure software with recent
+  processors. The optional command "algo" sets the default algorithms an ENGINE
+  will supply using the OPENSSL function ENGINE_set_default_string(). A value
+  of "ALL" uses the engine for all cryptographic operations.  If no list of
+  algo is specified then the value of "ALL" is used.  A comma-separated list
+  of different algorithms may be specified, including: RSA, DSA, DH, EC, RAND,
+  CIPHERS, DIGESTS, PKEY, PKEY_CRYPTO, PKEY_ASN1. This is the same format that
+  openssl configuration file uses:
+  https://www.openssl.org/docs/man1.0.2/apps/config.html
+
tune.buffers.limit <number>
   Sets a hard limit on the number of buffers which may be allocated per process.
   The default value is zero which means unlimited. The minimum non-zero value
diff --git a/include/proto/ssl_sock.h b/include/proto/ssl_sock.h
index 6f779fa..8779770 100644
--- a/include/proto/ssl_sock.h
+++ b/include/proto/ssl_sock.h
@@ -68,7 +68,9 @@ struct tls_keys_ref *tlskeys_ref_lookupid(int unique_id);
 #endif
 #ifndef OPENSSL_NO_DH
 int ssl_sock_load_global_dh_param_from_file(const char *filename);
+void ssl_free_dh(void);
 #endif
+void ssl_free_engines(void);
 
 SSL_CTX *ssl_sock_create_cert(struct connection *conn, const char *servername, unsigned int key);
 SSL_CTX *ssl_sock_get_generated_cert(unsigned int key, struct bind_conf *bind_conf);
diff --git a/src/haproxy.c b/src/haproxy.c
index 261b213..be1e11d 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -109,6 +109,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* list of config files */
 static struct list cfg_cfgfiles = LIST_HEAD_INIT(cfg_cfgfiles);
@@ -2162,6 +2163,9 @@ int main(int argc, char **argv)
 for (proc = 0; proc < global.nbproc; proc++)
 	while (waitpid(-1, NULL, 0) == -1 && errno == EINTR);
 			}
+#ifndef OPENSSL_NO_DH
+			ssl_free_dh();
+#endif
 			exit(0); /* parent must leave */
 		}
 
diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 21d50c7..bad9b99 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -52,6 +52,7 @@
 #ifndef OPENSSL_NO_DH
 #include <openssl/dh.h>
 #endif
+#include <openssl/engine.h>
 
 #include 
 #include 
@@ -207,6 +208,13 @@ static int ssl_capture_ptr_index = -1;

Re: OpenSSL engine and async support

2017-05-16 Thread Emeric Brun
Hi Grant,

On 05/15/2017 08:11 PM, Grant Zhang wrote:
> 
>> On May 15, 2017, at 03:14, Emeric Brun <eb...@haproxy.com> wrote:
>>
>> What does it look like?
> New patches attached.
> 
>>
>> The issue is very similar: 
>> https://mta.openssl.org/pipermail/openssl-dev/2016-March/005909.html
> Interesting. yeah, it looks similar.
> 
> Regards,
> 
> Grant
> 

I've rebased your patches on the latest master and also added two fixes (in
attachment).

All my tests are good for now, so I think Willy could merge them.

R,
Emeric


From 3d5a8bbc96e1e6d6f275c29f416cac9e356389fa Mon Sep 17 00:00:00 2001
From: Grant Zhang <gzh...@fastly.com>
Date: Sat, 21 Jan 2017 01:10:18 +0000
Subject: [PATCH 1/4] MEDIUM: ssl: add basic support for OpenSSL crypto engine

This patch adds the global 'ssl-engine' keyword. First arg is an engine
identifier followed by a list of default_algorithms the engine will
operate.

If the openssl version is too old, an error is reported when the option
is used.
---
 doc/configuration.txt|  16 ++
 include/proto/ssl_sock.h |   2 +
 src/haproxy.c|   4 ++
 src/ssl_sock.c   | 147 ---
 4 files changed, 148 insertions(+), 21 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index edd9e79..ffe9f76 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -591,6 +591,7 @@ The following keywords are supported in the "global" section :
- spread-checks
- server-state-base
- server-state-file
+   - ssl-engine
- tune.buffers.limit
- tune.buffers.reserve
- tune.bufsize
@@ -1265,6 +1266,21 @@ spread-checks <0..50, in percent>
   and +/- 50%. A value between 2 and 5 seems to show good results. The
   default value remains at 0.
 
+ssl-engine <name> [algo <ALGOS>]
+  Sets the OpenSSL engine to <name>. List of valid values for <name> may be
+  obtained using the command "openssl engine".  This statement may be used
+  multiple times, it will simply enable multiple crypto engines. Referencing an
+  unsupported engine will prevent haproxy from starting. Note that many engines
+  will lead to lower HTTPS performance than pure software with recent
+  processors. The optional command "algo" sets the default algorithms an ENGINE
+  will supply using the OPENSSL function ENGINE_set_default_string(). A value
+  of "ALL" uses the engine for all cryptographic operations.  If no list of
+  algo is specified then the value of "ALL" is used.  A comma-separated list
+  of different algorithms may be specified, including: RSA, DSA, DH, EC, RAND,
+  CIPHERS, DIGESTS, PKEY, PKEY_CRYPTO, PKEY_ASN1. This is the same format that
+  openssl configuration file uses:
+  https://www.openssl.org/docs/man1.0.2/apps/config.html
+
tune.buffers.limit <number>
   Sets a hard limit on the number of buffers which may be allocated per process.
   The default value is zero which means unlimited. The minimum non-zero value
diff --git a/include/proto/ssl_sock.h b/include/proto/ssl_sock.h
index 6f779fa..8779770 100644
--- a/include/proto/ssl_sock.h
+++ b/include/proto/ssl_sock.h
@@ -68,7 +68,9 @@ struct tls_keys_ref *tlskeys_ref_lookupid(int unique_id);
 #endif
 #ifndef OPENSSL_NO_DH
 int ssl_sock_load_global_dh_param_from_file(const char *filename);
+void ssl_free_dh(void);
 #endif
+void ssl_free_engines(void);
 
 SSL_CTX *ssl_sock_create_cert(struct connection *conn, const char *servername, unsigned int key);
 SSL_CTX *ssl_sock_get_generated_cert(unsigned int key, struct bind_conf *bind_conf);
diff --git a/src/haproxy.c b/src/haproxy.c
index 261b213..be1e11d 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -109,6 +109,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* list of config files */
 static struct list cfg_cfgfiles = LIST_HEAD_INIT(cfg_cfgfiles);
@@ -2162,6 +2163,9 @@ int main(int argc, char **argv)
 for (proc = 0; proc < global.nbproc; proc++)
 	while (waitpid(-1, NULL, 0) == -1 && errno == EINTR);
 			}
+#ifndef OPENSSL_NO_DH
+			ssl_free_dh();
+#endif
 			exit(0); /* parent must leave */
 		}
 
diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 21d50c7..bad9b99 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -52,6 +52,7 @@
 #ifndef OPENSSL_NO_DH
 #include <openssl/dh.h>
 #endif
+#include <openssl/engine.h>
 
 #include 
 #include 
@@ -207,6 +208,13 @@ static int ssl_capture_ptr_index = -1;
 struct list tlskeys_reference = LIST_HEAD_INIT(tlskeys_reference);
 #endif
 
+static unsigned int openssl_engines_initialized;
+struct list openssl_engines = LIST_HEAD_INIT(openssl_engines);
+struct ssl_engine_list {
+	struct list list;
+	ENGINE *e;
+};
+
 #ifndef OPENSSL_NO_DH
 static int ssl_dh_ptr_index = -1;
 static DH *global_dh = NULL;
@@ -302,6 +310,47 @@ struct ocsp_cbk_arg {
 	};
 };
 
+static int ssl_init_single_engine(const char *engine_id, const char *def_algorithms)
+{
+	int err_code = ERR_ABORT;
+	ENGINE *engine;
+	struct ssl_engine_list *el;
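For reference, the standard OpenSSL 1.1 engine bring-up such an init function
performs looks roughly like the sketch below (an illustrative reconstruction,
not the patch's exact body; the function name and error messages are made up):

#include <stdio.h>
#include <openssl/engine.h>

/* illustrative sketch: grab the engine, initialize it, then register it
 * as default for the requested algorithms ("ALL" if none were given) */
static int init_engine_sketch(const char *engine_id, const char *def_algorithms)
{
	ENGINE *engine;

	/* structural reference */
	engine = ENGINE_by_id(engine_id);
	if (engine == NULL) {
		fprintf(stderr, "ssl-engine %s: engine not found\n", engine_id);
		return -1;
	}
	/* functional reference: the engine runs its own init here */
	if (!ENGINE_init(engine)) {
		fprintf(stderr, "ssl-engine %s: failed to initialize\n", engine_id);
		ENGINE_free(engine);
		return -1;
	}
	if (def_algorithms == NULL)
		def_algorithms = "ALL";
	/* route the listed algorithms (RSA, DSA, DH, EC, ...) to the engine */
	if (!ENGINE_set_default_string(engine, def_algorithms)) {
		fprintf(stderr, "ssl-engine %s: failed to set default algorithms\n", engine_id);
		ENGINE_finish(engine);
		ENGINE_free(engine);
		return -1;
	}
	return 0;
}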

Re: OpenSSL engine and async support

2017-05-15 Thread Emeric Brun
Hi Grant,

On 05/15/2017 12:14 PM, Emeric Brun wrote:
> On 05/13/2017 01:14 AM, Grant Zhang wrote:
>>
>>> On May 10, 2017, at 04:51, Emeric Brun <eb...@haproxy.com> wrote:
>>>
>>>> It looks like the main process stalls at DH_free(local_dh_1024) (part of 
>>>> __ssl_sock_deinit). Not sure why but I will debug and report back.
>>>>
>>>> Thanks,
>>>
>>> I experienced the same issue (stalled on a futex) if i run haproxy in 
>>> foreground and trying to kill it with kill -USR1.
>>>
>>> With this conf (dh param and ssl-async are disabled)
>>> global
>>> #   tune.ssl.default-dh-param 2048
>>>ssl-engine qat
>>> #   ssl-async
>>>nbproc 1
>>
>> It looks like that the stall on futex issue is related to DH_free() calling 
>> ENGINE_finish in openssl 1.1:
>> https://github.com/openssl/openssl/blob/master/crypto/dh/dh_lib.c#L109
>> (gdb) bt
>> #0  __lll_lock_wait () at 
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
>> #1  0x7fa1582c5571 in pthread_rwlock_wrlock ()
>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:118
>> #2  0x7fa158a58559 in CRYPTO_THREAD_write_lock () from 
>> /tmp/openssl_1.1.0_install/lib/libcrypto.so.1.1
>> #3  0x7fa1589d8800 in ENGINE_finish () from 
>> /tmp/openssl_1.1.0_install/lib/libcrypto.so.1.1
>> #4  0x7fa158975e76 in DH_free () from 
>> /tmp/openssl_1.1.0_install/lib/libcrypto.so.1.1
>> #5  0x00417c78 in free_dh () at src/ssl_sock.c:7905
>> #6  0x7fa1591d58ce in _dl_fini () at dl-fini.c:254
>> #7  0x7fa158512511 in __run_exit_handlers (status=0, 
>> listp=0x7fa15888e688, run_list_atexit=true)
>> at exit.c:78
>> #8  0x7fa158512595 in __GI_exit (status=) at exit.c:100
>> #9  0x00408814 in main (argc=4, argv=0x7ffe72188548) at 
>> src/haproxy.c:2235
>>
>> Openssl 1.1 has changed the way ENGINE_cleanup works:
>> https://www.openssl.org/docs/man1.1.0/crypto/ENGINE_cleanup.html
>> "From OpenSSL 1.1.0 it is no longer necessary to explicitly call 
>> ENGINE_cleanup and this function is deprecated. Cleanup automatically takes 
>> place at program exit."
>>
>> I suppose by the time the destructor __ssl_sock_deinit is called, 
>> engine-related cleanup are already done by openssl and ENGINE_finish (from 
>> DH_free) stalls on a non-existing write lock.
>>
>> I have a workaround which moves the DH_free logic out of the destructor 
>> __ssl_sock_deinit, and right before process exit.  With the workaround I no 
>> longer see the stall issue. I am not sure whether it is optimal solution 
>> though. Let me know.
> 
> What does it look like?
> 
> The issue is very similar: 
> https://mta.openssl.org/pipermail/openssl-dev/2016-March/005909.html
> 
> 
>>
>> Thanks,
>>
>> Grant
> R,
> Emeric
> 
> 


Reading the openssl-1.1.0e changelog:

  *) Make various cleanup routines no-ops and mark them as deprecated. Most
 global cleanup functions are no longer required because they are handled
 via auto-deinit (see OPENSSL_init_crypto and OPENSSL_init_ssl man pages).
 Explicitly de-initing can cause problems (e.g. where a library that uses
 OpenSSL de-inits, but an application is still using it). The affected
 functions are CONF_modules_free(), ENGINE_cleanup(), OBJ_cleanup(),
 EVP_cleanup(), BIO_sock_cleanup(), CRYPTO_cleanup_all_ex_data(),
 RAND_cleanup(), SSL_COMP_free_compression_methods(), ERR_free_strings() and
 COMP_zlib_cleanup().
  [Matt Caswell]

and reading the OPENSSL_init_ssl/OPENSSL_init_crypto man pages.


We may remove all openssl uninit calls from the ssl_sock.c destructor for 
openssl v >= 1.1.
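
A rough sketch of such a guard (illustrative only; the function name is made
up and the exact list of calls depends on what the destructor currently does):

#include <openssl/opensslv.h>
#include <openssl/crypto.h>
#include <openssl/err.h>
#include <openssl/evp.h>

/* with OpenSSL >= 1.1.0, auto-deinit runs at exit, so explicit cleanup
 * calls are useless and may even deadlock, as seen above with
 * DH_free() ending up in ENGINE_finish() */
static void ssl_sock_deinit_sketch(void)
{
#if OPENSSL_VERSION_NUMBER < 0x1010000fL
	ERR_free_strings();
	EVP_cleanup();
	CRYPTO_cleanup_all_ex_data();
#endif
}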

R,
Emeric
  



Re: OpenSSL engine and async support

2017-05-15 Thread Emeric Brun
On 05/13/2017 01:14 AM, Grant Zhang wrote:
> 
>> On May 10, 2017, at 04:51, Emeric Brun <eb...@haproxy.com> wrote:
>>
>>> It looks like the main process stalls at DH_free(local_dh_1024) (part of 
>>> __ssl_sock_deinit). Not sure why but I will debug and report back.
>>>
>>> Thanks,
>>
>> I experienced the same issue (stalled on a futex) if i run haproxy in 
>> foreground and trying to kill it with kill -USR1.
>>
>> With this conf (dh param and ssl-async are disabled)
>> global
>> #   tune.ssl.default-dh-param 2048
>>ssl-engine qat
>> #   ssl-async
>>nbproc 1
> 
> It looks like that the stall on futex issue is related to DH_free() calling 
> ENGINE_finish in openssl 1.1:
> https://github.com/openssl/openssl/blob/master/crypto/dh/dh_lib.c#L109
> (gdb) bt
> #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
> #1  0x7fa1582c5571 in pthread_rwlock_wrlock ()
> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:118
> #2  0x7fa158a58559 in CRYPTO_THREAD_write_lock () from 
> /tmp/openssl_1.1.0_install/lib/libcrypto.so.1.1
> #3  0x7fa1589d8800 in ENGINE_finish () from 
> /tmp/openssl_1.1.0_install/lib/libcrypto.so.1.1
> #4  0x7fa158975e76 in DH_free () from 
> /tmp/openssl_1.1.0_install/lib/libcrypto.so.1.1
> #5  0x00417c78 in free_dh () at src/ssl_sock.c:7905
> #6  0x7fa1591d58ce in _dl_fini () at dl-fini.c:254
> #7  0x7fa158512511 in __run_exit_handlers (status=0, 
> listp=0x7fa15888e688, run_list_atexit=true)
> at exit.c:78
> #8  0x7fa158512595 in __GI_exit (status=) at exit.c:100
> #9  0x00408814 in main (argc=4, argv=0x7ffe72188548) at 
> src/haproxy.c:2235
> 
> Openssl 1.1 has changed the way ENGINE_cleanup works:
> https://www.openssl.org/docs/man1.1.0/crypto/ENGINE_cleanup.html
> "From OpenSSL 1.1.0 it is no longer necessary to explicitly call 
> ENGINE_cleanup and this function is deprecated. Cleanup automatically takes 
> place at program exit."
> 
> I suppose by the time the destructor __ssl_sock_deinit is called, 
> engine-related cleanup are already done by openssl and ENGINE_finish (from 
> DH_free) stalls on a non-existing write lock.
> 
> I have a workaround which moves the DH_free logic out of the destructor 
> __ssl_sock_deinit, and right before process exit.  With the workaround I no 
> longer see the stall issue. I am not sure whether it is optimal solution 
> though. Let me know.

What does it look like?

The issue is very similar: 
https://mta.openssl.org/pipermail/openssl-dev/2016-March/005909.html


> 
> Thanks,
> 
> Grant
R,
Emeric




Re: OpenSSL engine and async support

2017-05-10 Thread Emeric Brun
Hi Grant,

On 05/09/2017 10:38 PM, Grant Zhang wrote:
> 
>> On May 9, 2017, at 02:38, Emeric Brun <eb...@haproxy.com> wrote:
>>
>> Hi Grant,
>>
>> On 05/06/2017 12:41 AM, Grant Zhang wrote:
>>> Hi Emeric,
>>>
>>> Thanks for your review! Please see the updated patches and let me know if 
>>> your comments have been properly addressed.
>>>
>>> Thanks,
>>>
>>> Grant
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> On May 2, 2017, at 04:49, Emeric Brun <eb...@haproxy.com> wrote:
>>>>
>>>> Hi Grant,
>>>>
>>>>
>>>> An other issue:
>>>>
>>>> static void ssl_sock_close(struct connection *conn) {
>>>>
>>>>   if (conn->xprt_ctx) {
>>>>   if (global_ssl.async) {
>>>>   /* the async fd is created and owned by the SSL 
>>>> engine, which is
>>>>* responsible for fd closure. Here we are done with 
>>>> the async fd
>>>>* thus disable the polling on it, as well as clean 
>>>> up fdtab entry.
>>>>*/
>>>>   fd_stop_both(conn->async_fd);
>>>>   fdtab[conn->async_fd].async = 0;
>>>>   fdtab[conn->async_fd].state = 0;
>>>>   }
>>>>
>>>>
>>>> If yout configure ssl-async without an engine or filtering on a unused 
>>>> algo. This code is
>>>> called with an uninitialized conn->async_fd, resulting some of the time 
>>>> with a segfault.
>>>>
>>>> R,
>>>> Emeric
>>>
>>
>> The issue is still there:
>> +#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
>> +if (openssl_engines_initialized && global_ssl.async) {
>> +/* the async fd is created and owned by the SSL engine, 
>> which is
>> + * responsible for fd closure. Here we are done with 
>> the async fd
>> + * thus disable the polling on it, as well as clean up 
>> fdtab entry.
>> + */
>> +fd_stop_both(conn->async_fd);
>> +fdtab[conn->async_fd].async = 0;
>> +fdtab[conn->async_fd].state = 0;
>> +}
>> +#endif
>>
>> If no WANT_ASYNC is returned, the 'conn->async_fd' could remain 
>> uninitialized (for instance a conn witch doesn't use algo provided by the 
>> engine). A fix would be to initialize global_ssl.async to '-1' and to test 
>> it.
> Now I see the issue, thanks. Wrt the fix, I might be missing something--I 
> don't know how initialize global_ssl.async to -1 would fix the issue. Or do 
> you mean initialize conn->async_fd to '-1'? 
I mean conn->async_fd to '-1'.

>> I still have the daemonize issue: the main process stalled on a futex if dh 
>> param 2048 is set:
>>
>> global
>>tune.ssl.default-dh-param 2048
>>
> Hmm, I probably mixed "-d" with "-D" and thus couldn't repro the issue 
> before. It looks like the main process stalls at DH_free(local_dh_1024) (part 
> of __ssl_sock_deinit). Not sure why but I will debug and report back.
> 
> Thanks,

I experienced the same issue (stalled on a futex) if I run haproxy in the
foreground and try to kill it with kill -USR1.

With this conf (dh param and ssl-async are disabled)
global
#   tune.ssl.default-dh-param 2048
ssl-engine qat
#   ssl-async
nbproc 1


> 
> Grant
> 

R,
Emeric



Re: OpenSSL engine and async support

2017-05-09 Thread Emeric Brun
Hi Grant,

On 05/06/2017 12:41 AM, Grant Zhang wrote:
> Hi Emeric,
> 
> Thanks for your review! Please see the updated patches and let me know if 
> your comments have been properly addressed.
> 
> Thanks,
> 
> Grant
> 
> 
> 
> 
> 
> 
> 
>> On May 2, 2017, at 04:49, Emeric Brun <eb...@haproxy.com> wrote:
>>
>> Hi Grant,
>>
>>
>> An other issue:
>>
>> static void ssl_sock_close(struct connection *conn) {
>>
>>if (conn->xprt_ctx) {
>>if (global_ssl.async) {
>>/* the async fd is created and owned by the SSL 
>> engine, which is
>> * responsible for fd closure. Here we are done with 
>> the async fd
>> * thus disable the polling on it, as well as clean 
>> up fdtab entry.
>> */
>>fd_stop_both(conn->async_fd);
>>fdtab[conn->async_fd].async = 0;
>>fdtab[conn->async_fd].state = 0;
>>}
>>
>>
>> If yout configure ssl-async without an engine or filtering on a unused algo. 
>> This code is
>> called with an uninitialized conn->async_fd, resulting some of the time with 
>> a segfault.
>>
>> R,
>> Emeric
> 

The issue is still there:
+#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
+	if (openssl_engines_initialized && global_ssl.async) {
+		/* the async fd is created and owned by the SSL engine, which is
+		 * responsible for fd closure. Here we are done with the async fd
+		 * thus disable the polling on it, as well as clean up fdtab entry.
+		 */
+		fd_stop_both(conn->async_fd);
+		fdtab[conn->async_fd].async = 0;
+		fdtab[conn->async_fd].state = 0;
+	}
+#endif

If no WANT_ASYNC is returned, 'conn->async_fd' could remain uninitialized
(for instance a conn which doesn't use an algo provided by the engine). A fix
would be to initialize global_ssl.async to '-1' and to test it.
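
Concretely, something like this (a sketch based on the snippet above; as the
follow-up in this thread clarifies, the field to initialize is conn->async_fd):

	/* at connection setup */
	conn->async_fd = -1;

	/* in ssl_sock_close(): only touch the fd if the engine installed one */
	if (openssl_engines_initialized && global_ssl.async && conn->async_fd >= 0) {
		fd_stop_both(conn->async_fd);
		fdtab[conn->async_fd].async = 0;
		fdtab[conn->async_fd].state = 0;
	}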


I still have the daemonize issue: the main process stalls on a futex if dh
param 2048 is set:

global
tune.ssl.default-dh-param 2048


R,
Emeric






Re: [Patches] TLS methods configuration reworked

2017-05-09 Thread Emeric Brun
Hi,

On 05/05/2017 06:12 PM, Emmanuel Hocdet wrote:
> 
>> Le 5 mai 2017 à 17:21, Emmanuel Hocdet  a écrit :
>>
>> Hi Emeric,
>>
>>> Le 28 avr. 2017 à 17:57, Emmanuel Hocdet  a écrit :
>>>
>>> Hi Emeric, Willy
>>>
>>> Up the thread with a compatible configuration view.
>>>
>>> 1) force-xx force-tlsv12 no-tlsv12
>>> old: do a force-tlsv12  (no-xx ignored without warning)
>>> new:  warning "all SSL/TLS versions are disabled »
>>>
>>> It’s not a good configuration, but… It can be changed with:
>>> . no-xx  ignored when force-xx, min-ssl-ver or max-ssl-ver is used  (impact 
>>> 4 and 5)
>> for compat and to simplify configuration no-xx : ignored with warning
>>
>>> . generate an error
>>> . keep warning, but it can depend on 2)
>>>
>>> 2) force-tlsv12   with openssl without v1.2 
>>> old:   error "option not implemented »
>>> new:  warning "all SSL/TLS versions are disabled »
>>> => generate an error?
>> generate an error 
>>
>>>
>>> 3)  no-tlsv10
>>> old: hole without warning
>>> new: warning ‘hole'
>>> => i prefer keep warning and not generate error, openssl will deal with that
>>>
>> no change
>>
>>> 4) min-ssl-ver TLSv1.0 no-tlsv11
>>> new:  warning ‘hole'
>>> . no hole if no-tlsxx  ignored
>>>
>> Ignored with warning.
>>
>>> 5) max-ssl-ver TLSv1.2  no-sslv3
>>>  ok but sslv3 will be activate if no-xx are ignored (1) (need at least 
>>> warning)
>>>
>>
>> Ignored with warning.
>> (I will suggest to disable sslv3 per default for bind. Can be ‘force’ with 
>> ssl-min-ver SSLv3.)
>>
>>
>> I add a patch (7) for that. All patch rebase from current master in the mail.
>>
> 
> fix a bug in patch 7, resend all:
> 
> 
> 
> 

It seems to do what we want, so we can merge it.

R,
Emeric




Re: OpenSSL engine and async support

2017-05-02 Thread Emeric Brun
Hi Grant,


Another issue:

static void ssl_sock_close(struct connection *conn) {

	if (conn->xprt_ctx) {
		if (global_ssl.async) {
			/* the async fd is created and owned by the SSL engine, which is
			 * responsible for fd closure. Here we are done with the async fd
			 * thus disable the polling on it, as well as clean up fdtab entry.
			 */
			fd_stop_both(conn->async_fd);
			fdtab[conn->async_fd].async = 0;
			fdtab[conn->async_fd].state = 0;
		}


If you configure ssl-async without an engine, or filter it on an unused algo,
this code is called with an uninitialized conn->async_fd, sometimes resulting
in a segfault.

R,
Emeric



Re: [Patches] TLS methods configuration reworked

2017-04-28 Thread Emeric Brun
Hi Manu,

>>
> 
> yes, i delayed this change (lack of time).
> last patch with  'ssl-min-ver' and 'ssl-max-ver' with argument SSLv3, 
> TLSv1.0, TLSv1.1, TLSv1.2 or TLSv1.3
> 
> Manu
> 
> 

Could you please rebase your patch set on master and split it by feature.
The latest version seems to match what we expect, but you've been sending
incremental patches that depend on the ongoing discussion, and this is now
completely confusing.

 
To Willy: in attachment, a small patch just to avoid warnings with openssl >=
1.1; the current behavior is kept regardless of the discussion with Manu.

R,
Emeric



From 83b1ff6ef56a0c2fb502552bb1525de7b843d0d6 Mon Sep 17 00:00:00 2001
From: Emeric Brun <eb...@haproxy.com>
Date: Fri, 28 Apr 2017 16:19:51 +0200
Subject: [PATCH] BUG/MINOR: ssl: fix warnings about methods for opensslv1.1.

This patch replaces the calls to TLSvX_X_client/server_method
by the new TLS_client/server_method and uses the new functions
SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version,
pinning both to the wanted protocol version for the 'force-' statements.
---
 src/ssl_sock.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 4c1be5a..48ad1b2 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -3188,6 +3188,28 @@ ssl_sock_initial_ctx(struct bind_conf *bind_conf)
 		SSL_MODE_SMALL_BUFFERS;
 	int conf_ssl_options = bind_conf->ssl_options;
 
+#if (OPENSSL_VERSION_NUMBER >= 0x1010000fL || defined OPENSSL_IS_BORINGSSL)
+	if (!ctx && conf_ssl_options & BC_SSL_O_USE_TLSV12) {
+		ctx = SSL_CTX_new(TLS_server_method());
+		SSL_CTX_set_min_proto_version(ctx, TLS1_2_VERSION);
+		SSL_CTX_set_max_proto_version(ctx, TLS1_2_VERSION);
+	}
+	if (!ctx && conf_ssl_options & BC_SSL_O_USE_TLSV11) {
+		ctx = SSL_CTX_new(TLS_server_method());
+		SSL_CTX_set_min_proto_version(ctx, TLS1_1_VERSION);
+		SSL_CTX_set_max_proto_version(ctx, TLS1_1_VERSION);
+	}
+	if (!ctx && conf_ssl_options & BC_SSL_O_USE_TLSV10) {
+		ctx = SSL_CTX_new(TLS_server_method());
+		SSL_CTX_set_min_proto_version(ctx, TLS1_VERSION);
+		SSL_CTX_set_max_proto_version(ctx, TLS1_VERSION);
+	}
+	if (!ctx && conf_ssl_options & BC_SSL_O_USE_SSLV3) {
+		ctx = SSL_CTX_new(TLS_server_method());
+		SSL_CTX_set_min_proto_version(ctx, SSL3_VERSION);
+		SSL_CTX_set_max_proto_version(ctx, SSL3_VERSION);
+	}
+#else
 #if SSL_OP_NO_TLSv1_2
 	if (!ctx && conf_ssl_options & BC_SSL_O_USE_TLSV12)
 		ctx = SSL_CTX_new(TLSv1_2_server_method());
@@ -3202,6 +3224,7 @@ ssl_sock_initial_ctx(struct bind_conf *bind_conf)
 	if (!ctx && conf_ssl_options & BC_SSL_O_USE_SSLV3)
 		ctx = SSL_CTX_new(SSLv3_server_method());
 #endif
+#endif
 	if (!ctx) {
 		ctx = SSL_CTX_new(SSLv23_server_method());
 		if (conf_ssl_options & BC_SSL_O_NO_SSLV3)
@@ -3588,6 +3611,28 @@ int ssl_sock_prepare_srv_ctx(struct server *srv)
 	if (srv->check.use_ssl)
 		srv->check.xprt = _sock;
 
+#if (OPENSSL_VERSION_NUMBER >= 0x1010000fL || defined OPENSSL_IS_BORINGSSL)
+	if (!ctx && srv->ssl_ctx.options & SRV_SSL_O_USE_TLSV12) {
+		ctx = SSL_CTX_new(TLS_client_method());
+		SSL_CTX_set_min_proto_version(ctx, TLS1_2_VERSION);
+		SSL_CTX_set_max_proto_version(ctx, TLS1_2_VERSION);
+	}
+	if (!ctx && srv->ssl_ctx.options & SRV_SSL_O_USE_TLSV11) {
+		ctx = SSL_CTX_new(TLS_client_method());
+		SSL_CTX_set_min_proto_version(ctx, TLS1_1_VERSION);
+		SSL_CTX_set_max_proto_version(ctx, TLS1_1_VERSION);
+	}
+	if (!ctx && srv->ssl_ctx.options & SRV_SSL_O_USE_TLSV10) {
+		ctx = SSL_CTX_new(TLS_client_method());
+		SSL_CTX_set_min_proto_version(ctx, TLS1_VERSION);
+		SSL_CTX_set_max_proto_version(ctx, TLS1_VERSION);
+	}
+	if (!ctx && srv->ssl_ctx.options & SRV_SSL_O_USE_SSLV3) {
+		ctx = SSL_CTX_new(TLS_client_method());
+		SSL_CTX_set_min_proto_version(ctx, SSL3_VERSION);
+		SSL_CTX_set_max_proto_version(ctx, SSL3_VERSION);
+	}
+#else
 #if SSL_OP_NO_TLSv1_2
 	if (!ctx && srv->ssl_ctx.options & SRV_SSL_O_USE_TLSV12)
 		ctx = SSL_CTX_new(TLSv1_2_client_method());
@@ -3602,6 +3647,7 @@ int ssl_sock_prepare_srv_ctx(struct server *srv)
 	if (!ctx && srv->ssl_ctx.options & SRV_SSL_O_USE_SSLV3)
 		ctx = SSL_CTX_new(SSLv3_client_method());
 #endif
+#endif
 	if (!ctx) {
 		ctx = SSL_CTX_new(SSLv23_client_method());
 		if (srv->ssl_ctx.options & SRV_SSL_O_NO_SSLV3)
-- 
1.8.3.1



Re: OpenSSL engine and async support

2017-04-28 Thread Emeric Brun
Hi Grant,

>>>
>>
>> I've made a POC of a soft async engine. Based on dasync engine it launchs a 
>> thread on priv_rsa_enc to spread the load on multiple cores.
>>
>> Regarding openssl s_server it is efficient and scale very well depending the 
>> number of core (1700 rsa2048/s on one core, 7400 on 4 cores)
>>
>> But using haproxy i'm still facing the same issue. There is a poor scale as 
>> for qat. If i check a top, i see that haproxy uses 100% of one core
>> and some of the time 80% of an other but for a very short period.
> Does this occur with single haproxy process(nproc=1) and you see haproxy 
> process bounce between cores? Or you see haproxy occupying one core and 
> occasionally use 80% of another core?
> 
>>
>> In my opinion, there is very few parallelized jobs. I hope a clean rebase 
>> will fix the issue.
> Sorry for the delay. Attached is the rebased patches, on top of the latest 
> git head:
> http://git.haproxy.org/?p=haproxy.git;a=commit;h=013a84fe939cf393fbcf8deb9b4504941d382777
> 
> Wrt scaling, my use case is mainly for offloading the cpu load during 
> handshake, and as such spreading over multiple cores is not desirable. I've 
> tried increasing nproc for scale out purpose. nproc(6 to 8) gives me the max 
> connection rate until it hits the hw limit of the qat card. The connection 
> rate increase (close to) linearly with the number of processes.
> 
>>
>> On haproxy conn/s side:
>> native ssl: 1200 conn/s
> Which cipher do you use for testing? For me I use ECDHE-RSA-AES128-GCM-SHA256 
> and I see about 500 conn/s. For some other cipher I see a bit higher 
> number(e.g AES128-SHA256 550 conn/s), but haven't seen 1200conn/s. Not sure 
> whether cpu/hw makes such a big difference.
> 
>> qat+async: 1700 conn/s
> I see about 2000 conn/s per core, so not much different than your number.
> 
> Thanks,
> 
> Grant
> 
> 
> 

I'm using an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz.

I've done some tests on the rebased version.

Finally, I reached interesting results by disabling perfect forward secrecy,
and I'm now close to the announced performance for the hardware (4100 conn/s
for an announced 5.5K rsa/s on 8920).

I also get good performance using my patched software engine, and it scales
well.

I didn't notice any unexplained CPU usage or crashes on this version. So we
are close to merging your patches.

I have a few comments about your patches:

- For the first patch (engine):

I think it would be better to load engines directly during parsing; it
would detect engine misconfiguration on the right line.

In other words, a call to 'ssl_init_single_engine' directly from
'ssl_parse_global_ssl_engine'. This way you wouldn't have to handle
a list of engines to free. In addition, by postponing the init to
'crt/crt-list' processing, you miss initializing engines when SSL is used
only on the server side.

- For the async patch:

Haproxy does not compile with OpenSSL versions < 1.1 because of these lines:

OSSL_ASYNC_FD async_fd; in types/connection.h (and all the lines referencing
async_fd in ssl_sock.c, if you fix only the header),

and

#include <openssl/async.h> in src/ssl_sock.c.
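
For instance, a guard along these lines would keep pre-1.1.0 builds compiling
(just a sketch of the idea, not the actual fix; connection_sketch stands in
for the real struct connection):

#ifdef USE_OPENSSL
#include <openssl/opensslv.h>
#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
#include <openssl/async.h>
#endif
#endif

struct connection_sketch {
	/* ... existing members ... */
#if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >= 0x1010000fL)
	OSSL_ASYNC_FD async_fd;  /* fd created and owned by the SSL engine */
#endif
};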


It would also be more interesting to move your test after the
SSL_read/write/handshake, this way:

Instead of:

#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
if (ret == SSL_ERROR_WANT_ASYNC) {
...
}
#endif

if (ret == SSL_ERROR_WANT_WRITE) {
...
}
else if (ret == SSL_ERROR_WANT_READ) {
...
}
else if (ret == SSL_ERROR_SYSCALL) {
...
}

like this:


if (ret == SSL_ERROR_WANT_WRITE) {
...
}
else if (ret == SSL_ERROR_WANT_READ) {
...
}
#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
else if (ret == SSL_ERROR_WANT_ASYNC) {
...
}
#endif
else if (ret == SSL_ERROR_SYSCALL) {
...
}

About 'ssl_async_process_fds' and 'SSL_get_changed_async_fds':

The doc says you have to first call SSL_get_changed_async_fds with NULL
buffers to learn how many fds there are, so that you can size your fd buffer
accordingly before retrieving them.

So here you have a potential buffer overflow.
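
For reference, the sizing pattern the manual describes looks roughly like this
(a sketch with an illustrative helper name; error handling trimmed):

#include <stdlib.h>
#include <openssl/ssl.h>
#include <openssl/async.h>

/* documented two-step pattern: first ask for the counts with NULL
 * buffers, then allocate and fetch the fds themselves */
static int fetch_changed_async_fds(SSL *ssl)
{
	size_t numadd = 0, numdel = 0;
	OSSL_ASYNC_FD *addfd, *delfd;

	if (!SSL_get_changed_async_fds(ssl, NULL, &numadd, NULL, &numdel))
		return 0;
	addfd = malloc(numadd * sizeof(*addfd));
	delfd = malloc(numdel * sizeof(*delfd));
	if (!addfd || !delfd ||
	    !SSL_get_changed_async_fds(ssl, addfd, &numadd, delfd, &numdel)) {
		free(addfd);
		free(delfd);
		return 0;
	}
	/* register addfd[0..numadd-1] with the poller, drop delfd[0..numdel-1] */
	free(addfd);
	free(delfd);
	return 1;
}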

The doc also says 'However if multiple asynchronous capable engines are in
use then more than one is possible.'

I think you should authorize ssl-async only if there is only one engine 
configured because:

/* we don't support more than 1 async fds */
if (num_add_fds > 1 || num_del_fds > 1)
return;

it will result in unpredictable behavior.

Finally, I think it would be preferable to rename 'ssl-async' to
'ssl-engine-async' or 'ssl-mode-async'.

Thanks a lot for your work, Grant!

R,
Emeric








Re: OpenSSL engine and async support

2017-04-25 Thread Emeric Brun
Hi Grant,

On 04/10/2017 05:16 PM, Grant Zhang wrote:
> 
>> On Apr 10, 2017, at 07:42, Emeric Brun <eb...@haproxy.com> wrote:
>>
> 
>>> * openssl version (1.1.0b-e?)
>> compiled 1.1.0e
>>>
>>>
>> Could you provide patches rebased on current dev master branch?
> I am kinda busy with other project but will try to provide rebased patches 
> this week.
> 
> Thanks, 
> 
> Grant
> 

I've made a POC of a software async engine. Based on the dasync engine, it
launches a thread in priv_rsa_enc to spread the load over multiple cores.

With openssl s_server it is efficient and scales very well with the number
of cores (1700 rsa2048/s on one core, 7400 on 4 cores).

But using haproxy I'm still facing the same issue: it scales poorly, just as
with QAT. If I check top, I see that haproxy uses 100% of one core and
occasionally 80% of another, but only for a very short period.

In my opinion, very few jobs are parallelized. I hope a clean rebase will fix
the issue.

On haproxy conn/s side:
native ssl: 1200 conn/s
qat+async: 1700 conn/s
dasync modified+async: 1800 conn/s

R,
Emeric



