Re: V1.9 SSL engine and ssl-mode-async is unstable

2019-01-25 Thread Aleksandar Lazic

Hi.

On 25-01-2019 08:55, Kevin Zhu wrote:


Hi HAProxy Team,
I am trying to use Intel QAT with HAProxy 1.9.0, but it is very 
unstable. I also tried HAProxy 1.8.16 and it works well. How can 
I find out what is wrong?
1.8.16 and 1.9.0 were compiled and run on the same hardware and system, 
and use the same config file, which is attached.


Can you please explain "very unstable" in a bit more detail?

Can you try 1.9.2/3 ?

Do you have any errors or warnings in the logs?
Maybe you can use loglevel debug?


Thanks for any help.
Best regards


Regards
Aleks



Re: h1-client to h2-server host header / authority conversion failure.?

2019-01-25 Thread Aleksandar Lazic

Hi List.

On 25-01-2019 01:01, PiBa-NL wrote:

Hi List,

Attached a regtest which i 'think' should pass.

**   s1    0.0 === expect tbl.dec[1].key == ":authority"
 s1    0.0 EXPECT tbl.dec[1].key (host) == ":authority" failed

It seems to me the Host <> Authority conversion isn't happening
properly.? But maybe i'm just making a mistake in the test case...

I was using HA-Proxy version 2.0-dev0-f7a259d 2019/01/24 with this 
test.


The test was inspired by the attempt to connect to mail.google.com ,
as discussed in the "haproxy 1.9.2 with boringssl" mail thread.. Not
sure if this is the main problem, but it seems suspicious to me..


That's one of the reasons why I love this community ;-)

As just one member of this community, I want to say thanks to everyone 
on the list for being part of HAProxy ;-).



Regards,

PiBa-NL (Pieter)


Regards
Aleks



1.9.3 delayed

2019-01-25 Thread Willy Tarreau
Hi guys,

I know I said I'd issue 1.9.3 this week, but after having worked
on addressing ugly H2 issues, I've now spent the whole day on a
severe crash bug affecting server-side idle connections and at the
end of the day with numerous traces and code changes I'm still back
to the same point making no progress and starting to think I'm the
one stupid not capable of reading the code.

Given this bug manifests itself as memory corruption then crashes,
I would really like to at least understand it before issuing 1.9.3,
because at this point I don't trust the code I was about to release
nor any bug report that could come from it.

I've checked the queue and there isn't an emergency to emit a version
now so better get the dirty things fixed and have a cleaner release
than force everyone to upgrade every week.

Thanks,
Willy



Re: h1-client to h2-server host header / authority conversion failure.?

2019-01-25 Thread Willy Tarreau
Hi Pieter,

On Fri, Jan 25, 2019 at 01:01:19AM +0100, PiBa-NL wrote:
> Hi List,
> 
> Attached a regtest which i 'think' should pass.
> 
> **   s1    0.0 === expect tbl.dec[1].key == ":authority"
>  s1    0.0 EXPECT tbl.dec[1].key (host) == ":authority" failed
> 
> It seems to me the Host <> Authority conversion isn't happening properly.?
> But maybe i'm just making a mistake in the test case...
> 
> I was using HA-Proxy version 2.0-dev0-f7a259d 2019/01/24 with this test.
> 
> The test was inspired by the attempt to connect to mail.google.com , as
> discussed in the "haproxy 1.9.2 with boringssl" mail thread.. Not sure if
> this is the main problem, but it seems suspicious to me..

It's not as simple, :authority is only required for CONNECT and is optional
for other methods with Host as a fallback. Clients are encouraged to use it
instead of the Host header field, according to paragraph 8.1.2.3, but there
is nothing indicating that a gateway may or should build one from scratch
when translating HTTP/1.1 to HTTP/2. In fact the authority part is
generally not present in the URIs we receive as a gateway, so what we'd put
there would be completely reconstructed from the host header field. I don't
even know if all servers are fine with authority only instead of Host.
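
To make the fallback rule concrete, here is a minimal sketch of how a
receiving side could pick the name to route on for a non-CONNECT request:
":authority" when it is present, the Host header field otherwise. The
function name effective_authority() is hypothetical and is not HAProxy
code; it only illustrates the precedence described above.

```c
#include <stddef.h>

/* Hypothetical helper, not HAProxy code: choose the name to route on for
 * a non-CONNECT request. A non-empty ":authority" wins; otherwise fall
 * back to the Host header field. Returns NULL when neither is usable. */
const char *effective_authority(const char *authority, const char *host)
{
    if (authority && *authority)
        return authority;
    if (host && *host)
        return host;
    return NULL;
}
```

With this precedence, a request carrying only Host still resolves to a
usable name, which is why a missing ":authority" is not necessarily an
error on the server side.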

Please note, I'm not against changing this, I just want to be sure we
actually fix something and that we don't break anything. Thus if you have
any info indicating there is an issue with this one missing, it could
definitely help.

Thanks!
Willy



Re: [PATCH] runtime do-resolve http action

2019-01-25 Thread Willy Tarreau
On Fri, Jan 25, 2019 at 03:09:52PM +0100, Baptiste wrote:
> Hi Willy,
> 
> Thanks for the review!!!
> I fixed most of the problems, but I have 3 points I'd like to discuss:
> 
> > +  If an IP address can be found, it is stored into <var>. If any kind of
> > > +  error occurs, then <var> is not set.
> >
> > Just to be sure, it is not set or not modified ? I guess the latter, which
> > is fine.
> >
> 
> Yes, not set. So '-m found' can be used.

So you actually *remove* the variable if you don't get a response,
that's it ? I would have possibly found it more convenient to just
stay on the not modified approach so that you could possibly chain
multiple do-resolve actions and hope that at least one of them could
pick the response. Think about environments where you have multiple
sets of resolvers (internal, admin, internet) and for unknown names
you don't know which one to ask so you ask all of them with 3
different rules.
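
A minimal sketch of the "not modified" behaviour, assuming each rule is
modelled as a resolver callback tried in turn. Everything here
(resolve_fn, resolve_chain(), the two stub resolvers and the 10.0.0.5
address) is hypothetical and only illustrates the semantics; it is not
the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical resolver callback: returns true and fills 'out' on success. */
typedef bool (*resolve_fn)(const char *name, char *out, size_t outlen);

/* Stub standing in for an "internal" resolver that knows a single name. */
bool stub_internal(const char *name, char *out, size_t outlen)
{
    if (strcmp(name, "db.internal") != 0)
        return false;
    snprintf(out, outlen, "%s", "10.0.0.5");
    return true;
}

/* Stub standing in for a resolver that never finds anything. */
bool stub_fail(const char *name, char *out, size_t outlen)
{
    (void)name; (void)out; (void)outlen;
    return false;
}

/* Run every resolver in order. 'var' is written only on success, so a
 * failed lookup never erases an address stored by an earlier rule. */
bool resolve_chain(const resolve_fn *resolvers, size_t count,
                   const char *name, char *var, size_t varlen)
{
    bool found = false;
    char tmp[64];

    for (size_t i = 0; i < count; i++) {
        if (resolvers[i](name, tmp, sizeof(tmp))) {
            snprintf(var, varlen, "%s", tmp);
            found = true;
        }
    }
    return found;
}
```

With the chain { stub_internal, stub_fail }, the failure of the second
resolver leaves the address found by the first one in place, which is
the property that makes chaining several rules useful.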

> > > + struct sample *smp;
> > > +
> > > + conn_get_from_addr(cli_conn);
> > > +
> > > + smp = sample_fetch_as_type(px, sess, s,
> > SMP_OPT_DIR_REQ|SMP_OPT_FINAL, rule->arg.dns.expr, SMP_T_STR);
> > > + if (smp) {
> > > + char *fqdn;
> > > +
> > > + fqdn = smp->data.u.str.area;
> > > + if (action_prepare_for_resolution(s, fqdn) == -1) {
> > > + ha_alert("Can't create DNS resolution for
> > server 'http request action'\n");
> >
> > Please don't send runtime alerts. We've tried hard to clean them up so
> > that they remain only during startup.
> >
> 
> In this function, I have a proxy structure. Should I use send_log() on it
> to report the error?

You could but then it'd be better to perform some form of rate-limiting.
It is possible that the same reason causes the function to fail in loops
for all requests and it's not very cool to spam logs with info that is
already present in the request's failure anyway. In general an alert log
is made so that someone can do something about it. What could be done
however is to emit this error once if it's a matter of config, and to
increment a counter reported in "show info". We already do this at some
places, I just don't remember which ones :-)
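
The "emit once, then only count" idea could look roughly like the sketch
below. The names (report_resolve_failure(), resolve_failure_count() and
the two static variables) are hypothetical, the message goes to stderr
only to keep the example self-contained, and the code assumes a single
thread; a threaded build would need atomic operations on both variables.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter; a real implementation would expose it through
 * a statistics interface instead of a getter. */
static unsigned long dns_resolve_failures;
static bool dns_failure_reported;

/* Record one failure. Every call increments the counter, but only the
 * first call emits a message. Returns true when the message was emitted. */
bool report_resolve_failure(const char *reason)
{
    dns_resolve_failures++;
    if (dns_failure_reported)
        return false;
    dns_failure_reported = true;
    fprintf(stderr, "do-resolve: %s (further occurrences are only counted)\n",
            reason);
    return true;
}

/* Total number of failures seen so far. */
unsigned long resolve_failure_count(void)
{
    return dns_resolve_failures;
}
```

This keeps the logs quiet when the same cause makes every request fail,
while the counter still shows how often it happened.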

> > > + case ACT_HTTP_DO_RESOLVE:
> > >   case ACT_CUSTOM:
> > >   if ((s->req.flags & CF_READ_ERROR) ||
> > >   ((s->req.flags & (CF_SHUTR|CF_READ_NULL)) &&
> >
> > Suddenly that makes me wonder : why is it needed to have a dedicated
> > action since it uses the generic infrastructure with ACT_CUSTOM ?
> >
> 
> I think this must have been one of the first thing I did during my
> development phase so I would be able to "isolate" my code when needed.
> Now you said it, and I step back a bit, I also consider there is no value
> in this action, apart from being clear on the action name and giving us the
> ability to be very cautious if we update the behavior of ACT_CUSTOM in the
> future.
> I can remove ACT_HTTP_DO_RESOLVE and add a comment in ACT_CUSTOM saying
> that the do-resolve action relies on this code, just in case.

Normally the vast majority of actions are already in ACT_CUSTOM nowadays.
The other ones are just historical exceptions. Please have a look at
http_req_actions to see how to declare yours. In short you'll have to
add something like this to dns.c (please excuse the copy-paste which
will not work, but you'll get the idea) :

   static struct action_kw_list http_req_dns_actions = {
           .kw = {
                   { "do-resolve", parse_http_req_do_resolve },
                   { NULL, NULL }
           }
   };

   INITCALL1(STG_REGISTER, http_req_keywords_register, &http_req_dns_actions);

And you're done, more or less a few includes of course :-)

Cheers,
Willy



Re: [PATCH] runtime do-resolve http action

2019-01-25 Thread Baptiste
Hi Willy,

Thanks for the review!!!
I fixed most of the problems, but I have 3 points I'd like to discuss:

> +  If an IP address can be found, it is stored into <var>. If any kind of
> > +  error occurs, then <var> is not set.
>
> Just to be sure, it is not set or not modified ? I guess the latter, which
> is fine.
>

Yes, not set. So '-m found' can be used.


> > + struct sample *smp;
> > +
> > + conn_get_from_addr(cli_conn);
> > +
> > + smp = sample_fetch_as_type(px, sess, s,
> SMP_OPT_DIR_REQ|SMP_OPT_FINAL, rule->arg.dns.expr, SMP_T_STR);
> > + if (smp) {
> > + char *fqdn;
> > +
> > + fqdn = smp->data.u.str.area;
> > + if (action_prepare_for_resolution(s, fqdn) == -1) {
> > + ha_alert("Can't create DNS resolution for
> server 'http request action'\n");
>
> Please don't send runtime alerts. We've tried hard to clean them up so
> that they remain only during startup.
>

In this function, I have a proxy structure. Should I use send_log() on it
to report the error?


> > + case ACT_HTTP_DO_RESOLVE:
> >   case ACT_CUSTOM:
> >   if ((s->req.flags & CF_READ_ERROR) ||
> >   ((s->req.flags & (CF_SHUTR|CF_READ_NULL)) &&
>
> Suddenly that makes me wonder : why is it needed to have a dedicated
> action since it uses the generic infrastructure with ACT_CUSTOM ?
>

I think this must have been one of the first thing I did during my
development phase so I would be able to "isolate" my code when needed.
Now you said it, and I step back a bit, I also consider there is no value
in this action, apart from being clear on the action name and giving us the
ability to be very cautious if we update the behavior of ACT_CUSTOM in the
future.
I can remove ACT_HTTP_DO_RESOLVE and add a comment in ACT_CUSTOM saying
that the do-resolve action relies on this code, just in case.

Baptiste


Re: H2 Server Connection Resets (1.9.2)

2019-01-25 Thread Willy Tarreau
Hi Luke,

On Fri, Jan 25, 2019 at 08:08:22AM +, Luke Seelenbinder wrote:
> Hi Willy,
> 
> > OK so instead of sending you a boring series, I can propose you to run
> > a test on 2.0-dev, which contains all the fixes I had to go through
> > because of tiny issues everywhere related to this. If you're using git,
> > just clone the master and checkout commit f7a259d46f8. Otherwise
> > you can simply wait for the next nightly snapshot.
> 
> Sounds good. My compilation playbook uses tarballs, so I'll just use the last
> nightly. I assume I should wait for these fixes to be backported (1.9.3?)
> before trying anything in production?

As you like. My first rule is never to make people take risks they're not
willing to take. It's perfectly OK to me if you don't feel confident with
2.0-dev in prod. I'm going to perform the 1.9 backports. If you're
interested in testing them from the branch before I release it today, just
let me know.

> > But now you have a new server parameter called
> > "max-reuse". This allows to limit the number of times a server connection
> > is reused. For example you can set it to 990 when you know that the
> > server limits to 1000.
> 
> That's great! I didn't expect to get a new configuration option. I'll
> definitely make sure these are in sync across our infrastructure.

Even without the option it will work better than before, but the option
is there to completely avoid any risk of hitting the limit too late.
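
As an illustration of what such a limit amounts to, here is a minimal
sketch of a per-connection reuse counter. The structure and function
names are hypothetical and are not HAProxy's internals, and treating a
negative limit as "no limit" is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical per-connection bookkeeping, not HAProxy's structures. */
struct srv_conn {
    unsigned int reuse_count;   /* times this connection has been reused */
};

/* In this sketch a negative limit means "no limit" and zero means the
 * connection is never reused. */
bool conn_may_reuse(const struct srv_conn *c, int max_reuse)
{
    return max_reuse < 0 || c->reuse_count < (unsigned int)max_reuse;
}

/* Take the connection for one more request. Returns false when the
 * caller has to open a fresh connection instead. */
bool conn_take_for_reuse(struct srv_conn *c, int max_reuse)
{
    if (!conn_may_reuse(c, max_reuse))
        return false;
    c->reuse_count++;
    return true;
}
```

With a limit of 990 against a server that stops at 1000, the connection
is retired on our side well before the server would send its GOAWAY.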

> > Regarding the fact that in your case the client's close seems to cause
> > the server-side issue, I couldn't yet reproduce it though I have a few
> > theories about it. One of them would be an unexpected response from
> > the server causing the connection to turn to an error state. The other
> > one would be that we'd incorrectly abort our stream and/or session and
> > bring the connection down with us. I'll submit these theories to Olivier
> > once he's back so that he can tell me I'm saying crap regarding some of
> > them and we can focus on what remains :-)
> 
> Sounds good. I'll report back my results from the latest snapshot and we can
> go from there. Perhaps the client causing the issues was a red herring for
> the server-side bugs.

I hadn't thought about it but it could also be, indeed.

> Thanks again for deep-diving and resolving this! I won't ask how many hours
> it took to find all these small edge cases. . .

Usually you start from a bug report, you find a hook in the code which
starts to explain it, and you walk along the thread discovering that a
lot of places are wrong together and once perfectly aligned cause crazy
things to happen. Of course there's the solution of putting some brown
paper bag on top of the most visible one, but in this project we prefer
to address the causes than the consequences ;-)  So yes it sometimes
takes time and caffeine, and often delays releases because it's always
hard to accept to release something with known unfixed issues in it.

Cheers,
Willy



Re: H2 Server Connection Resets (1.9.2)

2019-01-25 Thread Luke Seelenbinder
Hi Willy,

> OK so instead of sending you a boring series, I can propose you to run
> a test on 2.0-dev, which contains all the fixes I had to go through
> because of tiny issues everywhere related to this. If you're using git,
> just clone the master and checkout commit f7a259d46f8. Otherwise
> you can simply wait for the next nightly snapshot.

Sounds good. My compilation playbook uses tarballs, so I'll just use the last 
nightly. I assume I should wait for these fixes to be backported (1.9.3?) 
before trying anything in production?

> But now you have a new server parameter called
> "max-reuse". This allows to limit the number of times a server connection
> is reused. For example you can set it to 990 when you know that the
> server limits to 1000.

That's great! I didn't expect to get a new configuration option. I'll 
definitely make sure these are in sync across our infrastructure.

> Regarding the fact that in your case the client's close seems to cause
> the server-side issue, I couldn't yet reproduce it though I have a few
> theories about it. One of them would be an unexpected response from
> the server causing the connection to turn to an error state. The other
> one would be that we'd incorrectly abort our stream and/or session and
> bring the connection down with us. I'll submit these theories to Olivier
> once he's back so that he can tell me I'm saying crap regarding some of
> them and we can focus on what remains :-)

Sounds good. I'll report back my results from the latest snapshot and we can go 
from there. Perhaps the client causing the issues was a red herring for the 
server-side bugs.

Thanks again for deep-diving and resolving this! I won't ask how many hours it 
took to find all these small edge cases. . .

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Thursday, January 24, 2019 7:55 PM, Willy Tarreau  wrote:

> Hi Luke,
> 
> On Wed, Jan 23, 2019 at 05:16:04PM +, Luke Seelenbinder wrote:
> 
> > Hi Willy,
> > This is all very good to hear. I'm glad you were able to get to the bottom
> > of it all!
> > Feel free to send along patches if you want me to test before the 1.9.3
> > release. I'm more than happy to do so.
> 
> OK so instead of sending you a boring series, I can propose you to run
> a test on 2.0-dev, which contains all the fixes I had to go through
> because of tiny issues everywhere related to this. If you're using git,
> just clone the master and checkout commit f7a259d46f8. Otherwise
> you can simply wait for the next nightly snapshot.
> 
> Just let me know if that's OK for you.
> 
> I found a number of issues that were causing server aborts, mainly
> due to the late GOAWAY frame. Once we hit this one, the connection
> is quickly closed by the server, causing our output packets to be
> rejected and the connection to be in error. I have not yet investigated
> in details to see if the close happens after we got the last data or in
> the middle though. But now you have a new server parameter called
> "max-reuse". This allows to limit the number of times a server connection
> is reused. For example you can set it to 990 when you know that the
> server limits to 1000.
> 
> On the tests I've run here, I managed to address all the problems
> related to excessive use of idle connections resulting in too many
> streams being sent. In addition most of the rare cases that still
> happen when you don't have max-reuse are properly handled as a retry.
> 
> Regarding the fact that in your case the client's close seems to cause
> the server-side issue, I couldn't yet reproduce it though I have a few
> theories about it. One of them would be an unexpected response from
> the server causing the connection to turn to an error state. The other
> one would be that we'd incorrectly abort our stream and/or session and
> bring the connection down with us. I'll submit these theories to Olivier
> once he's back so that he can tell me I'm saying crap regarding some of
> them and we can focus on what remains :-)
> 
> Regards,
> Willy

