Re: [PATCH v4 2/2] MEDIUM: cli/ssl: configure ssl on server at runtime
On Thu, Oct 29, 2020 at 01:17:56PM +0100, William Dauchy wrote:
> in the context of a progressive backend migration, we want to be able to
> activate SSL on outgoing connections to the server at runtime without
> reloading.
> This patch adds a `set server ssl` command; in order to allow that:
>
> - add `srv_use_ssl` to `show servers state` command for compatibility,
>   also update associated parsing
> - when using default-server ssl setting, and `no-ssl` on server line,
>   init SSL ctx without activating it
> - when triggering ssl API, de/activate SSL connections as requested
> - clean ongoing connections as it is done for addr/port changes, without
>   checking prior server state
>
> example config:
>
>     backend be_foo
>         default-server ssl
>         server srv0 127.0.0.1:6011 weight 1 no-ssl
>
> show servers state:
>
>     5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - -1
>
> where srv0 can switch to ssl later during the runtime:
>
>     set server be_foo/srv0 ssl on
>
>     5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - 1
>
> Signed-off-by: William Dauchy

Looks good. I think a VTC file which tests this feature could also be a
good idea, so we don't break this accidentally.

Thanks!

-- 
William Lallemand
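For illustration, a minimal Python sketch of how a script might read the two `show servers state` rows quoted above and build the proposed command. It assumes, based only on those example rows, that `srv_use_ssl` is the last column; the helper names are hypothetical, and the `off` form of the command is an assumption derived from the "de/activate" wording, not something shown in the patch description.

```python
def parse_server_state_line(line: str) -> dict:
    """Split one `show servers state` row and pick a few fields by position."""
    fields = line.split()
    return {
        "backend": fields[1],
        "server": fields[3],
        "address": fields[4],
        # Assumption: srv_use_ssl is the last column, as in the example rows.
        "srv_use_ssl": int(fields[-1]),
    }


def build_set_ssl_command(backend: str, server: str, enable: bool) -> str:
    """Build the runtime CLI command text ("off" is an assumed counterpart of "on")."""
    return f"set server {backend}/{server} ssl {'on' if enable else 'off'}\n"


# The two rows from the commit message, before and after the switch.
before = "5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - -1"
after = "5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - 1"

state_before = parse_server_state_line(before)
state_after = parse_server_state_line(after)
command = build_set_ssl_command("be_foo", "srv0", True)
```

The command string would still have to be written to the HAProxy stats socket by whatever tooling is in use; that part is left out on purpose.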
Re: [PATCH v4 1/2] MINOR: ssl: create common ssl_ctx init
On Thu, Oct 29, 2020 at 01:17:55PM +0100, William Dauchy wrote:
> so we can reuse it later
>
> Signed-off-by: William Dauchy

Could you add a little more explanation in the commit message for this
one, and clearly separate the subject from the commit message?

Thanks!

-- 
William Lallemand
Re: [ANNOUNCE] haproxy-2.3.0
Hi Cyril!

On Wed, Nov 11, 2020 at 10:47:21PM +0100, Cyril Bonté wrote:
> I'm sad to not find enough time to contribute to haproxy. But well, at least
> I try to read mail subjects :-/
> With some delays, I've now pushed the documentation for 2.3 and 2.4-dev ;-)

Thank you!
Willy
Re: [ANNOUNCE] haproxy-2.3.0
Cyril,

On 11.11.20 at 22:47, Cyril Bonté wrote:
>> HAProxy 2.3.0 was released on 2020/11/05. It added 33 new commits after
>> version 2.3-dev9. I was right to wait a few more days before releasing,
>> we could spot two late regressions and fix them in time!
>> [...]
>> Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/
>
> I'm sad to not find enough time to contribute to haproxy. But well, at
> least I try to read mail subjects :-/
> With some delays, I've now pushed the documentation for 2.3 and 2.4-dev ;-)
>

Would it be helpful to give the repository a new home in the 'haproxy'
organization [1] by moving it there? It would not be reliant on you
personally then and updating it could be included as part of the regular
release workflow (or even automated [2]).

I'm not sure how moving a repository affects GitHub pages, though.

Best regards
Tim Düsterhus

[1] https://github.com/haproxy/
[2] https://github.com/stefanzweifel/git-auto-commit-action
Re: [ANNOUNCE] haproxy-2.3.0
Hi all !

On 05/11/2020 at 19:20, Willy Tarreau wrote:
> Hi,
>
> HAProxy 2.3.0 was released on 2020/11/05. It added 33 new commits after
> version 2.3-dev9. I was right to wait a few more days before releasing,
> we could spot two late regressions and fix them in time!
> [...]
> Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

I'm sad to not find enough time to contribute to haproxy. But well, at
least I try to read mail subjects :-/
With some delays, I've now pushed the documentation for 2.3 and 2.4-dev ;-)

-- 
Cyril Bonté
[PATCH v2] CI: Stop hijacking the hosts file
Willy,

fixed a typo in the commit message. not -> now. Sorry for the noise with
the separate mails!

Best regards
Tim Düsterhus

Apply with `git am --scissors` to automatically cut the commit message.

-- >8 --
vtest/VTest#24 is merged now. This step is no longer required.
---
 .github/workflows/vtest.yml | 6 --
 1 file changed, 6 deletions(-)

diff --git a/.github/workflows/vtest.yml b/.github/workflows/vtest.yml
index c3571b876..28e814153 100644
--- a/.github/workflows/vtest.yml
+++ b/.github/workflows/vtest.yml
@@ -101,12 +101,6 @@ jobs:
         echo "::endgroup::"
         haproxy -vv
         echo "::set-output name=version::$(haproxy -v |awk 'NR==1{print $3}')"
-    - name: Adjust hosts file
-      # This step can be removed if https://github.com/vtest/VTest/pull/24 is
-      # fixed.
-      run: |
-        cat /etc/hosts
-        sudo sed -i.bak '/::1/s/^/#/' /etc/hosts
     - name: Install problem matcher for VTest
       # This allows one to more easily see which tests fail.
       run: echo "::add-matcher::.github/vtest.json"
-- 
2.29.0
[PATCH] CI: Stop hijacking the hosts file
vtest/VTest#24 is merged not. This step is no longer required.
---
 .github/workflows/vtest.yml | 6 --
 1 file changed, 6 deletions(-)

diff --git a/.github/workflows/vtest.yml b/.github/workflows/vtest.yml
index c3571b876..28e814153 100644
--- a/.github/workflows/vtest.yml
+++ b/.github/workflows/vtest.yml
@@ -101,12 +101,6 @@ jobs:
         echo "::endgroup::"
         haproxy -vv
         echo "::set-output name=version::$(haproxy -v |awk 'NR==1{print $3}')"
-    - name: Adjust hosts file
-      # This step can be removed if https://github.com/vtest/VTest/pull/24 is
-      # fixed.
-      run: |
-        cat /etc/hosts
-        sudo sed -i.bak '/::1/s/^/#/' /etc/hosts
     - name: Install problem matcher for VTest
       # This allows one to more easily see which tests fail.
       run: echo "::add-matcher::.github/vtest.json"
-- 
2.29.0
Re: Updated CI using GitHub actions
Ilya,

On 11.11.20 at 20:52, Илья Шипицин wrote:
> as for running some jobs on schedule, does current "generated" config
> support schedule in some way ?
> or do you mean classic yml definition
>

Yes, it does. GitHub exposes the workflow trigger as 'github.event_name'.
We can pass this as a parameter to the Python script and then use a
simple if condition. That's the magic of a programming language.

I've made an example:
https://github.com/TimWolla/haproxy/commit/770ffd31f3c99b13bc9e36b82dc9723c34afdb4b

The workflow that ran on push:
https://github.com/TimWolla/haproxy/runs/1387351894?check_suite_focus=true#step:3:4

I would have expected the scheduled job to run at 21:25 UTC, but
apparently it did not start. Maybe they are delayed a bit depending on
load. Maybe it pops up here in a few minutes. It should include the
dedicated compression jobs then:
https://github.com/TimWolla/haproxy/actions?query=workflow%3AVTest

Best regards
Tim Düsterhus
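As a rough illustration of the "simple if condition" described above, here is a minimal Python sketch of a matrix generator that receives the event name as a parameter. It is not the script from the linked commit: the function name `build_matrix` and the dictionary layout are hypothetical, and only the job names are taken from this thread.

```python
import json


def build_matrix(event_name: str) -> list:
    """Return the list of CI jobs for a given workflow trigger."""
    # Jobs that run on every trigger (names taken from the thread).
    jobs = [
        {"name": "Ubuntu, gcc, all features", "CC": "gcc"},
        {"name": "Ubuntu, gcc, ssl=stock", "CC": "gcc"},
    ]
    # The dedicated compression jobs are only added for scheduled runs.
    if event_name == "schedule":
        jobs.append({"name": "Ubuntu, gcc, gz=slz=1", "CC": "gcc"})
        jobs.append({"name": "Ubuntu, gcc, gz=zlib=1", "CC": "gcc"})
    return jobs


push_matrix = build_matrix("push")
scheduled_matrix = build_matrix("schedule")
print(json.dumps(scheduled_matrix, indent=2))
```

In a workflow, the value of 'github.event_name' would be handed to such a script as an argument; how the resulting JSON is fed back into the job matrix is left out here.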
Re: Updated CI using GitHub actions
Ilya,

On 11.11.20 at 20:48, Илья Шипицин wrote:
>>> (few things like 51 degree, prometheus, PCRE2 to be discussed later)
>>>
>>
>> - Put 51d, Prometheus into the "all features" tests, that's what they
>> are for.
>>
>
> 51d has 2 different implementations: "pattern" and "trie".
>

I guess for 51d it is sufficient to test one of the two. I expect any
necessary changes to affect both of them. We can't really test 51d
anyway, because it's just a compile test. Using one of them is "good
enough".

>> - PCRE 2 should get a dedicated pair of tests testing only regex similar
>> to slz + zlib that only test compression.
>>
>
> that's questionable.
> I think we might encounter more issues if we run slz or pcre2 together with
> other features.

I disagree here. The differences between slz / gzip and pcre / pcre2 are
pretty localized. They are not magically going to break SSL. Thus I
prefer to optimize the builds for easier understanding. If you see that
slz fails and gzip does not then it's immediately obvious. Any more
features enabled make it less obvious.

> but I do not think it is good to waste someone electricity
>

That's why I suggested to run the less important builds on a schedule.
Even then the GitHub action VMs are much quicker. The Ubuntu, clang,
gz=slz=1 build finishes in 1:58 minutes total with 42 seconds spent
installing build dependencies. With the amount of pushes HAProxy does
daily a $5/month VPS would probably be sufficient to run all the builds
we do and with GitHub Actions we can use our own VPS if necessary.

Best regards
Tim Düsterhus
Re: Updated CI using GitHub actions
as for running some jobs on schedule, does current "generated" config
support schedule in some way ?
or do you mean classic yml definition

On Wed, 11 Nov 2020 at 23:59, Tim Düsterhus wrote:

> Ilya,
>
> On 11.11.20 at 19:38, Илья Шипицин wrote:
> > let us discuss next steps :)
> >
> > Ubuntu, gcc, all features
> > Ubuntu, gcc, ssl=stock
> > both of jobs use stock ssl. do we really need second one (ssl enabled, no
> > other features enabled) ? I would second one (same for clang)
>
> Yes, we need both. I'd like to have both to be able to directly compare
> any differences between the various SSL libraries. Compare the following
> example:
>
> - Ubuntu, gcc, all features: Fails.
> - Ubuntu, gcc, ssl=stock: Works.
> - Ubuntu, gcc, ssl=libressl=3.0.2: Fails.
>
> -> There are probably two separate issues. One with LibreSSL and one
> that is not related to SSL. This is more useful than:
>
> - Ubuntu, gcc, all features: Fails.
> - Ubuntu, gcc, ssl=libressl=3.0.2: Fails.
>
> -> This might be an issue with SSL in general or two separate issues. We
> don't know because we have nothing to compare.
>
> - Ubuntu, gcc, all features: Works.
> - Ubuntu, gcc, ssl=stock: Fails.
>
> -> There is probably some issue with SSL combined with something else.
>
> - Ubuntu, gcc, all features: Fails.
> - Ubuntu, gcc, ssl=stock: Works.
>
> -> There is probably some issue that is not related to SSL.
>
> - Ubuntu, gcc, all features: Fails.
> - Ubuntu, gcc, ssl=stock: Works.
> - Ubuntu, gcc, ssl=libressl=3.0.2: Fails.
>
> -> There is probably some issue in LibreSSL only.
>
> > Ubuntu, gcc, gz=slz=1
> > this one is "slz only, no other features". should we run slz + all
> > features enabled instead ? (same for clang)
>
> No. As with SSL this is intentional, testing only the specific
> difference of using slz instead of zlib in a "controlled" environment.
> There is no need to run, e.g. all the SSL tests with slz, because they
> don't enable compression anyway.
>
> >
> > Ubuntu, gcc, gz=zlib=1
> > should we drop this one in favour of "gcc all features enabled" ?
>
> No. Same reason. Specific controlled environment as a direct comparison
> to easily detect differences that are caused by switching out the
> compression library. If zlib fails and slz does not we can't easily see
> if it is caused by a bug in zlib integration or by some issue in the
> combination of the various features.
>
> ---
>
> I would be fine with running e.g. the SLZ test / LibreSSL tests only in
> cron to keep resource usage low. But we must keep them separate to keep
> things simple and easy to understand instead of implicitly testing stuff
> by including them somewhere else. That was one of my biggest concerns
> with the current Travis situation.
>
> >
> > Ubuntu, gcc, ssl=libressl=3.0.2
> > I think we can drop this one as well
>
> Okay.
>
> >
> > (few things like 51 degree, prometheus, PCRE2 to be discussed later)
> >
>
> - Put 51d, Prometheus into the "all features" tests, that's what they
> are for.
> - PCRE 2 should get a dedicated pair of tests testing only regex similar
> to slz + zlib that only test compression.
>
> Best regards
> Tim Düsterhus
>
Re: Updated CI using GitHub actions
On Wed, 11 Nov 2020 at 23:59, Tim Düsterhus wrote:

> Ilya,
>
> On 11.11.20 at 19:38, Илья Шипицин wrote:
> > let us discuss next steps :)
> >
> > Ubuntu, gcc, all features
> > Ubuntu, gcc, ssl=stock
> > both of jobs use stock ssl. do we really need second one (ssl enabled, no
> > other features enabled) ? I would second one (same for clang)
>
> Yes, we need both. I'd like to have both to be able to directly compare
> any differences between the various SSL libraries. Compare the following
> example:
>
> - Ubuntu, gcc, all features: Fails.
> - Ubuntu, gcc, ssl=stock: Works.
> - Ubuntu, gcc, ssl=libressl=3.0.2: Fails.
>
> -> There are probably two separate issues. One with LibreSSL and one
> that is not related to SSL. This is more useful than:
>
> - Ubuntu, gcc, all features: Fails.
> - Ubuntu, gcc, ssl=libressl=3.0.2: Fails.
>
> -> This might be an issue with SSL in general or two separate issues. We
> don't know because we have nothing to compare.
>
> - Ubuntu, gcc, all features: Works.
> - Ubuntu, gcc, ssl=stock: Fails.
>
> -> There is probably some issue with SSL combined with something else.
>
> - Ubuntu, gcc, all features: Fails.
> - Ubuntu, gcc, ssl=stock: Works.
>
> -> There is probably some issue that is not related to SSL.
>
> - Ubuntu, gcc, all features: Fails.
> - Ubuntu, gcc, ssl=stock: Works.
> - Ubuntu, gcc, ssl=libressl=3.0.2: Fails.
>
> -> There is probably some issue in LibreSSL only.
>
> > Ubuntu, gcc, gz=slz=1
> > this one is "slz only, no other features". should we run slz + all
> > features enabled instead ? (same for clang)
>
> No. As with SSL this is intentional, testing only the specific
> difference of using slz instead of zlib in a "controlled" environment.
> There is no need to run, e.g. all the SSL tests with slz, because they
> don't enable compression anyway.
>
> >
> > Ubuntu, gcc, gz=zlib=1
> > should we drop this one in favour of "gcc all features enabled" ?
>
> No. Same reason. Specific controlled environment as a direct comparison
> to easily detect differences that are caused by switching out the
> compression library. If zlib fails and slz does not we can't easily see
> if it is caused by a bug in zlib integration or by some issue in the
> combination of the various features.
>
> ---
>
> I would be fine with running e.g. the SLZ test / LibreSSL tests only in
> cron to keep resource usage low. But we must keep them separate to keep
> things simple and easy to understand instead of implicitly testing stuff
> by including them somewhere else. That was one of my biggest concerns
> with the current Travis situation.
>
> >
> > Ubuntu, gcc, ssl=libressl=3.0.2
> > I think we can drop this one as well
>
> Okay.
>
> >
> > (few things like 51 degree, prometheus, PCRE2 to be discussed later)
> >
>
> - Put 51d, Prometheus into the "all features" tests, that's what they
> are for.
>

51d has 2 different implementations: "pattern" and "trie".

> - PCRE 2 should get a dedicated pair of tests testing only regex similar
> to slz + zlib that only test compression.
>

that's questionable.
I think we might encounter more issues if we run slz or pcre2 together with
other features.

but I do not think it is good to waste someone electricity

>
> Best regards
> Tim Düsterhus
>
Re: Updated CI using GitHub actions
Ilya,

On 11.11.20 at 19:38, Илья Шипицин wrote:
> let us discuss next steps :)
>
> Ubuntu, gcc, all features
> Ubuntu, gcc, ssl=stock
> both of jobs use stock ssl. do we really need second one (ssl enabled, no
> other features enabled) ? I would second one (same for clang)

Yes, we need both. I'd like to have both to be able to directly compare
any differences between the various SSL libraries. Compare the following
example:

- Ubuntu, gcc, all features: Fails.
- Ubuntu, gcc, ssl=stock: Works.
- Ubuntu, gcc, ssl=libressl=3.0.2: Fails.

-> There are probably two separate issues. One with LibreSSL and one
that is not related to SSL. This is more useful than:

- Ubuntu, gcc, all features: Fails.
- Ubuntu, gcc, ssl=libressl=3.0.2: Fails.

-> This might be an issue with SSL in general or two separate issues. We
don't know because we have nothing to compare.

- Ubuntu, gcc, all features: Works.
- Ubuntu, gcc, ssl=stock: Fails.

-> There is probably some issue with SSL combined with something else.

- Ubuntu, gcc, all features: Fails.
- Ubuntu, gcc, ssl=stock: Works.

-> There is probably some issue that is not related to SSL.

- Ubuntu, gcc, all features: Fails.
- Ubuntu, gcc, ssl=stock: Works.
- Ubuntu, gcc, ssl=libressl=3.0.2: Fails.

-> There is probably some issue in LibreSSL only.

> Ubuntu, gcc, gz=slz=1
> this one is "slz only, no other features". should we run slz + all features
> enabled instead ? (same for clang)

No. As with SSL this is intentional, testing only the specific
difference of using slz instead of zlib in a "controlled" environment.
There is no need to run, e.g. all the SSL tests with slz, because they
don't enable compression anyway.

>
> Ubuntu, gcc, gz=zlib=1
> should we drop this one in favour of "gcc all features enabled" ?

No. Same reason. Specific controlled environment as a direct comparison
to easily detect differences that are caused by switching out the
compression library. If zlib fails and slz does not we can't easily see
if it is caused by a bug in zlib integration or by some issue in the
combination of the various features.

---

I would be fine with running e.g. the SLZ test / LibreSSL tests only in
cron to keep resource usage low. But we must keep them separate to keep
things simple and easy to understand instead of implicitly testing stuff
by including them somewhere else. That was one of my biggest concerns
with the current Travis situation.

>
> Ubuntu, gcc, ssl=libressl=3.0.2
> I think we can drop this one as well

Okay.

>
> (few things like 51 degree, prometheus, PCRE2 to be discussed later)
>

- Put 51d, Prometheus into the "all features" tests, that's what they
are for.
- PCRE 2 should get a dedicated pair of tests testing only regex similar
to slz + zlib that only test compression.

Best regards
Tim Düsterhus
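The comparison logic in the mail above can be written down as a small decision table. The following Python sketch is only an illustration: the function name `diagnose` and the return strings are hypothetical. The mail lists the same three results twice with two different conclusions, so the sketch assumes that the "LibreSSL only" case was meant to have a passing "all features" job. `True` means the job works, `False` means it fails, `None` means the job does not exist.

```python
from typing import Optional


def diagnose(all_features: bool,
             ssl_stock: Optional[bool] = None,
             ssl_libressl: Optional[bool] = None) -> str:
    """Map job results to the probable cause, following the cases in the mail."""
    if ssl_stock is None:
        # Without the ssl=stock job there is no baseline to compare against.
        if all_features is False and ssl_libressl is False:
            return "SSL in general or two separate issues (nothing to compare)"
        return "no baseline: ssl=stock result missing"
    if not all_features and ssl_stock and ssl_libressl is False:
        return "two separate issues: one in LibreSSL, one not related to SSL"
    if all_features and not ssl_stock:
        return "SSL combined with something else"
    if not all_features and ssl_stock:
        return "not related to SSL"
    # Assumption: "LibreSSL only" means everything else passes.
    if all_features and ssl_stock and ssl_libressl is False:
        return "LibreSSL only"
    return "no conclusion from these jobs"


two_issues = diagnose(False, True, False)
ambiguous = diagnose(False, None, False)
not_ssl = diagnose(False, True)
libressl_only = diagnose(True, True, False)
```

The second call shows the point being argued: dropping the ssl=stock job leaves only the ambiguous answer.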
Re: Updated CI using GitHub actions
let us discuss next steps :)

Ubuntu, gcc, all features
Ubuntu, gcc, ssl=stock
both of jobs use stock ssl. do we really need second one (ssl enabled, no
other features enabled) ? I would second one (same for clang)

Ubuntu, gcc, gz=slz=1
this one is "slz only, no other features". should we run slz + all features
enabled instead ? (same for clang)

Ubuntu, gcc, gz=zlib=1
should we drop this one in favour of "gcc all features enabled" ?

Ubuntu, gcc, ssl=libressl=3.0.2
I think we can drop this one as well

(few things like 51 degree, prometheus, PCRE2 to be discussed later)

On Wed, 11 Nov 2020 at 09:34, Willy Tarreau wrote:

> On Tue, Nov 10, 2020 at 10:30:52PM +0100, Tim Düsterhus wrote:
> (...)
> > Let me (or Ilya) know if you have any questions or if you notice any
> > issues with it. Personally I'm super happy with how it turned out :-)
>
> Many thanks to you and Ilya for handling this. I know for having followed
> your exchanges that it was not trivial, and I really appreciate that you
> entirely offloaded me from this painful task. We'll see how it goes over
> the long term, especially if many projects move from Travis to Github
> Actions and start to slow things down as we've observed on Travis over
> time (let's hope not!).
>
> I guess we'll soon remove the redundant builds from Travis and only keep
> the unusual ones (ppc64, s390x, arm64).
>
> Cheers,
> Willy
>
[PATCH] remove a couple of travis jobs (migrated to github actions), unmark arm64 as allowed to fail
Hi, some travis-ci cleanup. Ilya From 80b4201f0f590f72fa87e4887089b5317f4da80e Mon Sep 17 00:00:00 2001 From: Ilya Shipitsin Date: Wed, 11 Nov 2020 23:16:22 +0500 Subject: [PATCH 2/2] CI: travis-ci: arm64 are not allowed to fail anymore --- .travis.yml | 5 - 1 file changed, 5 deletions(-) diff --git a/.travis.yml b/.travis.yml index e7ae0bbb8..1057ccc37 100644 --- a/.travis.yml +++ b/.travis.yml @@ -98,11 +98,6 @@ matrix: - export SLZ_INC=${HOME}/opt/include SLZ_LIB=${HOME}/opt/lib - export ADDLIB="-Wl,-rpath,$SLZ_LIB" name: openssl-1.1.1 | slz | pcre2 - allow_failures: - - os: linux -arch: arm64 -if: type == push -compiler: clang install: - git clone https://github.com/VTest/VTest.git ../vtest -- 2.28.0 From de1e7d2b04cbc2b7785a0588fc98b57d4990ba43 Mon Sep 17 00:00:00 2001 From: Ilya Shipitsin Date: Wed, 11 Nov 2020 23:15:20 +0500 Subject: [PATCH 1/2] CI: travis-ci: remove amd64, osx builds, they are migrated to Github Actions --- .travis.yml | 15 --- 1 file changed, 15 deletions(-) diff --git a/.travis.yml b/.travis.yml index b32de97a1..e7ae0bbb8 100644 --- a/.travis.yml +++ b/.travis.yml @@ -34,12 +34,6 @@ matrix: compiler: gcc env: TARGET=linux-glibc OPENSSL_VERSION=1.0.2u name: openssl-1.0.2 - - os: linux -arch: amd64 -if: type == push -compiler: clang -env: TARGET=linux-glibc CC=clang-9 -name: openssl-1.1.1 - os: linux arch: arm64 if: type == push @@ -94,15 +88,6 @@ matrix: compiler: clang env: TARGET=linux-glibc FLAGS= CC=clang-9 name: FLAGS= - - os: osx -osx_image: xcode12 -if: type == push -compiler: clang -before_script: - - echo 'brew "socat"' > brew.bundle - - brew bundle --file=brew.bundle -env: TARGET=osx FLAGS="USE_OPENSSL=1" OPENSSL_VERSION=1.1.1f -name: openssl-1.1.1 - os: linux if: type == cron compiler: clang -- 2.28.0
[PATCH] BUG/MINOR: http-htx: make sure to print warn on c-l error
commit "BUG/MINOR: http-htx: Just warn if payload of an errorfile doesn't
match the C-L" (which is only present in 2.2, 2.1 and 2.0 trees, i.e. see
commit 7bf3d81d3cf4b9f4587 in 2.2 tree), is changing the behavior of
`http_str_to_htx` function as the error path is filling up errmsg without
returning an error.

- when called in `http_htx_init`, it is ok as we test `errmsg` and print
  a warning.
- but it is not the case in `http_load_errorfile` and
  `http_load_errormsg`.

End result is no warning is printed when there is a content-length
mismatch. This patch tries to address that.

This patch is probably only valid for 2.2, 2.1 and 2.0 trees as it fixes
an issue present in those trees. However it might be worth considering
for the dev tree to avoid future regressions and be more consistent.

Signed-off-by: William Dauchy
---
 src/http_htx.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/http_htx.c b/src/http_htx.c
index 51dbf44a6..27c409e4a 100644
--- a/src/http_htx.c
+++ b/src/http_htx.c
@@ -1197,6 +1197,9 @@ struct buffer *http_load_errorfile(const char *file, char **errmsg)
 		free(http_errmsg);
 		goto out;
 	}
+	else if (*errmsg) {
+		ha_warning("invalid custom message %s: %s\n", file, *errmsg);
+	}

 	/* Insert the node in the tree and return the HTX message */
 	http_errmsg->msg = chk;
@@ -1248,6 +1251,9 @@ struct buffer *http_load_errormsg(const char *key, const struct ist msg, char **
 		free(http_errmsg);
 		goto out;
 	}
+	else if (*errmsg) {
+		ha_warning("invalid custom message: %s\n", *errmsg);
+	}

 	/* Insert the node in the tree and return the HTX message */
 	http_errmsg->msg = chk;
-- 
2.28.0
Re: [2.0.17] crash with coredump
On Wed, 11 Nov 2020 at 12:53, Willy Tarreau wrote:

> Two months of chasing a non reproducible
> memory corruption with zero initial info is quite an achievement, many
> thanks for doing that!
>

Initially it crashed (once every few hours) only on our most critical
HAProxy servers and with SPOA from an external vendor, then on less
critical but still production servers. It took a lot of time to analyze
our config and not break anything. I was almost sure it was something
specific to our configuration, especially as nobody else reported similar
problems with SPOE.

Your patch with additional checks was a game changer, it was easier to
trigger the crash thus easier to replicate! :)
Re: [2.0.17] crash with coredump
On Wed, Nov 11, 2020 at 12:43:50PM +0100, Maciej Zdeb wrote:
> Wow! Yes, I can confirm that a crash does not occur now. :) I checked 2.0
> and 2.2 branches. I'll keep testing it for a couple days just to be sure.
>
> So that stacktrace I shared before (on spoe_release_appctx function) was
> very lucky... Do you think that it'd be possible to find the bug without
> the replication procedure?

Very hardly. I actually continued on your indications and noticed that
each time I had a crash, a pointer that was supposedly aligned had
regressed by one. This reminded me of the NULL pointer that became -1.
I thought it was related to the pools since it often crashed there, and
in parallel Christopher looked for decrements in the SPOE part and found
that some nulls were missing there on aborts.

> Christopher & Willy many thanks for your hard work!

Let me return you the compliment! Two months of chasing a non reproducible
memory corruption with zero initial info is quite an achievement, many
thanks for doing that!

> I'm always impressed
> how fast you're able to narrow the bug when you finally get proper input
> from a reporter. :)

It's very simple, the code is huge and any piece could be responsible for
any problem. Sometimes you have a good nose and manage to narrow down the
issue in an area. Sometimes you just read a piece of code and figure it
can do something nasty. Sometimes other reports come in and help rule out
other hypotheses. But when there's nothing logical, most often it's a
memory corruption and then there's no other solution than being able to
observe it live and heavily instrument the code to go back in time from
the crash to the cause. In your case we were lucky, threads were not
involved, otherwise this adds another dimension, and very often the
instrumentation code changes the timings and makes the issue disappear :-)

Cheers,
Willy
Re: [2.0.17] crash with coredump
Wow! Yes, I can confirm that a crash does not occur now. :) I checked 2.0
and 2.2 branches. I'll keep testing it for a couple days just to be sure.

So that stacktrace I shared before (on spoe_release_appctx function) was
very lucky... Do you think that it'd be possible to find the bug without
the replication procedure?

Christopher & Willy many thanks for your hard work! I'm always impressed
how fast you're able to narrow the bug when you finally get proper input
from a reporter. :)

On Tue, 10 Nov 2020 at 22:30, Willy Tarreau wrote:

> Hi Christopher,
>
> On Tue, Nov 10, 2020 at 09:17:15PM +0100, Christopher Faulet wrote:
> > On 10/11/2020 at 18:12, Maciej Zdeb wrote:
> > > Hi,
> > >
> > > I'm so happy you're able to replicate it! :)
> > >
> > > With that patch that disabled pool_flush I still can reproduce on my
> r
> > > server and on production, just different places of crash:
> > >
> >
> > Hi Maciej,
> >
> > Could you test the following patch please ? For now I don't know if it
> fully
> > fixes the bug. But it is step forward. I must do a deeper review to be
> sure
> > it covers all cases.
>
> Looks like you got it right this time, not only it doesn't crash anymore
> in my tests, the suspiciously wrong cur_fap that were going negative very
> quickly do not happen anymore either! This is a very good news! Looking
> forward to read about Maciej's tests.
>
> Cheers,
> Willy
>