Re: Miscellaneous Crashes on 2.7.1

2023-01-09 Thread Luke Seelenbinder

Hi Amaury, Willy,

Thank you! That sounds good. I'll give 2.8-dev1 a try when I have the 
chance (probably later this week or next).


Best,
Luke

Luke Seelenbinder
Founder, Stadia Maps
https://stadiamaps.com

On 1/9/23 09:41, Amaury Denoyelle wrote:
> On Sat, Jan 07, 2023 at 02:22:01PM +0100, Willy Tarreau wrote:
> > Hi Luke,
> > On Sat, Jan 07, 2023 at 01:44:30PM +0100, Luke Seelenbinder wrote:
> > > Hi list,
> > >
> > > We've been running 2.7.1 on a subset of our edge servers with QUIC + HTTP/3
> > > enabled, and we're seeing routine, but infrequent (~daily), crashes (mix of
> > > SIGABRT / SIGSEGV). I have coredumps and there doesn't seem to be any common
> > > thread across crashes / machines, but it's possible I'm missing something.
> > > Two of the coredumps show the following backtrace:
> > >
> > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > #0  0x55b0fe319ce7 in qc_release_frm (qc=0x55b101236570,
> > > frm=0x7fd8201fbbf0 ) at src/quic_conn.c:1569
> > > 1569                pn = f->pkt->pn_node.key;
> > >
> > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > #0  qc_release_frm (qc=0x5652aa588fc0, frm=0x5652aa2537d0) at
> > > src/quic_conn.c:1564
> > > 1564        list_for_each_entry_safe(f, tmp, >reflist, ref) {
> > >
> > > which seem similar enough to possibly share a common cause. The other
> > > crashes occur in quictls (sigabrt), htx.h (sigsegv), and ebtree.h (sigsegv).
> > >
> > > Are there known fixes from 2.8-dev or internal trackers that could be
> > > related? I can dig deeper, but for now I'll probably disable quic since that
> > > seems to be the most likely culprit.
> > I'm seeing the following patch for QUIC which was fixed right after
> > 2.7.1 was emitted and which suggest potential crashes:
> >   15337fd80 ("BUG/MEDIUM: mux-quic: fix double delete from qcc.opening_list")
> > So you might possibly be hitting that bug, indeed. If you're interested
> > in giving 2.8-dev1 a try, it would confirm whether you're facing this
> > exact issue. But at the moment we're not aware of any remaining crash-
> > inducing bugs in 2.8-dev, so if it would still fail for you it would
> > indicate a new unknown bug.
>
> Luke, the crashes you reported are quite identical to the ones I had
> before I introduced the fix. Indeed, you should try 2.8-dev1 if you can
> and report us if this has solved the issue.
>
> Thanks for your help,






Re: Miscellaneous Crashes on 2.7.1

2023-01-09 Thread Amaury Denoyelle
On Sat, Jan 07, 2023 at 02:22:01PM +0100, Willy Tarreau wrote:
> Hi Luke,
> On Sat, Jan 07, 2023 at 01:44:30PM +0100, Luke Seelenbinder wrote:
> > Hi list,
> >
> > We've been running 2.7.1 on a subset of our edge servers with QUIC + HTTP/3
> > enabled, and we're seeing routine, but infrequent (~daily), crashes (mix of
> > SIGABRT / SIGSEGV). I have coredumps and there doesn't seem to be any common
> > thread across crashes / machines, but it's possible I'm missing something.
> > Two of the coredumps show the following backtrace:
> >
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x55b0fe319ce7 in qc_release_frm (qc=0x55b101236570,
> > frm=0x7fd8201fbbf0 ) at src/quic_conn.c:1569
> > 1569                pn = f->pkt->pn_node.key;
> >
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  qc_release_frm (qc=0x5652aa588fc0, frm=0x5652aa2537d0) at
> > src/quic_conn.c:1564
> > 1564        list_for_each_entry_safe(f, tmp, >reflist, ref) {
> >
> > which seem similar enough to possibly share a common cause. The other
> > crashes occur in quictls (sigabrt), htx.h (sigsegv), and ebtree.h (sigsegv).
> >
> > Are there known fixes from 2.8-dev or internal trackers that could be
> > related? I can dig deeper, but for now I'll probably disable quic since that
> > seems to be the most likely culprit.
> I'm seeing the following patch for QUIC which was fixed right after
> 2.7.1 was emitted and which suggest potential crashes:
>   15337fd80 ("BUG/MEDIUM: mux-quic: fix double delete from qcc.opening_list")
> So you might possibly be hitting that bug, indeed. If you're interested
> in giving 2.8-dev1 a try, it would confirm whether you're facing this
> exact issue. But at the moment we're not aware of any remaining crash-
> inducing bugs in 2.8-dev, so if it would still fail for you it would
> indicate a new unknown bug.

Luke, the crashes you reported are nearly identical to the ones I had
before I introduced the fix. Indeed, you should try 2.8-dev1 if you can
and report back to us whether this has solved the issue.

Thanks for your help,

-- 
Amaury Denoyelle



Re: Miscellaneous Crashes on 2.7.1

2023-01-07 Thread Willy Tarreau
Hi Luke,

On Sat, Jan 07, 2023 at 01:44:30PM +0100, Luke Seelenbinder wrote:
> Hi list,
> 
> We've been running 2.7.1 on a subset of our edge servers with QUIC + HTTP/3
> enabled, and we're seeing routine, but infrequent (~daily), crashes (mix of
> SIGABRT / SIGSEGV). I have coredumps and there doesn't seem to be any common
> thread across crashes / machines, but it's possible I'm missing something.
> Two of the coredumps show the following backtrace:
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x55b0fe319ce7 in qc_release_frm (qc=0x55b101236570,
> frm=0x7fd8201fbbf0 ) at src/quic_conn.c:1569
> 1569                pn = f->pkt->pn_node.key;
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  qc_release_frm (qc=0x5652aa588fc0, frm=0x5652aa2537d0) at
> src/quic_conn.c:1564
> 1564        list_for_each_entry_safe(f, tmp, >reflist, ref) {
> 
> which seem similar enough to possibly share a common cause. The other
> crashes occur in quictls (sigabrt), htx.h (sigsegv), and ebtree.h (sigsegv).
>
> Are there known fixes from 2.8-dev or internal trackers that could be
> related? I can dig deeper, but for now I'll probably disable quic since that
> seems to be the most likely culprit.

I'm seeing the following QUIC fix, which was merged right after
2.7.1 was emitted and which suggests potential crashes:

  15337fd80 ("BUG/MEDIUM: mux-quic: fix double delete from qcc.opening_list")

So you might possibly be hitting that bug, indeed. If you're interested
in giving 2.8-dev1 a try, it would confirm whether you're facing this
exact issue. But at the moment we're not aware of any remaining crash-
inducing bugs in 2.8-dev, so if it would still fail for you it would
indicate a new unknown bug.

Thanks!
Willy
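
For readers unfamiliar with the bug class named in the commit title above
("double delete" from a list), here is a minimal sketch of why removing the
same element twice from an intrusive doubly linked list can corrupt memory,
and how a "delete and re-initialize" variant makes the second removal
harmless. This is NOT the code from commit 15337fd80 and not HAProxy's list
API: the struct and every function name below are hypothetical and chosen
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list node, embedded in its owner. */
struct list {
    struct list *n; /* next element */
    struct list *p; /* previous element */
};

/* An initialized (detached) element points to itself in both directions. */
static inline void list_init(struct list *l) { l->n = l->p = l; }

static inline bool list_is_empty(const struct list *head) { return head->n == head; }

/* Append <el> at the tail of <head>. <el> must currently be detached. */
static inline void list_append(struct list *head, struct list *el)
{
    assert(el->n == el && el->p == el);
    el->p = head->p;
    el->n = head;
    head->p->n = el;
    head->p = el;
}

/* Plain delete: unlinks <el> but leaves el->n and el->p pointing at its
 * former neighbours. A second call writes through those stale pointers,
 * which may reference freed or reused memory. */
static inline void list_delete(struct list *el)
{
    el->n->p = el->p;
    el->p->n = el->n;
}

/* Delete and re-initialize: after the call <el> points to itself, so a
 * second call only rewrites el's own fields and cannot touch the list. */
static inline void list_delete_init(struct list *el)
{
    el->n->p = el->p;
    el->p->n = el->n;
    list_init(el);
}

/* Count the elements linked on <head>, excluding the head itself. */
static inline size_t list_count(const struct list *head)
{
    size_t count = 0;
    for (const struct list *it = head->n; it != head; it = it->n)
        count++;
    return count;
}
```

With list_delete(), the second call on the same element would overwrite the
link fields of whatever now sits at its old neighbours' addresses, which is
the kind of damage that tends to surface later as a crash in unrelated code.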