BUG: segfault with lua sample converters & wrong arg types

2018-06-14 Thread Patrick Hemmer
Haproxy segfaults if you pass the wrong argument type to a converter.
Example:

haproxy.cfg:
global
    lua-load /tmp/haproxy.lua

frontend f1
    mode http
    bind :8000
    default_backend b1

    http-request lua.foo

backend b1
    mode http
    server s1 127.0.0.1:8080

haproxy.lua:
core.register_action("foo", { "http-req" }, function(txn)
    txn.sc:ipmask(txn.f:src(), 24, 112)
end)

Result:
* thread #1, queue = 'com.apple.main-thread', stop reason =
EXC_BAD_ACCESS (code=1, address=0x18)
frame #0: 0x7fffc9fcbf56
libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 182
libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell:
->  0x7fffc9fcbf56 <+182>: movb   (%rsi,%r8), %cl
0x7fffc9fcbf5a <+186>: movb   %cl, (%rdi,%r8)
0x7fffc9fcbf5e <+190>: subq   $0x1, %rdx
0x7fffc9fcbf62 <+194>: je 0x7fffc9fcbf78; <+216>
Target 0: (haproxy) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason =
EXC_BAD_ACCESS (code=1, address=0x18)
  * frame #0: 0x7fffc9fcbf56
libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 182
frame #1: 0x7fffc9e7442e libsystem_c.dylib`__memcpy_chk + 22
frame #2: 0x00010002ec46
haproxy`hlua_lua2arg_check(L=0x00010120d298, first=3,
argp=0x7fff5fbfe690, mask=196, p=0x000101817000) at hlua.c:749
frame #3: 0x00010001fa00
haproxy`hlua_run_sample_conv(L=0x00010120d298) at hlua.c:3393
frame #4: 0x00010032400b haproxy`luaD_precall + 747
frame #5: 0x0001003343c6 haproxy`luaV_execute + 3158
frame #6: 0x000100323429 haproxy`luaD_rawrunprotected + 89
frame #7: 0x000100324516 haproxy`lua_resume + 278
frame #8: 0x00010001b199
haproxy`hlua_ctx_resume(lua=0x000101205080, yield_allowed=1) at
hlua.c:1080
frame #9: 0x000100027de8
haproxy`hlua_action(rule=0x00010101b180, px=0x000101817000,
sess=0x00010120cb70, s=0x00010120cc00, flags=2) at hlua.c:6198
frame #10: 0x000100044bcd
haproxy`http_req_get_intercept_rule(px=0x000101817000,
rules=0x000101817048, s=0x00010120cc00,
deny_status=0x7fff5fbfee78) at proto_http.c:2760
frame #11: 0x000100046182
haproxy`http_process_req_common(s=0x00010120cc00,
req=0x00010120cc10, an_bit=16, px=0x000101817000) at
proto_http.c:3461
frame #12: 0x000100094c50
haproxy`process_stream(t=0x00010120cf40, context=0x00010120cc00,
state=9) at stream.c:1905
frame #13: 0x00010016179f haproxy`process_runnable_tasks at
task.c:362
frame #14: 0x0001000ea0eb haproxy`run_poll_loop at
haproxy.c:2403
frame #15: 0x0001000e7c74
haproxy`run_thread_poll_loop(data=0x7fff5fbff3a4) at haproxy.c:2464
frame #16: 0x0001000e4a49 haproxy`main(argc=3,
argv=0x7fff5fbff590) at haproxy.c:3082
frame #17: 0x7fffc9db9235 libdyld.dylib`start + 1

Issue goes away if you change the lua txn.sc:ipmask() line to:
txn.sc:ipmask(txn.f:src(), '24', '112')

Reproduced with current master (9db0fed) and lua version 5.3.4.

-Patrick


Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-14 Thread Willy Tarreau
On Thu, Jun 14, 2018 at 07:22:34PM +0200, Janusz Dziemidowicz wrote:
> 2018-06-14 18:56 GMT+02:00 Willy Tarreau :
> 
> > If you'd like to run a test, I'm attaching the patch.
> 
> Sure, but you forgot to attach it :)

Ah, that's because I'm stupid :-)

Here it comes this time.

Willy
diff --git a/src/mux_h2.c b/src/mux_h2.c
index 5f1da0d..4c3e2dd 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -3106,6 +3109,7 @@ static int h2s_frt_make_resp_headers(struct h2s *h2s, struct buffer *buf)
 */
if (es_now) {
// trim any possibly pending data (eg: inconsistent content-length)
+   ret += buf->o;
bo_del(buf, buf->o);
 
h1m->state = HTTP_MSG_DONE;
@@ -3359,6 +3363,7 @@ static int h2s_frt_make_resp_data(struct h2s *h2s, struct buffer *buf)
 
if (!(h1m->flags & H1_MF_CHNK)) {
// trim any possibly pending data (eg: inconsistent content-length)
+   total += buf->o;
bo_del(buf, buf->o);
 
h1m->state = HTTP_MSG_DONE;
@@ -3413,6 +3418,7 @@ static int h2_snd_buf(struct conn_stream *cs, struct buffer *buf, int flags)
bo_del(buf, count);
 
// trim any possibly pending data (eg: extra CR-LF, ...)
+   total += buf->o;
bo_del(buf, buf->o);
 
h2s->res.state = HTTP_MSG_DONE;


Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-14 Thread Janusz Dziemidowicz
2018-06-14 18:56 GMT+02:00 Willy Tarreau :

> If you'd like to run a test, I'm attaching the patch.

Sure, but you forgot to attach it :)

-- 
Janusz Dziemidowicz



Re: [PATCHES] Fix a few shortcomings in the tasklet code

2018-06-14 Thread Willy Tarreau
On Thu, Jun 14, 2018 at 04:27:04PM +0200, Olivier Houchard wrote:
> Attached are 2 patches that fix a few bugs in the tasklet code.
> They should have little impact right now because tasklets are unused, but
> will be useful for later work.

Applied, thank you!

Willy



Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-14 Thread Willy Tarreau
On Thu, Jun 14, 2018 at 01:51:20PM +0200, Janusz Dziemidowicz wrote:
> 2018-06-14 11:46 GMT+02:00 Willy Tarreau :
> >> Will try.
> 
> I've tried the second patch, together with the first one, no change at all.
> 
> However, I was able to catch it on my laptop finally. I still can't
> easily reproduce this, but at least that's something. A little
> background: my company makes online games, and the one I am testing with
> is a browser Flash game. As it starts, it makes various API calls
> and loads game resources, graphics/music, etc. So I disabled the
> browser cache and tried closing the browser tab with the game as it was
> loading. After a couple of tries I got the following state:
> tcp6    1190      0 SERVER_IP:443    MY_IP:54514    ESTABLISHED 538049/haproxy
> 
> This is with the browser tab already closed. The browser (latest Chrome)
> probably keeps the connection alive, but haproxy should close it after
> a while. Well, that didn't happen; after a good 30 minutes the
> connection is still ESTABLISHED. My timeouts are at the beginning of
> this thread, and my understanding is that this connection should be killed
> after "timeout client", which is 60s.
> After that I closed the browser completely. The connection moved to the
> CLOSE_WAIT state in question:
> tcp6    1191      0 SERVER_IP:443    MY_IP:54514    CLOSE_WAIT  538049/haproxy
> 
> haproxy logs (I have dontlog-normal enabled): https://pastebin.com/sUsa6jNQ

Thank you! I've just found a bug which I suspect could be related. By trying
to exploit it I managed to reproduce the problem once, and after the fix I
couldn't anymore. That's not enough to draw a conclusion, but I suspect I'm
on the right track.

I found that the case where some extra data are pending after a
chunked-encoded response is not properly handled by the H2 encoder:
the data are deleted as expected but are not reported as being part of what
was sent. This can cause the upper layer to believe that nothing was sent
and to continue to wait. When trying to send responses containing garbage
after the final chunk, I ended up in the same situation you saw, with an H2
connection still present in "show fd" and the timeout not getting rid of
it, apparently because a stream remains permanently attached to it. I also
remember we faced a similar situation in early 1.8 with extra data
after content-length not being properly trimmed. It could very well be
similar here. It's unclear to me why the stream timeout doesn't trigger
(probably the stream is considered completed, which would be the root
cause of the problem), but these data definitely need to be reported as
deleted.
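
The fix is essentially bookkeeping: the bytes that get trimmed must be added
to the amount reported as consumed. As a rough standalone model of the
failure mode (simplified for illustration, not the actual mux_h2.c code):

#include <stdio.h>
#include <string.h>

struct outbuf {
    char data[256];
    size_t o;              /* pending output bytes, like buf->o */
};

/* drop 'n' pending bytes from the front, like bo_del() */
static void buf_del(struct outbuf *b, size_t n)
{
    memmove(b->data, b->data + n, b->o - n);
    b->o -= n;
}

/* Sends 'sent' bytes and trims the leftovers (e.g. garbage after the final
 * chunk). Returning only 'sent' while silently trimming is the bug: the
 * caller sees fewer bytes consumed than it had pending and keeps waiting. */
static size_t send_and_trim(struct outbuf *b, size_t sent, int report_trim)
{
    size_t total = sent;

    buf_del(b, sent);          /* bytes really emitted on the wire */
    if (report_trim)
        total += b->o;         /* the fix: account for what we discard */
    buf_del(b, b->o);          /* trim the inconsistent trailing data */
    return total;
}

int main(void)
{
    struct outbuf b = { .o = 100 };
    size_t buggy, fixed;

    buggy = send_and_trim(&b, 90, 0);
    b.o = 100;
    fixed = send_and_trim(&b, 90, 1);

    /* with the bug the caller believes 10 bytes are still pending forever */
    printf("reported consumed: buggy=%zu fixed=%zu (had 100 pending)\n",
           buggy, fixed);
    return 0;
}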

If you'd like to run a test, I'm attaching the patch.

Cheers,
Willy



[PATCHES] Fix a few shortcomings in the tasklet code

2018-06-14 Thread Olivier Houchard
Hi,

Attached are 2 patches that fix a few bugs in the tasklet code.
They should have little impact right now because tasklets are unused, but
will be useful for later work.

Regards,

Olivier
>From fd2838a8b4eae2d9801592889285ae221fc3a7cb Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 8 Jun 2018 17:08:19 +0200
Subject: [PATCH 1/2] MINOR: tasks: Make sure we correctly init and deinit a
 tasklet.

Up until now, a tasklet couldn't be free'd while it was in the list; this is
no longer the case, so make sure we remove it from the list before freeing it.
To do so, we have to make sure we correctly initialize it, so use LIST_INIT
instead of setting the pointers to NULL.
---
 include/proto/task.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/proto/task.h b/include/proto/task.h
index 760b368b..47f74d4e 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -229,6 +229,7 @@ static inline void task_insert_into_tasklet_list(struct task *t)
 static inline void task_remove_from_task_list(struct task *t)
 {
LIST_DEL(&((struct tasklet *)t)->list);
+   LIST_INIT(&((struct tasklet *)t)->list);
task_list_size[tid]--;
HA_ATOMIC_SUB(&tasks_run_queue, 1);
if (!TASK_IS_TASKLET(t)) {
@@ -270,7 +271,7 @@ static inline void tasklet_init(struct tasklet *t)
t->nice = -32768;
t->calls = 0;
t->state = 0;
-   t->list.p = t->list.n = NULL;
+   LIST_INIT(&t->list);
 }
 
 static inline struct tasklet *tasklet_new(void)
@@ -321,9 +322,10 @@ static inline void task_free(struct task *t)
t->process = NULL;
 }
 
-
 static inline void tasklet_free(struct tasklet *tl)
 {
+   LIST_DEL(&tl->list);
+
pool_free(pool_head_tasklet, tl);
if (unlikely(stopping))
pool_flush(pool_head_tasklet);
-- 
2.14.3
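
For context on why LIST_INIT matters here: these lists are circular and
doubly linked, so an element that was initialized but never queued points to
itself and unlinking it is harmless, whereas NULL prev/next pointers would
make the now-unconditional LIST_DEL in tasklet_free() dereference NULL. A
tiny self-contained model of that idiom (a simplified re-implementation for
illustration, not the actual list macros):

#include <stdio.h>

struct list { struct list *n, *p; };

/* a freshly initialized element points to itself (circular list of one) */
static void list_init(struct list *l) { l->n = l->p = l; }

/* unlink unconditionally: safe on an initialized element, would crash on
 * NULL pointers */
static void list_del(struct list *l)
{
    l->n->p = l->p;
    l->p->n = l->n;
}

static void list_add(struct list *head, struct list *el)
{
    el->n = head->n;
    el->p = head;
    head->n->p = el;
    head->n = el;
}

int main(void)
{
    struct list head, a, b;

    list_init(&head);
    list_init(&a);
    list_init(&b);

    list_add(&head, &a);   /* a is queued */
    list_del(&a);          /* normal removal */
    list_del(&b);          /* never queued: harmless, b points to itself */

    /* had b.n/b.p been NULL instead, list_del(&b) would have crashed,
     * which is exactly what the LIST_INIT change avoids */
    printf("both deletions completed\n");
    return 0;
}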

>From e09a118d1a7b120e4529f56c2ba4458f5059f5ec Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Thu, 14 Jun 2018 15:40:47 +0200
Subject: [PATCH 2/2] BUG/MINOR: tasklets: Just make sure we don't pass a
 tasklet to the handler.

We can't just set t to NULL if it's a tasklet, or we'd have a hard time
accessing t->process, so just make sure we pass NULL as the first parameter
of t->process if it's a tasklet.
This should be a non-issue at this point, as tasklets aren't used yet.
---
 src/task.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/task.c b/src/task.c
index fb484073..815df24c 100644
--- a/src/task.c
+++ b/src/task.c
@@ -342,13 +342,11 @@ void process_runnable_tasks()
rqueue_size[tid]--;
t->calls++;
curr_task = (struct task *)t;
-   if (TASK_IS_TASKLET(t))
-   t = NULL;
if (likely(process == process_stream))
t = process_stream(t, ctx, state);
else {
if (t->process != NULL)
-   t = process(t, ctx, state);
+   t = process(TASK_IS_TASKLET(t) ? NULL : t, ctx, state);
else {
__task_free(t);
t = NULL;
-- 
2.14.3
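
To make the second fix concrete, here is a tiny standalone model of the
dispatch order it corrects (illustrative names only, not the real task and
tasklet types): clearing t before the call would lose access to t->process,
so the pointer is kept and NULL is only passed as the handler's first
argument.

#include <stdio.h>
#include <stddef.h>

/* simplified stand-in for a task whose handler must receive NULL when the
 * object is really a tasklet */
struct fake_task {
    int is_tasklet;
    struct fake_task *(*process)(struct fake_task *t, void *ctx, int state);
};

static struct fake_task *handler(struct fake_task *t, void *ctx, int state)
{
    (void)ctx; (void)state;
    printf("handler called with t=%p\n", (void *)t);
    return t;
}

int main(void)
{
    struct fake_task tl = { .is_tasklet = 1, .process = handler };
    struct fake_task *t = &tl;

    /* Buggy order:   if (t->is_tasklet) t = NULL;  t->process(...);
     * would dereference NULL. Fixed order, as in the patch: keep t so
     * ->process is still reachable, and only NULL the argument. */
    if (t->process != NULL)
        t = t->process(t->is_tasklet ? NULL : t, NULL, 0);

    return 0;
}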



Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-14 Thread Janusz Dziemidowicz
2018-06-14 11:46 GMT+02:00 Willy Tarreau :
>> Will try.

I've tried the second patch, together with the first one, no change at all.

However, I was able to catch it on my laptop finally. I still can't
easily reproduce this, but at least that's something. A little
background: my company makes online games, and the one I am testing with
is a browser Flash game. As it starts, it makes various API calls
and loads game resources, graphics/music, etc. So I disabled the
browser cache and tried closing the browser tab with the game as it was
loading. After a couple of tries I got the following state:
tcp6    1190      0 SERVER_IP:443    MY_IP:54514    ESTABLISHED 538049/haproxy

This is with the browser tab already closed. The browser (latest Chrome)
probably keeps the connection alive, but haproxy should close it after
a while. Well, that didn't happen; after a good 30 minutes the
connection is still ESTABLISHED. My timeouts are at the beginning of
this thread, and my understanding is that this connection should be killed
after "timeout client", which is 60s.
After that I closed the browser completely. The connection moved to the
CLOSE_WAIT state in question:
tcp6    1191      0 SERVER_IP:443    MY_IP:54514    CLOSE_WAIT  538049/haproxy

haproxy logs (I have dontlog-normal enabled): https://pastebin.com/sUsa6jNQ

-- 
Janusz Dziemidowicz



Extra load with unix-socket backend

2018-06-14 Thread accpad1
Hello,

I have heavily loaded local HTTP-cache servers behind Haproxy.

I've made it use a Unix socket instead of TCP, and on the test server Apache
Benchmark showed a few percent benefit (tested with 1-2 threads, with small
and big files).

But when I deployed the new settings on a few production servers, I saw that
haproxy's CPU usage increased up to 5-10 times (with ~300 Mbps and some Lua
processing it is normally at 5-30% of a CPU core; with the Unix socket in use
it goes over 100% very often).

I still can't reproduce the situation with benchmarks.

Haproxy 1.8.9-1e3c84 (threaded and not threaded) + kernel 4.15.17

--
Wert Revon




Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-14 Thread Willy Tarreau
On Thu, Jun 14, 2018 at 11:29:39AM +0200, Janusz Dziemidowicz wrote:
> 2018-06-14 11:14 GMT+02:00 Willy Tarreau :
> > Yep it's really not easy and probably once we find it I'll be ashamed
> > saying "I thought this code was not merged"... By the way yesterday I
> > found another suspect but I'm undecided on it ; the current architecture
> > of the H2 mux complicates the code analysis. If you want to give it a
> > try on top of previous one, I'd appreciate it, even if it doesn't change
> > anything. Please find it attached.
> 
> Will try.
> 
> I've found one more clue. I've added various graphs to my monitoring.
> Also I've been segregating various traffic kinds into different
> haproxy backends. Yesterday test shows this:
> https://pasteboard.co/HpPK2Ml6.png
> 
> This backend (sns) is used exclusively for static files that are
> "large" (from 10KB up to over a megabyte) compared to my usual traffic
> (various API calls mostly). Those 5xx errors are not from the backend
> servers, "show stat":
> sns,kr-8,0,0,5,108,,186655,191829829,19744924356,,0,,0,0,0,0,UP,100,1,0,0,0,15377,0,,1,4,1,,186655,,2,7,,147,L7OK,200,1,0,184295,550,0,0,0,8895,0,0,OK,,0,12,4,1474826Layer7
> check passed,,2,5,610.7.1.8:81,,http
> sns,kr-10,0,0,2,105,,186654,191649821,19977086644,,0,,0,0,0,0,UP,100,1,0,0,0,15377,0,,1,4,2,,186654,,2,7,,148,L7OK,200,0,0,184275,551,0,0,0,8823,0,0,OK,,0,21,4,1473385Layer7
> check passed,,2,5,610.7.1.10:81,,http
> sns,BACKEND,0,0,8,213,6554,383553,391967657,39722011000,0,0,,0,0,0,0,UP,200,2,0,,0,15377,0,,1,4,0,,373309,,1,14,,3320,368563,1101,0,1873,12008383545,27962,0,0,0,0,0,0,,,0,18,5,1763433,,http,roundrobin,,,

Oh, this is very interesting indeed! So haproxy detected about 6500 5xx errors
on this backend that were not attributed to any of these servers. I'm
really not seeing many situations where this can happen; I'll have a
look at this in the code. A common case would be 503s emitted when
requests die in the queue, but you have no maxconn and thus no queue. Maybe
we return some 500s from time to time, though I'll have to figure out
why!

Thank you!
Willy



Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-14 Thread Janusz Dziemidowicz
2018-06-14 11:14 GMT+02:00 Willy Tarreau :
> Yep it's really not easy and probably once we find it I'll be ashamed
> saying "I thought this code was not merged"... By the way yesterday I
> found another suspect but I'm undecided on it ; the current architecture
> of the H2 mux complicates the code analysis. If you want to give it a
> try on top of previous one, I'd appreciate it, even if it doesn't change
> anything. Please find it attached.

Will try.

I've found one more clue. I've added various graphs to my monitoring.
Also I've been segregating various traffic kinds into different
haproxy backends. Yesterday test shows this:
https://pasteboard.co/HpPK2Ml6.png

This backend (sns) is used exclusively for static files that are
"large" (from 10KB up to over a megabyte) compared to my usual traffic
(various API calls mostly). Those 5xx errors are not from the backend
servers, "show stat":
sns,kr-8,0,0,5,108,,186655,191829829,19744924356,,0,,0,0,0,0,UP,100,1,0,0,0,15377,0,,1,4,1,,186655,,2,7,,147,L7OK,200,1,0,184295,550,0,0,0,8895,0,0,OK,,0,12,4,1474826Layer7
check passed,,2,5,610.7.1.8:81,,http
sns,kr-10,0,0,2,105,,186654,191649821,19977086644,,0,,0,0,0,0,UP,100,1,0,0,0,15377,0,,1,4,2,,186654,,2,7,,148,L7OK,200,0,0,184275,551,0,0,0,8823,0,0,OK,,0,21,4,1473385Layer7
check passed,,2,5,610.7.1.10:81,,http
sns,BACKEND,0,0,8,213,6554,383553,391967657,39722011000,0,0,,0,0,0,0,UP,200,2,0,,0,15377,0,,1,4,0,,373309,,1,14,,3320,368563,1101,0,1873,12008383545,27962,0,0,0,0,0,0,,,0,18,5,1763433,,http,roundrobin,,,

-- 
Janusz Dziemidowicz



Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-14 Thread Willy Tarreau
On Wed, Jun 13, 2018 at 08:01:30PM +0200, Janusz Dziemidowicz wrote:
> 2018-06-13 19:14 GMT+02:00 Willy Tarreau :
> > On Wed, Jun 13, 2018 at 07:06:58PM +0200, Janusz Dziemidowicz wrote:
> >> 2018-06-13 14:42 GMT+02:00 Willy Tarreau :
> >> > Hi Milan, hi Janusz,
> >> >
> >> > thanks to your respective traces, I may have come up with a possible
> >> > scenario explaining the CLOSE_WAIT you're facing. Could you please
> >> > try the attached patch ?
> >>
> >> Unfortunately there is no change for me. CLOSE_WAIT sockets still
> >> accumulate if I switch native h2 on. Milan should probably double
> >> check this though.
> >> https://pasteboard.co/HpJj72H.png
> >
> > :-(
> >
> > With still the same perfectly straight line really making me think of either
> > a periodic activity which I'm unable to guess nor model, or something 
> > related
> > to our timeouts.
> 
> It is not exactly straight. While it looks like this for short test,
> when I did this earlier, for much longer period of time, it was
> slowing down during night, when I have less traffic.

OK but there's definitely something very stable in this.

> >> I'll try move some low traffic site to a separate instance tomorrow,
> >> maybe I'll be able to capture some traffic too.
> >
> > Unfortunately with H2 that will not help much, there's the TLS layer
> > under it that makes it a real pain. TLS is designed to avoid observability
> > and it does it well :-/
> >
> > I've suspected a received shutdown at the TLS layer, which I was not
> > able to model at all. Tools are missing at this point. I even tried
> > to pass the traffic through haproxy in TCP mode to help but I couldn't
> > reproduce the problem.
> 
> When I disable native h2 in haproxy I switch back to tcp mode going
> though nghttpx. The traffic is obviously the same, yet there is no
> problem.

I'm not surprised since it doesn't use our H2 engine anymore :-) In my
case the purpose was to try to abuse haproxy's TCP proxy to modify the
behaviour before it reaches its H2 engine.

> > It could possibly help if you can look for the affected client's IP:port
> > in your logs to see if they are perfectly normal or if you notice they
> > have something in common (eg: always the exact same requests, or they
> > never made a request from the affected connections, etc).
> 
> I'm aware of the problems :) However, if I can get some traffic dumps,
> knowing my application I might be able to reproduce this, which would
> be a huge win. I've already tried some experiments with various tools
> with no luck unfortunately.

Yep it's really not easy and probably once we find it I'll be ashamed
saying "I thought this code was not merged"... By the way yesterday I
found another suspect but I'm undecided on it ; the current architecture
of the H2 mux complicates the code analysis. If you want to give it a
try on top of previous one, I'd appreciate it, even if it doesn't change
anything. Please find it attached.

Thanks,
willy
>From f055296f7598ba84b08e78f8309ffd7fa0c9522b Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Wed, 13 Jun 2018 09:19:29 +0200
Subject: WIP: h2: try to address possible causes for the close_wait issues

It is uncertain whether certain errors could prevent pending outgoing
data from being emitted, and from releasing attached streams. Indeed,
for h2_release() to be called, the mux buf must be empty or an error
must have been met. A clean shutdown will not constitute an error and
it's likely that refraining from sending may prevent the buffer from
flushing. Thus maybe we can end up with data forever in the buffer.

The timeout task should normally take care of this though. It's worth
noting that if there's no more stream and the output buffer is empty
on wake(), the task's timeout is eternity.

This fix should be backported to 1.8.
---
 src/mux_h2.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/src/mux_h2.c b/src/mux_h2.c
index bc33ce2..2f2384d 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -2271,14 +2271,6 @@ static int h2_wake(struct connection *conn)
h2_release(conn);
return -1;
}
-   else {
-   /* some streams still there, we need to signal them all and
-    * wait for their departure.
-    */
-   __conn_xprt_stop_recv(conn);
-   __conn_xprt_stop_send(conn);
-   return 0;
-   }
}
 
if (!h2c->dbuf->i)
@@ -2294,6 +2286,7 @@ static int h2_wake(struct connection *conn)
 
/* adjust output polling */
if (!(conn->flags & CO_FL_SOCK_WR_SH) &&
+   h2c->st0 != H2_CS_ERROR2 && !(h2c->flags & H2_CF_GOAWAY_FAILED) &&
(h2c->st0 == H2_CS_ERROR ||
 h2c->mbuf->o ||
 (h2c->mws > 0 && !LIST_ISEMPTY(&h2c->fctl_list)) ||
-- 
1.7.12.1
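
To illustrate the commit message's note about the wake-time timeout, here is
a rough model (purely illustrative, not the actual h2_wake() code) of a
connection task whose expiry is left at eternity when there are no streams
and no pending output, so nothing ever reaps the connection if the peer
stays silent in that state:

#include <stdio.h>

#define TICK_ETERNITY 0    /* stand-in for "never expires", as with haproxy ticks */

struct fake_h2c {
    int nb_streams;        /* streams still attached to the connection */
    int pending_out;       /* bytes waiting in the mux output buffer */
    int client_timeout;    /* configured "timeout client", in ms */
};

/* decide when the connection-level task should fire on wake(): if the
 * timeout is only armed in some states, a connection stuck in the "wrong"
 * state is never reaped by it */
static int next_expire(const struct fake_h2c *c)
{
    if (c->nb_streams == 0 && c->pending_out == 0)
        return TICK_ETERNITY;      /* nothing to watch: never expires */
    return c->client_timeout;      /* otherwise re-arm the client timeout */
}

int main(void)
{
    struct fake_h2c idle = { .nb_streams = 0, .pending_out = 0, .client_timeout = 60000 };
    struct fake_h2c busy = { .nb_streams = 1, .pending_out = 0, .client_timeout = 60000 };

    printf("idle connection expires in: %d (0 = eternity)\n", next_expire(&idle));
    printf("busy connection expires in: %d ms\n", next_expire(&busy));
    return 0;
}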



Re: dev1.9 2018/06/05 threads cpu 100% spin_lock v.s. thread_sync_barrier

2018-06-14 Thread Willy Tarreau
Hi Pieter,

On Tue, Jun 12, 2018 at 08:09:08PM +0200, PiBa-NL wrote:
> Is there something I can do to find out more info if it happens again? Or
> maybe build with more specific debug info beforehand, so that if it happens
> again more info would be retrievable?
> (My usual way of 'debugging' the relatively easy-to-reproduce issues is
> just cramming a lot of printf statements into the code until something makes
> sense, but with an issue that only happens once in a blue moon (so far), that
> doesn't work very well...)

Building with -DDEBUG_THREAD and -DDEBUG_MEMORY can help to get a more
exploitable core file which will reveal where a given lock was already
taken. Be careful that it will slightly increase the CPU usage though.

> For the moment it hasn't happened again. I suspend/resume the VM it happened
> on almost daily (it's running on my workstation, which is shut down
> overnight); the VM also has haproxy running on it and some other stuff.
> 
> If it does happen again, would any other gdb information be helpful?
> Inspecting specific variables or creating a memory dump or something?
> (Please give a hint about the command/option to use if applicable/possible;
> I'm still a rookie with gdb.) P.S. I'm on FreeBSD, not sure if that matters
> for some of gdb's available options.

Creating a dump is often quite useful. You can do it with the command
"generate-core-file". A "bt full" and "info thread" will also be needed,
as you did last time. I cannot guide you through the commands used to
debug the threads, however, because I don't know them; I think I remember
you need to print the contents of the lock itself (eg: "p foo->lock").

> Would the complete configuration be helpful? There is a lot of useless
> stuff in there because it's my test/development VM, and since it hasn't
> happened again there has been no good opportunity to shrink it down to
> specific options/parts.

Well, probably not, though keeping it unmodified along with the
unstripped executable and the upcoming core may be useful afterwards.

Thanks!
Willy