Re: [Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-22 Thread Jim Fehlig

On 02/21/2018 10:18 PM, Juergen Gross wrote:

> On 21/02/18 23:13, Jim Fehlig wrote:
>> On several Skylake machines I've observed xl segfaults when running
>> create or destroy subcommands. Other subcommands may segfault too,
>> but I've only looked at create and destroy which share a similar
>> backtrace
>>
>> Thread 2 (Thread 0x77ff3700 (LWP 2941)):
>>  at /usr/include/bits/unistd.h:44
>>  at xs.c:398
>>  fd=) at xs.c:1231
>>
>> Thread 1 has canceled Thread 2 and is waiting for it in pthread_join().
>>
>> The backtrace smelled of memory/stack overflow, which was verified by
>> increasing DEFAULT_THREAD_STACKSIZE to 32kb. Presumably the stack
>> overflow is observed on Skylake due to a broader CPU feature set which
>> must be saved within _dl_runtime_resolve and friends.
>>
>> While PTHREAD_STACK_MIN should advertise a suitable stack size based on
>> the underlying system, increasing the default size makes xenstore a bit
>> more robust on systems with insufficient/broken minimums.
>
> We hit something like this before:
>
> https://lists.xen.org/archives/html/xen-devel/2016-07/msg01727.html
>
> The main problem is that any thread local storage is taken from the
> stack without any interface being available for adjusting the _real_
> stack size instead of the memory for thread local storage + stack.
>
> So we can increase the stack size of the xenstore thread and wait for
> the next breakage, or we have to think about a proper solution.
>
> Right now I have no sensible idea how to address the problem, as the
> old thread suggests the underlying glibc problem isn't fixed yet (wow:
> the problem has been known for more than 7 years now):
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=11787


It looks like the bug I'm hitting is described in

https://sourceware.org/bugzilla/show_bug.cgi?id=22636

And unlike the other bug, it has been fixed.

Regards,
Jim

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-22 Thread Juergen Gross
On 22/02/18 12:33, Ian Jackson wrote:
> Juergen Gross writes ("Re: [Xen-devel] [PATCH] xenstore: increase default 
> thread stack size to 32k"):
>> The main problem is that any thread local storage is taken from the
>> stack without any interface being available for adjusting the _real_
>> stack size instead of the memory for thread local storage + stack.
>>
>> So we can increase the stack size of the xenstore thread and wait for
>> the next breakage, or we have to think about a proper solution.
>>
>> Right now I have no sensible idea how to address the problem, as the
>> old thread suggests the underlying glibc problem isn't fixed yet (wow:
>> the problem has been known for more than 7 years now):
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=11787
> 
> Oh!
> 
> Thank you, Juergen, for providing facts!
> 
> I withdraw my ack on this patch.
> 
> I agree with all the complaints in the libc bugzilla and disagree with
> glibc upstream.  I think this should be fixed in the libc.  Sadly the
> libc bugzilla doesn't have a libc patch.
> 
> I would accept a workaround in libxenstore that does something similar
> to what they do in Rust...

Yeah, the Rust solution seems to be sensible.

The source patch is easy. How to add the linker flag ("-ldl")? Just do
it in the Makefile of xenstore, or via configure?


Juergen


Re: [Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-22 Thread Juergen Gross
On 22/02/18 12:35, Roger Pau Monné wrote:
> On Thu, Feb 22, 2018 at 11:16:53AM +, Ian Jackson wrote:
>> Jim Fehlig writes ("[PATCH] xenstore: increase default thread stack size to 
>> 32k"):
>>> On several Skylake machines I've observed xl segfaults when running
>>> create or destroy subcommands. Other subcommands may segfault too,
>>> but I've only looked at create and destroy which share a similar
>>> backtrace
>>
>> Acked-by: Ian Jackson 
>>
>> However, I am concerned that this isn't a very systematic way of
>> addressing this problem.  How do we know that 32K is enough?
>> We have already increased this number once.
>>
>> Alternatively, maybe this is a bug in the platform libc?
> 
> Is it really that bad to use the default thread stack size? That's
> going to be much bigger, I know, but I assume physical memory for the
> stack will be mapped on demand, and thus the physical memory usage is
> not going to be that different.
> 
> I've always been told simply not to play with the stack size.

Virtual memory isn't unlimited, especially in 32 bit programs.

It is possible to do a hack for obtaining the _free_ stack space in a
thread, even if it isn't really beautiful and relies on a non-standard
GNU extension (pthread_getattr_np()). It would require creating a test
thread that checks how much space is left on its stack, and then sizing
the watch thread's stack accordingly.
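
The probing idea described above might look roughly like the following. It is only a sketch under assumptions: the stack is assumed to grow downwards, the address of a local variable stands in for the stack pointer, and the names free_stack_bytes, probe_thread and probe_free_stack are hypothetical. The reported value is approximate and may or may not include a guard page depending on the libc version.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* pthread_getattr_np() is a GNU extension */
#endif
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Repeated prototype so the sketch still compiles if _GNU_SOURCE was
 * defined too late to take effect (glibc-specific assumption). */
int pthread_getattr_np(pthread_t thread, pthread_attr_t *attr);

/* Approximate number of bytes left below the current stack position
 * in the calling thread; returns 0 on failure. */
static size_t free_stack_bytes(void)
{
    pthread_attr_t attr;
    void *base = NULL;
    size_t size = 0;
    size_t free_bytes = 0;
    char marker;   /* its address approximates the stack pointer */

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return 0;
    if (pthread_attr_getstack(&attr, &base, &size) == 0) {
        uintptr_t lo = (uintptr_t)base;   /* lowest stack address */
        uintptr_t sp = (uintptr_t)&marker;

        if (sp > lo && sp <= lo + size)
            free_bytes = (size_t)(sp - lo);
    }
    pthread_attr_destroy(&attr);
    return free_bytes;
}

/* Thread body: store the measured free space where arg points. */
static void *probe_thread(void *arg)
{
    *(size_t *)arg = free_stack_bytes();
    return NULL;
}

/* Hypothetical helper: run a test thread with the requested stack size
 * and report how much of it is really usable. Returns 0 on success. */
static int probe_free_stack(size_t requested, size_t *free_out)
{
    pthread_attr_t attr;
    pthread_t thr;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, requested);
    if (rc == 0)
        rc = pthread_create(&thr, &attr, probe_thread, free_out);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    return pthread_join(thr, NULL);
}
```

The difference between the requested size and the value reported by probe_free_stack() approximates what thread local storage and thread bookkeeping consume, and could be added on top of the stack size wanted for the watch thread.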


Juergen



Re: [Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-22 Thread Roger Pau Monné
On Thu, Feb 22, 2018 at 11:16:53AM +, Ian Jackson wrote:
> Jim Fehlig writes ("[PATCH] xenstore: increase default thread stack size to 
> 32k"):
> > On several Skylake machines I've observed xl segfaults when running
> > create or destroy subcommands. Other subcommands may segfault too,
> > but I've only looked at create and destroy which share a similar
> > backtrace
> 
> Acked-by: Ian Jackson 
> 
> However, I am concerned that this isn't a very systematic way of
> addressing this problem.  How do we know that 32K is enough?
> We have already increased this number once.
> 
> Alternatively, maybe this is a bug in the platform libc?

Is it really that bad to use the default thread stack size? That's
going to be much bigger, I know, but I assume physical memory for the
stack will be mapped on demand, and thus the physical memory usage is
not going to be that different.

I've always been told simply not to play with the stack size.
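
For comparison, the default stack size in question can be queried as in the sketch below. It assumes (as is the case with glibc, as far as I know) that a freshly initialised attribute object reports the size a default thread would get. The helper name default_thread_stacksize is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: stack size a thread created with default
 * attributes would get, or 0 if it cannot be determined. */
static size_t default_thread_stacksize(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

On Linux the value usually follows the stack resource limit, so it is typically several megabytes of address space per thread, most of which is never touched.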

Roger.


Re: [Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-22 Thread Ian Jackson
Juergen Gross writes ("Re: [Xen-devel] [PATCH] xenstore: increase default 
thread stack size to 32k"):
> The main problem is that any thread local storage is taken from the
> stack without any interface being available for adjusting the _real_
> stack size instead of the memory for thread local storage + stack.
> 
> So we can increase the stack size of the xenstore thread and wait for
> the next breakage, or we have to think about a proper solution.
> 
> Right now I have no sensible idea how to address the problem, as the
> old thread suggests the underlying glibc problem isn't fixed yet (wow:
> the problem has been known for more than 7 years now):
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=11787

Oh!

Thank you, Juergen, for providing facts!

I withdraw my ack on this patch.

I agree with all the complaints in the libc bugzilla and disagree with
glibc upstream.  I think this should be fixed in the libc.  Sadly the
libc bugzilla doesn't have a libc patch.

I would accept a workaround in libxenstore that does something similar
to what they do in Rust...

Ian.


[Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-22 Thread Ian Jackson
Jim Fehlig writes ("[PATCH] xenstore: increase default thread stack size to 
32k"):
> On several Skylake machines I've observed xl segfaults when running
> create or destroy subcommands. Other subcommands may segfault too,
> but I've only looked at create and destroy which share a similar
> backtrace

Acked-by: Ian Jackson 

However, I am concerned that this isn't a very systematic way of
addressing this problem.  How do we know that 32K is enough?
We have already increased this number once.

Alternatively, maybe this is a bug in the platform libc?

Ian.


Re: [Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-21 Thread Juergen Gross
On 21/02/18 23:13, Jim Fehlig wrote:
> On several Skylake machines I've observed xl segfaults when running
> create or destroy subcommands. Other subcommands may segfault too,
> but I've only looked at create and destroy which share a similar
> backtrace
> 
> Thread 2 (Thread 0x77ff3700 (LWP 2941)):
> at /usr/include/bits/unistd.h:44
> at xs.c:398
> fd=) at xs.c:1231
> 
> Thread 1 has canceled Thread 2 and is waiting for it in pthread_join().
> 
> The backtrace smelled of memory/stack overflow, which was verified by
> increasing DEFAULT_THREAD_STACKSIZE to 32kb. Presumably the stack
> overflow is observed on Skylake due to a broader CPU feature set which
> must be saved within _dl_runtime_resolve and friends.
> 
> While PTHREAD_STACK_MIN should advertise a suitable stack size based on
> the underlying system, increasing the default size makes xenstore a bit
> more robust on systems with insufficient/broken minimums.

We hit something like this before:

https://lists.xen.org/archives/html/xen-devel/2016-07/msg01727.html

The main problem is that any thread local storage is taken from the
stack without any interface being available for adjusting the _real_
stack size instead of the memory for thread local storage + stack.

So we can increase the stack size of the xenstore thread and wait for
the next breakage, or we have to think about a proper solution.

Right now I have no sensible idea how to address the problem, as the
old thread suggests the underlying glibc problem isn't fixed yet (wow:
the problem has been known for more than 7 years now):

https://sourceware.org/bugzilla/show_bug.cgi?id=11787


Juergen


[Xen-devel] [PATCH] xenstore: increase default thread stack size to 32k

2018-02-21 Thread Jim Fehlig
On several Skylake machines I've observed xl segfaults when running
create or destroy subcommands. Other subcommands may segfault too,
but I've only looked at create and destroy which share a similar
backtrace

Thread 2 (Thread 0x77ff3700 (LWP 2941)):
at /usr/include/bits/unistd.h:44
at xs.c:398
fd=) at xs.c:1231

Thread 1 has canceled Thread 2 and is waiting for it in pthread_join().

The backtrace smelled of memory/stack overflow, which was verified by
increasing DEFAULT_THREAD_STACKSIZE to 32kb. Presumably the stack
overflow is observed on Skylake due to a broader CPU feature set which
must be saved within _dl_runtime_resolve and friends.

While PTHREAD_STACK_MIN should advertise a suitable stack size based on
the underlying system, increasing the default size makes xenstore a bit
more robust on systems with insufficient/broken minimums.

Signed-off-by: Jim Fehlig 
---
 tools/xenstore/xs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c
index abffd9cd80..3891e4907c 100644
--- a/tools/xenstore/xs.c
+++ b/tools/xenstore/xs.c
@@ -800,7 +800,7 @@ bool xs_watch(struct xs_handle *h, const char *path, const char *token)
     struct iovec iov[2];
 
 #ifdef USE_PTHREAD
-#define DEFAULT_THREAD_STACKSIZE (16 * 1024)
+#define DEFAULT_THREAD_STACKSIZE (32 * 1024)
 #define READ_THREAD_STACKSIZE  \
     ((DEFAULT_THREAD_STACKSIZE < PTHREAD_STACK_MIN) ?   \
     PTHREAD_STACK_MIN : DEFAULT_THREAD_STACKSIZE)
-- 
2.16.1

