On 02/21/2018 10:18 PM, Juergen Gross wrote:
On 21/02/18 23:13, Jim Fehlig wrote:
On several Skylake machines I've observed xl segfaults when running
create or destroy subcommands. Other subcommands may segfault too,
but I've only looked at create and destroy which share a similar
Thread 2 (Thread 0x7ffff7ff3700 (LWP 2941)):
fd=<optimized out>) at xs.c:1231
Thread 1 has canceled Thread 2 and is waiting for it in pthread_join().
The backtrace smelled of memory/stack overflow, which was verified by
increasing DEFAULT_THREAD_STACKSIZE to 32KB. Presumably the stack
overflow is observed on Skylake due to a broader CPU feature set which
must be saved within _dl_runtime_resolve and friends.
While PTHREAD_STACK_MIN should advertise a suitable stack size based on
the underlying system, increasing the default size makes xenstore a bit
more robust on systems with insufficient/broken minimums.
We hit something like this before:
The main problem is that any thread local storage is taken from the
stack without any interface being available for adjusting the _real_
stack size instead of the memory for thread local storage + stack.
So we can increase the stack size of the xenstore thread and wait for
the next breakage, or we have to think about a proper solution.
Right now I have no sensible idea how to address the problem, as the
old thread suggests the underlying glibc problem isn't fixed yet (wow:
the problem has been known for more than 7 years now):
It looks like the bug I'm hitting is described in
And unlike the other bug, it has been fixed.