Re: [Xen-devel] 4.9.52: INFO: task X blocked for more than 300 seconds.

2017-10-19 Thread Philipp Hahn
Hello Ankur,

On 05.10.2017 at 19:59, Ankur Arora wrote:
> On 2017-10-05 06:20 AM, Konrad Rzeszutek Wilk wrote:
>> On Wed, Oct 04, 2017 at 08:26:27PM +0200, Philipp Hahn wrote:
...
>> Adding Ankur to this as I think he saw something similar.
>>
>> But in the meantime - do you see this with the latest version of Linux?
>>> with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed
>>> several stuck processes: Here is one (truncated) dump of the Linux
>>> kernel messages:
>>>
>>>>   [] ? __schedule+0x23d/0x6d0
>>>>   [] ? bit_wait_timeout+0x90/0x90
>>>>   [] ? schedule+0x32/0x80
>>>>   [] ? schedule_timeout+0x1ec/0x360
>>>>   [] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below
>>>>   [] ? xen_clocksource_get_cycles+0x11/0x20
>>>>   [] ? bit_wait_timeout+0x90/0x90
>>>>   [] ? io_schedule_timeout+0xb4/0x130
>>>>   [] ? prepare_to_wait+0x57/0x80
>>>>   [] ? bit_wait_io+0x17/0x60
>>>>   [] ? __wait_on_bit+0x5c/0x90
>>>>   [] ? bit_wait_timeout+0x90/0x90
>>>>   [] ? out_of_line_wait_on_bit+0x7e/0xa0
>>>>   [] ? autoremove_wake_function+0x40/0x40
>>>>   [] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2]
>>>>   [] ? __switch_to+0x2c9/0x720
>>>>   [] ? try_to_del_timer_sync+0x4d/0x80
>>>>   [] ? kjournald2+0xdd/0x280 [jbd2]
>>>>   [] ? wake_up_atomic_t+0x30/0x30
>>>>   [] ? commit_timeout+0x10/0x10 [jbd2]
>>>>   [] ? kthread+0xf0/0x110
>>>>   [] ? __switch_to+0x2c9/0x720
>>>>   [] ? kthread_park+0x60/0x60
>>>>   [] ? ret_from_fork+0x25/0x30
> This looks like this race: https://patchwork.kernel.org/patch/9853443/

I built a new kernel, for which I picked that patch on top of 4.9.56. We
are currently testing that, but it crashed again yesterday evening. Here
is the dmesg output:

> INFO: task systemd:1 blocked for more than 120 seconds.
>   Not tainted 4.9.0-ucs105-amd64 #1 Debian 4.9.30-2A~4.2.0.201710161640
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> systemd D0 1  0 0x
>  8801f1797c00  81c0e540 8801f4956040
>  8801f5a187c0 c90040c4f880 8160ebbd 81186295
>  81350b39 02cebf80 811863c8 8801f4956040
> Call Trace:
>  [] ? __schedule+0x23d/0x6d0
>  [] ? move_freepages+0x95/0xd0
>  [] ? list_del+0x9/0x20
>  [] ? __rmqueue+0x88/0x3e0
>  [] ? schedule+0x32/0x80
>  [] ? schedule_timeout+0x1ec/0x360
>  [] ? get_page_from_freelist+0x350/0xad0
>  [] ? io_schedule_timeout+0xb4/0x130
>  [] ? __sbitmap_queue_get+0x24/0x90
>  [] ? bt_get.isra.6+0x129/0x1c0
>  [] ? list_del+0x9/0x20
>  [] ? wake_up_atomic_t+0x30/0x30
>  [] ? blk_mq_get_tag+0x23/0x90
>  [] ? __blk_mq_alloc_request+0x1a/0x220
>  [] ? blk_mq_map_request+0xcd/0x170
>  [] ? blk_sq_make_request+0xca/0x4c0
>  [] ? generic_make_request_checks+0x22a/0x4f0
>  [] ? generic_make_request+0x121/0x2c0
>  [] ? __add_to_page_cache_locked+0x183/0x230
>  [] ? submit_bio+0x76/0x150
>  [] ? add_to_page_cache_lru+0x84/0xe0
>  [] ? ext4_mpage_readpages+0x2b9/0x8b0 [ext4]
>  [] ? alloc_pages_current+0x8a/0x110
>  [] ? __do_page_cache_readahead+0x195/0x240
>  [] ? pagecache_get_page+0x27/0x2b0
>  [] ? filemap_fault+0x276/0x590
>  [] ? ext4_filemap_fault+0x31/0x50 [ext4]
>  [] ? __do_fault+0x84/0x190
>  [] ? handle_mm_fault+0xede/0x1680
>  [] ? ep_poll+0x13e/0x360
>  [] ? __do_page_fault+0x26a/0x500
>  [] ? SyS_read+0x52/0xc0
>  [] ? page_fault+0x28/0x30

I haven't been able to get the address of the queue object yet to get
its state. (timeout for today)

> Can you dump the output of: cat /sys/block/$xen-frontend-device/mq/*/tags
> 
> If you've hit this bug, one or more of the MQs would be wedged and
> the nr_free in one or more of the queues would be 0 and will not
> change.
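
The check described above can be scripted. What follows is only a minimal Python sketch: it assumes each `tags` file contains a field of the form `nr_free=<number>` (the exact layout of that file depends on the kernel version and is an assumption here), and the helper names `snapshot`, `parse_nr_free` and `wedged_queues` are made up for illustration. It compares two snapshots taken some time apart and reports the queues whose `nr_free` was 0 both times.

```python
import glob
import re


def snapshot(pattern="/sys/block/xvd*/mq/*/tags"):
    """Read every matching tags file into a {path: text} dictionary."""
    result = {}
    for path in glob.glob(pattern):
        with open(path) as handle:
            result[path] = handle.read()
    return result


def parse_nr_free(tags_text):
    """Return the nr_free value found in one tags file, or None if absent."""
    match = re.search(r"\bnr_free=(\d+)", tags_text)
    return int(match.group(1)) if match else None


def wedged_queues(first, second):
    """List the paths whose nr_free is 0 in both snapshots."""
    stuck = []
    for path in sorted(first):
        before = parse_nr_free(first[path])
        after = parse_nr_free(second.get(path, ""))
        if before == 0 and after == 0:
            stuck.append(path)
    return stuck
```

On a guest that still responds, one would call `snapshot()` twice with a pause in between and pass both results to `wedged_queues()`. A queue that is merely busy should show a changing `nr_free`.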

As soon as the bug occurs, we can no longer access the VM via ssh or the
Xen (serial) console: the connection stalls after entering 2-3 characters.

I have a Xen crash-dump file of one such crash, but following
/sys/block/xvd?/mq/*/tags manually to get the kobject address using
"crash" (gdb) is very time-consuming. So far I have only done it once for
xvda, but I have to do that for the other 15 block devices as well to
find the culprit.

Philipp Hahn
-- 
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
h...@univention.de

http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-0287

Re: [Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.

2017-10-05 Thread Philipp Hahn
Hello Jan,

thank you for your answer.

On 05.10.2017 at 12:12, Jan Beulich wrote:
>>>> On 04.10.17 at 20:26, <h...@univention.de> wrote:
>> with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed
>> several stuck processes: Here is one (truncated) dump of the Linux
>> kernel messages:
>>
>>>  [] ? __schedule+0x23d/0x6d0
>>>  [] ? bit_wait_timeout+0x90/0x90
>>>  [] ? schedule+0x32/0x80
>>>  [] ? schedule_timeout+0x1ec/0x360
>>>  [] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below
>>>  [] ? xen_clocksource_get_cycles+0x11/0x20
>>>  [] ? bit_wait_timeout+0x90/0x90
>>>  [] ? io_schedule_timeout+0xb4/0x130
>>>  [] ? prepare_to_wait+0x57/0x80
>>>  [] ? bit_wait_io+0x17/0x60
>>>  [] ? __wait_on_bit+0x5c/0x90
>>>  [] ? bit_wait_timeout+0x90/0x90
>>>  [] ? out_of_line_wait_on_bit+0x7e/0xa0
>>>  [] ? autoremove_wake_function+0x40/0x40
>>>  [] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2]
>>>  [] ? __switch_to+0x2c9/0x720
>>>  [] ? try_to_del_timer_sync+0x4d/0x80
>>>  [] ? kjournald2+0xdd/0x280 [jbd2]
>>>  [] ? wake_up_atomic_t+0x30/0x30
>>>  [] ? commit_timeout+0x10/0x10 [jbd2]
>>>  [] ? kthread+0xf0/0x110
>>>  [] ? __switch_to+0x2c9/0x720
>>>  [] ? kthread_park+0x60/0x60
>>>  [] ? ret_from_fork+0x25/0x30
>>> NMI backtrace for cpu 2
>>> CPU: 2 PID: 35 Comm: khungtaskd Not tainted 4.9.0-ucs105-amd64 #1 Debian 4.9.30-2A~4.2.0.201709271649
>>>   81331935  0002
>>>  81335e60 0002 8104cb70 8801f0c90e80
>>>  81335f6a 8801f0c90e80 003fffbc 81128048
>>> Call Trace:
>>>  [] ? dump_stack+0x5c/0x77
>>>  [] ? nmi_cpu_backtrace+0x90/0xa0
>>>  [] ? irq_force_complete_move+0x140/0x140
>>>  [] ? nmi_trigger_cpumask_backtrace+0xfa/0x130
>>>  [] ? watchdog+0x2b8/0x330
>>>  [] ? reset_hung_task_detector+0x10/0x10
>>>  [] ? kthread+0xf0/0x110
>>>  [] ? __switch_to+0x2c9/0x720
>>>  [] ? kthread_park+0x60/0x60
>>>  [] ? ret_from_fork+0x25/0x30
...
>> Looking at the dis-assembly of xen_clocksource_get_cycles() in
>> arch/x86/xen/time.c I see no path how that should call
>> __blk_mq_run_hw_queue():
> 
> Hence the question marks ahead of the stack entries: What you see
> there are likely leftovers from prior call trees. It just so happens
> that the old return address slots haven't got overwritten yet. You
> need to first sanitize the stack trace e.g. by having the kernel
> dump more of the stack in raw hex form, and then looking at the
> disassembly to figure out how large each stack frame is, starting
> at the top-most address (i.e. the one in RIP).
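
The distinction between entries with and without a question mark can be applied mechanically. The following Python sketch is an illustration only: it assumes the usual `[<address>] ? symbol+0xoffset/0xsize [module]` layout of a call-trace line (in this archive the addresses were stripped, leaving `[]`), and the names `parse_trace_line` and `reliable_frames` are hypothetical. It does not sanitize a stack; it only separates the entries that carry the `?` marker from those that do not.

```python
import re

TRACE_RE = re.compile(
    r"\[(?:<[0-9a-f]+>)?\]\s+"      # "[<addr>]" or the stripped "[]"
    r"(?P<guess>\?\s+)?"            # optional "? " marker
    r"(?P<symbol>[\w.]+)"           # function name, e.g. bt_get.isra.6
    r"\+0x(?P<offset>[0-9a-f]+)"    # offset into the function
    r"/0x(?P<size>[0-9a-f]+)"       # total size of the function
    r"(?:\s+\[(?P<module>\w+)\])?"  # optional "[module]" suffix
)


def parse_trace_line(line):
    """Parse one call-trace line; return a dict, or None if it is no frame."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    return {
        "reliable": match.group("guess") is None,
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),
    }


def reliable_frames(lines):
    """Keep only the frames that are not marked with '?'."""
    frames = (parse_trace_line(line) for line in lines)
    return [frame for frame in frames if frame and frame["reliable"]]
```

In the dumps quoted in this thread every entry carries a `?`, so `reliable_frames()` would return an empty list, which matches the advice to reconstruct the real call chain from a raw stack dump and the disassembly.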

That explains the strange call trace for me, thank you for the
enlightenment.

> On Wed, Oct 04, 2017 at 06:26:27PM +0000, Philipp Hahn wrote:
>>> INFO: task btrfs-transacti:522 blocked for more than 300 seconds.
> [...] 
>> And another one:
>>> INFO: task smbd:20101 blocked for more than 300 seconds.
> [...] 
>> This does not look normal to me or did I miss something?
> 
> So I see that both of the stuck processes listed above (smbd and
> btrfs-*) are disk related processes. Might I ask how many disk/nics
> (PV) do you have attached to this DomU, and how many queues does each
> have?

Nothing special configured, how would I best fetch that info?


Which leads me back to my original problem: How can I diagnose *why* the
task is blocked for that time? From my understanding this can happen if
IO is too slow and tasks just have to wait for too long. ¹
Even if IO is slow, the system should stabilize itself when no new IO is
generated and the old one has been processed, right? So looking at
`vmstat` or `blktrace` should tell me that Xen/Linux/whatever is busy
with IO and is simply not fast enough to keep up with the load.
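
One way to see which tasks are stuck at a given moment is to look for processes in uninterruptible sleep (state `D`). The following Python sketch reads `/proc/<pid>/stat` for that purpose; the function names `parse_stat` and `blocked_tasks` are hypothetical, and the sketch only lists thread-group leaders, not every thread.

```python
import os


def parse_stat(stat_text):
    """Split the text of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so search for the first "(" and the last ")".
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except OSError:
            continue  # the process exited while we were looking
        if state == "D":
            yield pid, comm
```

For each PID found this way, `/proc/<pid>/stack` (readable as root on kernels built with stack-trace support) shows where the task sleeps, and `echo w > /proc/sysrq-trigger` asks the kernel to log all blocked tasks at once. Neither helps once the guest itself no longer responds.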

Thanks again; any hint on how to diagnose this would help.

Philipp

¹
<https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.

2017-10-05 Thread Philipp Hahn
x5c/0x90
>  [] ? __jbd2_journal_file_buffer+0xcb/0x180 [jbd2]
>  [] ? bit_wait_timeout+0x90/0x90
>  [] ? out_of_line_wait_on_bit+0x7e/0xa0
>  [] ? autoremove_wake_function+0x40/0x40
>  [] ? do_get_write_access+0x208/0x420 [jbd2]
>  [] ? jbd2_journal_get_write_access+0x2e/0x60 [jbd2]
>  [] ? __ext4_journal_get_write_access+0x36/0x70 [ext4]
>  [] ? ext4_orphan_add+0xd3/0x230 [ext4]
>  [] ? ext4_mark_inode_dirty+0x6a/0x200 [ext4]
>  [] ? ext4_unlink+0x36a/0x380 [ext4]
>  [] ? vfs_unlink+0xe7/0x180
>  [] ? do_unlinkat+0x289/0x300
>  [] ? system_call_fast_compare_end+0xc/0x9b

This does not look normal to me or did I miss something?

Where can I get more information on why there is no progress for 300s?
What should I do to debug which task is waiting for what?

The traces of the other CPUs look normal to me: the one posted first
above is the shortest; in all other cases they were sooner or later
waiting for IO (my interpretation, but I can post them if necessary).

This problem has occurred since the upgrade of the Linux kernel inside the VM
from 4.1.x to 4.9.32 and now 4.9.52.

Any help is appreciated.
Philipp Hahn


[Xen-devel] [RFH] 4.9.52: task blocked for more than 300 seconds - xen_clocksource_get_cycles?

2017-10-05 Thread Philipp Hahn
it+0x5c/0x90
>  [] ? __jbd2_journal_file_buffer+0xcb/0x180 [jbd2]
>  [] ? bit_wait_timeout+0x90/0x90
>  [] ? out_of_line_wait_on_bit+0x7e/0xa0
>  [] ? autoremove_wake_function+0x40/0x40
>  [] ? do_get_write_access+0x208/0x420 [jbd2]
>  [] ? jbd2_journal_get_write_access+0x2e/0x60 [jbd2]
>  [] ? __ext4_journal_get_write_access+0x36/0x70 [ext4]
>  [] ? ext4_orphan_add+0xd3/0x230 [ext4]
>  [] ? ext4_mark_inode_dirty+0x6a/0x200 [ext4]
>  [] ? ext4_unlink+0x36a/0x380 [ext4]
>  [] ? vfs_unlink+0xe7/0x180
>  [] ? do_unlinkat+0x289/0x300
>  [] ? system_call_fast_compare_end+0xc/0x9b

This does not look normal to me or did I miss something?

Where can I get more information on why there is no progress for 300s?
What should I do to debug which task is waiting for what?

The traces of the other CPUs look normal to me: the one posted first
above is the shortest; in all other cases they were sooner or later
waiting for IO (my interpretation, but I can post them if necessary).

This problem has occurred since the upgrade of the Linux kernel inside the VM
from 4.1.x to 4.9.32 and now 4.9.52.

Any help is appreciated.
Philipp Hahn


Re: [Xen-devel] [xen-4.1.6.1] SIGSEGV libxc/xc_save_domain.c: p2m_size >> configured_ram_size

2016-06-13 Thread Philipp Hahn
Hello George,

first of all thank you for answering.

On 13.06.2016 at 12:15, George Dunlap wrote:
> On Fri, Jun 10, 2016 at 4:22 PM, Philipp Hahn <h...@univention.de> wrote:
>> while trying to live migrate some VMs from an xen-4.1.6.1 host "xc_save"
>> crashes with a segmentation fault in tools/libxc/xc_domain_save.c:1141
>>> /*
>>>  * Quick belt and braces sanity check.
>>>  */
>>> for ( i = 0; i < dinfo->p2m_size; i++ )
>>> {
>>> mfn = pfn_to_mfn(i);
>>> if( (mfn != INVALID_P2M_ENTRY) && (mfn_to_pfn(mfn) != i) )
>>  ^^^
>> due to a de-reference through
>>> #define pfn_to_mfn(_pfn)                                            \
>>>   ((xen_pfn_t) ((dinfo->guest_width==8)                               \
>>>                 ? (((uint64_t *)ctx->live_p2m)[(_pfn)])                  \
>>>                 : ((((uint32_t *)ctx->live_p2m)[(_pfn)]) == 0xffffffffU  \
>>>                    ? (-1UL) : (((uint32_t *)ctx->live_p2m)[(_pfn)]))))
...
> Given that 4.1 is long out of support, we won't be making a proper fix
> in-tree (since it will never be released).

I know that 4.1 is EOL.
I'm aware of Ubuntu still having xen-4.1 in one of their LTS versions
(Precise) and it's also in Debian-oldstable, which a lot of people (us
included) still use. I would prefer to update, but I can't for reasons
outside my direct control.

I'm already working with Stefan Bader from Canonical to backport most of
the XSAs to 4.1, so there already exists a "better" version outside of
the official Xen repositories.

> So what kind of resolution
> would be the most help to you?  A patch you can apply locally to allow
> the save/restore to work?

A patch is okay. I've already fixed a lot of other bugs in xen-4.1 by
patching the last release, so compiling my own version is no problem for me.

Philipp



[Xen-devel] [xen-4.1.6.1] SIGSEGV libxc/xc_save_domain.c: p2m_size >> configured_ram_size

2016-06-10 Thread Philipp Hahn
Hi,

while trying to live migrate some VMs from an xen-4.1.6.1 host "xc_save"
crashes with a segmentation fault in tools/libxc/xc_domain_save.c:1141
> /*
>  * Quick belt and braces sanity check.
>  */
> for ( i = 0; i < dinfo->p2m_size; i++ )
> {
> mfn = pfn_to_mfn(i);
> if( (mfn != INVALID_P2M_ENTRY) && (mfn_to_pfn(mfn) != i) )
 ^^^
due to a de-reference through
> #define pfn_to_mfn(_pfn)                                            \
>   ((xen_pfn_t) ((dinfo->guest_width==8)                               \
>                 ? (((uint64_t *)ctx->live_p2m)[(_pfn)])                  \
>                 : ((((uint32_t *)ctx->live_p2m)[(_pfn)]) == 0xffffffffU  \
>                    ? (-1UL) : (((uint32_t *)ctx->live_p2m)[(_pfn)]))))

The VM is a 32bit Linux-PV-domain having maxmem=1997[MB]
> (gdb) print _ctx
> $1 = {hvirt_start = 4118806528, pt_levels = 3, max_mfn = 25690112, live_p2m = 
> 0x7f421cc2e000, live_m2p = 0x7f421d02e000, m2p_mfn0 = 8649728, dinfo = 
> {guest_width = 4, p2m_size = 1048576}}

Note that p2m_size = 0x100000 = 4 GiB RAM / 4 KiB page size » maxmem of
the domU, so the loop doesn't end around the allocated 2 GB, but
tries to go up to the 32-bit maximum of 4 GB and fails.
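
The page counts involved can be checked with a few lines of arithmetic. This Python sketch assumes 4 KiB pages and uses the hypothetical names `pages_for_mib` and `P2M_SIZE_32BIT`; it only reproduces the numbers discussed here and says nothing about where the mapping really ends.

```python
PAGE_SIZE = 4096  # 4 KiB pages, as on x86


def pages_for_mib(mib):
    """Number of 4 KiB pages needed to cover the given amount of MiB."""
    return mib * 1024 * 1024 // PAGE_SIZE


# Number of pages in the full 32-bit (4 GiB) address range.
P2M_SIZE_32BIT = 4 * 1024 ** 3 // PAGE_SIZE

if __name__ == "__main__":
    print(hex(P2M_SIZE_32BIT))       # matches p2m_size = 1048576 from gdb
    print(hex(pages_for_mib(1997)))  # pages for maxmem=1997
    print(hex(pages_for_mib(2001)))  # pages for maxmem=2001
```

The loop bound is therefore about twice the number of pages the domU can own. For comparison, `pages_for_mib(2001)` is 0x7d100, one more than the last index (0x7d0ff) that appears in the debug output for the 2001 MB configuration.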

I've added more debugging to verify that:
> xc: detail: p2m_size=0x100000
...
> xc: detail: i=0x7cfff mfn=183ecf2 live_m2p=7cfff
> segfault

I can reproduce that easily doing
 /usr/lib/xen/bin/xc_save 28 $domid 0 0 1 28>/dev/null

Doing a non-live-migration (28 $domid 0 0 0) also fails, so it doesn't
depend on being live.

Increasing the configured memory size of the domU to 2001 doesn't change
the problem:
> xc: detail: p2m_size=0x100000
> xc: detail: i=0x7cfff mfn=183ecf2 live_m2p=7cfff
> xc: detail: i=0x7d000 mfn=183ecf1 live_m2p=7d000
...
> xc: detail: i=0x7d0fd mfn=10aebce live_m2p=7d0fd
> xc: detail: i=0x7d0fe mfn=10aebcd live_m2p=7d0fe
> xc: detail: i=0x7d0ff mfn=10aebcc live_m2p=7d0ff
> Segmentation fault (core dumped)

Rebooting the domU also doesn't fix the problem.

There is an 8-year-old report about live migration on the mailing list:
<http://lists.xenproject.org/archives/html/xen-devel/2008-10/msg00107.html>

The host is Linux-3.10.71 amd64, while the guest is Linux-4.1.16 i386
with PAE. The guest kernel reports
> [0.00] e820: last_pfn = 0x7d900 max_arch_pfn = 0x100

> # cat iomem 
> -0fff : reserved
> 1000-0009 : System RAM
> 000a-000f : reserved
>   000f-000f : System ROM
> 0010-7d8f : System RAM
>   0100-014e51d9 : Kernel code
>   014e51da-016efd7f : Kernel data
>   017a9000-0181cfff : Kernel bss
> fee0-fee00fff : Local APIC


To me that looks like a bad mixture of
1. xc_domain_maximum_gpfn() not returning the right maximum, and
2. xc_map_foreign_pages() mapping only the really allocated pages.

Any idea?
If you need more info, I can provide that on request.

I know that the old migration implementation has been removed with
xen-4.6, but I still would like to know how to fix that since we will
not have 4.6 in the next few months and I need working live migration.

Thank you in advance and have a nice weekend.

Philipp


Re: [Xen-devel] [PATCH] xen/serial: Fix incorrect length of strncmp for dtuart

2016-06-08 Thread Philipp Hahn
Hello,

On 06.06.2016 at 23:29, Jiandi An wrote:
> In serial_parse_handler(), length of strncmp for dtuart should have been
> 6, not 5.
> 
> Signed-off-by: Jiandi An 
> ---
>  xen/drivers/char/serial.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/char/serial.c b/xen/drivers/char/serial.c
> index c583a48..0fc5ced 100644
> --- a/xen/drivers/char/serial.c
> +++ b/xen/drivers/char/serial.c
> @@ -310,7 +310,7 @@ int __init serial_parse_handle(char *conf)
>  goto common;
>  }
>  
> -if ( !strncmp(conf, "dtuart", 5) )
> +if ( !strncmp(conf, "dtuart", 6) )

Do you really want to check for a prefix, that is, that conf starts with
"dtuart"?

If you want to check for an exact string match, you need to include the
trailing \0!
In that case just use "strcmp()", as there is (AFAIK) no reason to use
the n-variant: one of your strings is a constant already and thus the
comparison will terminate when the \0 of that const-string is reached.

Philipp

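#include <stdio.h>
#include <string.h>

/* Candidate option strings: one too short, one exact, one too long. */
static const char *confs[] = {
	"dtuar",
	"dtuart",
	"dtuartx",
	NULL
};

int main(void) {
	int i;
	/* Print each string with the result of comparing 5 and 6 characters;
	 * only the sign of a strncmp() result is meaningful. */
	for (i = 0; confs[i]; i++) {
		const char *conf = confs[i];
		printf("%s\t%d\t%d\n", conf, strncmp(conf, "dtuart", 5),  strncmp(conf, "dtuart", 6));
	}
	return 0;
}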


Re: [Xen-devel] Xen Security Advisory 173 (CVE-2016-3960) - x86 shadow pagetables: address width overflow

2016-05-13 Thread Philipp Hahn
Hi,


On 18.04.2016 at 15:31, Xen.org security team wrote:
> Xen Security Advisory CVE-2016-3960 / XSA-173
>   version 3
> 
>  x86 shadow pagetables: address width overflow
...
> ISSUE DESCRIPTION
> =
> In the x86 shadow pagetable code, the guest frame number of a
> superpage mapping is stored in a 32-bit field.  If a shadowed guest
> can cause a superpage mapping of a guest-physical address at or above
> 2^44 to be shadowed, the top bits of the address will be lost, causing
> an assertion failure or NULL dereference later on, in code that
> removes the shadow.
...
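
The 2^44 threshold in the issue description follows from the field width: with 4 KiB pages a frame number is the address shifted right by 12 bits, so a 32-bit field holds frame numbers only for addresses below 2^(32+12). The following Python lines merely illustrate that arithmetic; they are not taken from the Xen sources, and the names `gfn_of` and `stored_in_32_bits` are made up.

```python
PAGE_SHIFT = 12  # 4 KiB pages


def gfn_of(address):
    """Guest frame number of a guest-physical address."""
    return address >> PAGE_SHIFT


def stored_in_32_bits(gfn):
    """What remains of a frame number when it is kept in a 32-bit field."""
    return gfn & 0xFFFFFFFF


if __name__ == "__main__":
    for address in ((1 << 44) - 4096, 1 << 44):
        gfn = gfn_of(address)
        print(hex(address), hex(gfn), hex(stored_in_32_bits(gfn)))
```

The last page below 2^44 still fits (frame number 0xffffffff), while the page at exactly 2^44 has frame number 2^32 and is stored as 0.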
> VULNERABLE SYSTEMS
> ==
> Xen versions from 3.4 onwards are affected.
> 
> Only x86 variants of Xen are susceptible.  ARM variants are not
> affected.
...
> RESOLUTION
> ==
> Applying the appropriate attached patch resolves this issue.
...
> xsa173-4.3.patch   Xen 4.3.x

As Xen-4.2 and Xen-4.1 are also vulnerable, I'm trying to backport this.
The 4.3 patch mostly applies, but compilation fails because x86-32-bit
support was dropped with Xen-4.3 and _PAGE_INVALID_BIT remains
undefined for x86-32:
> guest_walk.c: In function 'mandatory_flags':
> guest_walk.c:66:40: error: '_PAGE_INVALID_BIT' undeclared (first use in this function)
> guest_walk.c:66:40: note: each undeclared identifier is reported only once for each function it appears in
> guest_walk.c: In function 'guest_walk_tables_2_levels':
> guest_walk.c:146:30: error: '_PAGE_INVALID_BIT' undeclared (first use in this function)
> guest_walk.c: In function 'mandatory_flags':
> guest_walk.c:67:1: error: control reaches end of non-void function [-Werror=return-type]

It's only defined for x86-64:
> --- a/xen/include/asm-x86/x86_64/page.h
> +++ b/xen/include/asm-x86/x86_64/page.h
...
> +/*
> + * Bit 24 of a 24-bit flag mask!  This is not any bit of a real pte,
> + * and is only used for signalling in variables that contain flags.
> + */
> +#define _PAGE_INVALID_BIT (1U<<24)
> +
>  #endif /* __X86_64_PAGE_H__ */

I guess using bit 24 is okay for 32 bit, too.

Can someone confirm that please?

Philipp



Re: [Xen-devel] Wall-Clock-Time-Jump after migration [xen-4.1] [SOLVED]

2016-02-17 Thread Philipp Hahn
Hello,

to answer my own questions:

On 16.02.2016 at 16:38, Philipp Hahn wrote:
> Summary: When a Linux-PV-domU is migrated between two hosts, the
> "ntpdate" time jumps.
...
> 1. If I start a new domU (just kernel and InitRamFS with busybox, so as to
> minimize the processes involved), the wall-clock-time is off by ~283
> (=4m43s) seconds:
> 
> dom0: Tue Feb 16 14:48:12 UTC 2016 (before starting domU)
> domU: Tue Feb 16 14:43:32 UTC 2016 (after boot, which takes ~3s)
> 
> If I then run "ntpdate" in that domU, the offset is corrected:
>> 16 Feb 14:51:59 ntpdate[150]: step time server X.X.73.241 offset 283.697217 sec
> 
> Q: Where does the initial time for the domU come from?

Wall-clock (WC) from Xen Hypervisor.

> Q: Where does that offset of ~283s come from?

The Hypervisor reads the HW-RTC once on boot to initialize its WC.
When the dom0 runs ntpdate, only "time_offset_seconds" gets updated, so
the dom0 WC is correct, but any new domU starts with "time_offset_seconds=0"
again and is off by that initial difference.
You could reboot the Hypervisor after calling 'hwclock --systohc', so that
it reads the corrected RTC.

> Q: Is the shared_info.wc_sec supposed to be updated?

No: "wc_sec" is more or less constant.
The current time is calculated as "wc_sec + tsc_to_sec()", so
"wc_sec" is the offset calculated from "wc_time@boot - tsc@boot".

'wc_sec' can be updated through "XENPF_settime".

> 3. If I migrate the domain from the first host to the second while
> running "ntpdate" in a loop, I see the clock jumping ~257s, which
> matches the difference between the time_offset_seconds between the hosts
> (283 - 22):
...
> To me that looks like "time_offset_seconds" is migrated, but as "wc_sec"
> between the two hosts is not synchronized, the time jumps.
> 
> Q: Is that a known problem and has it been fixed in newer versions of Xen?

The Linux kernel has a hook to call XENPF_settime from its
"update-RTC-every-11-minutes" (sync_cmos_work) work queue, but the code
path is disabled when the time is *not* NTP-synchronized:

> kernel/time/ntp.c:454
> 	static void sync_cmos_clock(struct work_struct *work)
> 		if (!ntp_synced()) {

> Q: Is there some recommended procedure to synchronize the time of
> multiple hypervisors, like perhaps:

Run "ntpd" in dom0 instead of "ntpdate":
Then ntp_synced() returns True, x86_platform.set_wallclock() gets called
to update the Hypervisor wall-time clock using XENPF_settime{32,64},
then every new domU will get a reasonably correct wall-time-clock on
domain creation, ...

You can check the status of "time_status" by calling "ntptime":
> # ntptime 
...
> ntp_adjtime() returns code 5 (ERROR)
...
>   status 0x2041 (PLL,UNSYNC,NANO),

It's a bit-field; if UNSYNC=0x40 is set, Xen's WC will not get updated.
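
The status word can be decoded by hand or with a short script. The following Python sketch uses the `STA_*` bit values from the Linux `timex.h` header; the function names `decode_status` and `is_synced` are hypothetical.

```python
# Status bits as defined for adjtimex()/ntp_adjtime() in <linux/timex.h>.
STATUS_BITS = [
    (0x0001, "PLL"), (0x0002, "PPSFREQ"), (0x0004, "PPSTIME"),
    (0x0008, "FLL"), (0x0010, "INS"), (0x0020, "DEL"),
    (0x0040, "UNSYNC"), (0x0080, "FREQHOLD"), (0x0100, "PPSSIGNAL"),
    (0x0200, "PPSJITTER"), (0x0400, "PPSWANDER"), (0x0800, "PPSERROR"),
    (0x1000, "CLOCKERR"), (0x2000, "NANO"), (0x4000, "MODE"),
    (0x8000, "CLK"),
]


def decode_status(status):
    """Return the names of all bits set in an ntptime status word."""
    return [name for bit, name in STATUS_BITS if status & bit]


def is_synced(status):
    """True if the UNSYNC bit (0x40) is clear."""
    return not status & 0x0040


if __name__ == "__main__":
    print(decode_status(0x2041), is_synced(0x2041))
```

For the value shown above, 0x2041, this yields PLL, UNSYNC and NANO, so the clock counts as unsynchronized.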


@David: Thank you for looking into that and sending the list of commits.
It helped me to get a better understanding of the interaction between
the Linux dom0 and Xen Hypervisor.


Thank you for the adventure; I hope that info helps others to get it
right from the beginning.


Philipp Hahn



[Xen-devel] Wall-Clock-Time-Jump after migration [xen-4.1]

2016-02-16 Thread Philipp Hahn
mUs. The jump of
several seconds is unacceptable and NTPd would take too long to slew that
clock back to reality. That's why we're using ntpdate, as we want to go
back to real-time as fast as possible.


Thanks in advance.

Philipp Hahn


Re: [Xen-devel] [PATCH] VHD: Fix locale aware character encoding handling

2015-03-12 Thread Philipp Hahn
Hello Ian,

On 11.03.2015 13:30, Ian Campbell wrote:
> On Sun, 2015-03-08 at 11:54 +0100, Philipp Hahn wrote:
>> ASCII is 7 bit only, which does not work in UTF-8 environments:
>>  failed to read parent name
...
>> Don't check outbytesleft==0 as one UTF-8 characters get encoded into
>> 1..8 bytes, so it's perfectly fine (and expected) for the output to have
>> remaining bytes left.

...
> I'm a bit perplexed over why libvhd is even trying to interpret these
> bytes, I probably don't want to know...

If by "bytes" you mean the encoding used for the file name: when
creating a snapshot, the names are stored UTF-16 encoded for Windows and
in UTF-8 for MacOS-X compatibility. Therefore the utility needs to know
from which encoding to start.

If by "bytes" you mean the (input|output) bytes left: yeah, gory UTF-8
details.
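
To make those details concrete, the following Python sketch shows how many bytes the test file name from the patch takes in a few encodings, and that plain ASCII cannot represent it at all. It illustrates the encodings only, not the iconv() calls in libvhd; the function names `encoded_sizes` and `fits_ascii` are made up.

```python
def encoded_sizes(name, codecs=("latin-1", "utf-8", "utf-16-le")):
    """Return the encoded length in bytes of name for each codec."""
    return {codec: len(name.encode(codec)) for codec in codecs}


def fits_ascii(name):
    """True if every character of name can be encoded as 7-bit ASCII."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    name = "\u00e4.vhd"  # the file name used in the test case
    print(encoded_sizes(name))
    print(fits_ascii(name))
```

The same five characters need 5 bytes in Latin-1, 6 in UTF-8 and 10 in UTF-16LE. That is why the output buffer has to be larger than the input and why unused output space after a conversion is not an error. A single code point takes at most 4 bytes in UTF-8.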

> Anyway: acked + applied, thanks.

Thanks. I hope it builds on BSD or wherever vhd-utils are also used.

Philipp



Re: [Xen-devel] xenstored crashes with SIGSEGV

2015-03-12 Thread Philipp Hahn
Hello,
On 12.03.2015 19:17, Oleg Nesterov wrote:
> On 03/12, Philipp Hahn wrote:

>> Have you seen any other corruption
>
> No,
>
>> or is one of your patches likely to
>> fix something like the issue mentioned above:
>
> I am not sure I even understand the problem above ;) I mean, after the quick
> look I do not see how this connects to FPU. $rdi == 2 looks obviously wrong.

In December we found some strange crashes of a Xen daemon, but other
processes crashed as well. One strange pattern Ian found was some
0x..00.ff pattern, which seems to have come from some SSE register
corruption.
That is why we upgraded to 3.10.62, which contains some fixes for saving
the FPU state. If my memory is correct, the FPU registers share their space
with the MMX/SSE registers, so that seemed a good candidate.

You might want to take a look at
http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg01583.html,
where you find the mail thread from December.

>> $ git l1 --grep fpu v3.10.. -- arch/x86
>> c7b228a Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> dc56c0f x86, fpu: Shift fpu_counter = 0 from copy_thread() to arch_dup_task_struct()
>> 5e23fee x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
>> f185350 x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
>> 31d9633 x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()
>
> This is only cleanups... I do not think this series can fix something.

That was my guess from reading your description, but still, thanks for your help.

Philipp



[Xen-devel] [PATCH] VHD: Fix locale aware character encoding handling

2015-03-08 Thread Philipp Hahn
ASCII is 7 bit only, which does not work in UTF-8 environments:
 failed to read parent name

Setup locale in vhd-util to parse LC_CTYPE and use the right codeset
when doing file name encoding and decoding.

Increase allocation for UTF-8 buffer as one UTF-16 character might use
twice as much space in UTF-8 (or more).

Don't check outbytesleft==0 as one UTF-8 characters get encoded into
1..8 bytes, so it's perfectly fine (and expected) for the output to have
remaining bytes left.

Test-case:
$ ./vhd-util create -n ä.vhd -s 1
$ ./vhd-util snapshot -n snap.vhd -p ä.vhd ; echo $?

See
http://unix.stackexchange.com/questions/48689/effect-of-lang-on-terminal
for more information about the details of handling the encoding right.

Signed-off-by: Philipp Hahn <h...@univention.de>
---
 tools/blktap2/vhd/lib/libvhd.c | 27 +++
 tools/blktap2/vhd/vhd-util.c   |  3 +++
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/tools/blktap2/vhd/lib/libvhd.c b/tools/blktap2/vhd/lib/libvhd.c
index 95eb5d6..1fd5b4e 100644
--- a/tools/blktap2/vhd/lib/libvhd.c
+++ b/tools/blktap2/vhd/lib/libvhd.c
@@ -37,6 +37,7 @@
 #include <iconv.h>
 #include <sys/mman.h>
 #include <sys/stat.h>
+#include <langinfo.h>
 
 #include "libvhd.h"
 #include "relative-path.h"
@@ -1296,6 +1297,7 @@ vhd_macx_encode_location(char *name, char **out, int *outlen)
 	size_t ibl, obl;
 	char *uri, *uri_utf8, *uri_utf8p, *ret;
 	const char *urip;
+	char *codeset;
 
 	err = 0;
 	ret = NULL;
@@ -1304,7 +1306,7 @@ vhd_macx_encode_location(char *name, char **out, int *outlen)
 	len = strlen(name) + strlen("file://");
 
 	ibl = len;
-	obl = len;
+	obl = len * 2;
 
 	urip = uri = malloc(ibl + 1);
 	uri_utf8 = uri_utf8p = malloc(obl);
@@ -1312,7 +1314,8 @@ vhd_macx_encode_location(char *name, char **out, int *outlen)
 	if (!uri || !uri_utf8)
 		return -ENOMEM;
 
-	cd = iconv_open("UTF-8", "ASCII");
+	codeset = nl_langinfo(CODESET);
+	cd = iconv_open("UTF-8", codeset);
 	if (cd == (iconv_t)-1) {
 		err = -errno;
 		goto out;
@@ -1325,7 +1328,7 @@ vhd_macx_encode_location(char *name, char **out, int *outlen)
 	    (char **)
 #endif
 	    &urip, &ibl, &uri_utf8p, &obl) == (size_t)-1 ||
-	    ibl || obl) {
+	    ibl) {
 		err = (errno ? -errno : -EIO);
 		goto out;
 	}
@@ -1357,6 +1360,7 @@ vhd_w2u_encode_location(char *name, char **out, int *outlen)
 	size_t ibl, obl;
 	char *uri, *uri_utf16, *uri_utf16p, *tmp, *ret;
 	const char *urip;
+	char *codeset;
 
 	err = 0;
 	ret = NULL;
@@ -1404,7 +1408,8 @@ vhd_w2u_encode_location(char *name, char **out, int *outlen)
 	 * MICROSOFT_COMPAT
 	 * little endian unicode here
 	 */
-	cd = iconv_open("UTF-16LE", "ASCII");
+	codeset = nl_langinfo(CODESET);
+	cd = iconv_open("UTF-16LE", codeset);
 	if (cd == (iconv_t)-1) {
 		err = -errno;
 		goto out;
@@ -1415,7 +1420,7 @@ vhd_w2u_encode_location(char *name, char **out, int *outlen)
 	    (char **)
 #endif
 	    &urip, &ibl, &uri_utf16p, &obl) == (size_t)-1 ||
-	    ibl || obl) {
+	    ibl) {
 		err = (errno ? -errno : -EIO);
 		goto out;
 	}
@@ -1447,11 +1452,13 @@ vhd_macx_decode_location(const char *in, char *out, int len)
 	iconv_t cd;
 	char *name;
 	size_t ibl, obl;
+	char *codeset;
 
 	name = out;
 	ibl  = obl = len;
 
-	cd = iconv_open("ASCII", "UTF-8");
+	codeset = nl_langinfo(CODESET);
+	cd = iconv_open(codeset, "UTF-8");
 	if (cd == (iconv_t)-1)
 		return NULL;
 
@@ -1479,11 +1486,13 @@ vhd_w2u_decode_location(const char *in, char *out, int len, char *utf_type)
 	iconv_t cd;
 	char *name, *tmp;
 	size_t ibl, obl;
+	char *codeset;
 
 	tmp = name = out;
 	ibl = obl  = len;
 
-	cd = iconv_open("ASCII", utf_type);
+	codeset = nl_langinfo(CODESET);
+	cd = iconv_open(codeset, utf_type);
 	if (cd == (iconv_t)-1)
 		return NULL;
 
@@ -2450,6 +2459,7 @@ vhd_initialize_header_parent_name(vhd_context_t *ctx, const char *parent_path)
 	size_t ibl, obl;
 	char *ppath, *dst;
 	const char *pname;
+	char *codeset;
 
 	err   = 0;
 	pname = NULL;
@@ -2459,7 +2469,8 @@ vhd_initialize_header_parent_name(vhd_context_t *ctx, const char *parent_path)
 	 * MICROSOFT_COMPAT
 	 * big endian unicode here
 	 */
-	cd = iconv_open(UTF_16BE, "ASCII");
+	codeset = nl_langinfo(CODESET);
+	cd = iconv_open(UTF_16BE, codeset);
 	if (cd == (iconv_t)-1) {
 		err = -errno;
 		goto out;
diff --git a/tools/blktap2/vhd/vhd-util.c b/tools/blktap2/vhd/vhd-util.c
index 944a59e..13f1835

Re: [Xen-devel] Backport request for tools/hotplug: set mtu from bridge for tap interface - for Xen 4.4, 4.5, and unstable

2015-01-12 Thread Philipp Hahn
Hello,

On 12.01.2015 18:03, Ian Jackson wrote:
 Charles Arnold writes (Re: Backport request for tools/hotplug: set mtu from 
 bridge for tap interface - for Xen 4.4, 4.5, and unstable):
 Add quotes around $bridge and $dev to handle spaces in names.
 This should go into 4.4, 4.5 and unstable.
 
 Is this really necessary for backporting ?
 
 Frankly I think if you put spaces in your network device names an
 awful lot of things are going to break.

Luckily for you at least Linux does not allow space characters in
interface names: net/core/dev.c:936 dev_valid_name
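
For illustration, a userspace approximation of that check, as I understand it, could look like the sketch below. The function name ifname_is_valid and the constant IFNAME_MAX are made up for this example; the kernel source remains the authoritative reference.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define IFNAME_MAX 16 /* assumed to mirror IFNAMSIZ, including the NUL */

/* Hypothetical userspace approximation of the kernel's name check. */
bool ifname_is_valid(const char *name)
{
    /* Reject empty names and names that do not fit the buffer. */
    if (*name == '\0' || strlen(name) >= IFNAME_MAX)
        return false;
    /* "." and ".." would clash with directory entries in sysfs. */
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return false;
    /* No path separators and no whitespace of any kind. */
    for (; *name != '\0'; name++) {
        if (*name == '/' || isspace((unsigned char)*name))
            return false;
    }
    return true;
}
```

With this, ifname_is_valid("xenbr0") is true while ifname_is_valid("my bridge") is false.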

On the other hand I find not caring about quoting somewhat dangerous, as
I already experienced one disaster caused by missing quotes. Just my 2¢.

Philipp

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xenstored crashes with SIGSEGV

2015-01-05 Thread Philipp Hahn
Hello,

happy new year to everyone.

On 19.12.2014 13:36, Philipp Hahn wrote:
 On 18.12.2014 11:17, Ian Campbell wrote:
 On Tue, 2014-12-16 at 16:13 +, Frediano Ziglio wrote:
 Do we have a bug in Xen that affect SSE instructions (possibly already
 fixed after Philipp version) ?

 I've had a niggling feeling of Deja Vu over this which I'd been putting
 down to an old Xen on ARM bug in the area of FPU register switching.

 But it seems at some point (possibly even still) there was a similar
 issue with pvops kernels on x86, see:
 http://bugs.xenproject.org/xen/bug/40
 
 That definitely looks interesting.
 
 Philipp, what kernel are you guys using?
 
 The crash 2014-12-06 01:26:21 xenstored[4337] happened on linux-3.10.46.

I looked through the changes of v3.10.46..v3.10.63 and found the
following patches:
| fb5b6e7 x86, fpu: shift drop_init_fpu() from save_xstate_sig() to
handle_signal()
| b888e3d x86, fpu: __restore_xstate_sig()->math_state_restore() needs
preempt_disable()

They look interesting enough that they may have fixed the bug, which
could explain the strange bit pattern caused by not restoring the FPU
state correctly. Because of that and because of the missing

 commit d1cc001905146d58c17ac8452eb96f226767819d
 Author: Silesh C V svella...@mvista.com
 Date:   Wed Jul 23 13:59:59 2014 -0700

 coredump: fix the setting of PF_DUMPCORE
 commit aed8adb7688d5744cb484226820163af31d2499a upstream.

we're now working on upgrading the dom0 kernel, which should give us
usable core dumps again and may also fix the underlying problem. If that
bug ever happens again I'll keep you informed.

Thanks so far to everybody for the excellent support.

Sincerely
Philipp Hahn



Re: [Xen-devel] xenstored crashes with SIGSEGV

2014-12-19 Thread Philipp Hahn
Hello Ian,

On 18.12.2014 11:17, Ian Campbell wrote:
 On Tue, 2014-12-16 at 16:13 +, Frediano Ziglio wrote:
 Do we have a bug in Xen that affect SSE instructions (possibly already
 fixed after Philipp version) ?
 
 I've had a niggling feeling of Deja Vu over this which I'd been putting
 down to an old Xen on ARM bug in the area of FPU register switching.
 
 But it seems at some point (possibly even still) there was a similar
 issue with pvops kernels on x86, see:
 http://bugs.xenproject.org/xen/bug/40

That definitely looks interesting.

 Philipp, what kernel are you guys using?

The crash 2014-12-06 01:26:21 xenstored[4337] happened on linux-3.10.46.

That kernel is missing v3.10.50-13-gd1cc001:
 commit d1cc001905146d58c17ac8452eb96f226767819d
 Author: Silesh C V svella...@mvista.com
 Date:   Wed Jul 23 13:59:59 2014 -0700

 coredump: fix the setting of PF_DUMPCORE
 commit aed8adb7688d5744cb484226820163af31d2499a upstream.
which explains why the xmm* registers are not included in the core file.

 I also can't quite shake the feeling that there was another much older
 issue relating to FPU context switch on x86, but I think that was truly
 ancient history (2.6.18 era stuff)

Some of those hosts might still use 3.2, most use 3.10.x, but definitely
no 2.6 kernels.

Xen-Hypervisor is 4.1.3

If you need anything more, just ask. It might take me some time to
answer as I'm on vacation for the next 2 weeks.

Thanks again for your help.
Philipp



Re: [Xen-devel] xenstored crashes with SIGSEGV

2014-12-18 Thread Philipp Hahn
Hello,

On 17.12.2014 10:14, Frediano Ziglio wrote:
 2014-12-16 16:44 GMT+00:00 Frediano Ziglio fredd...@gmail.com:
 2014-12-16 16:23 GMT+00:00 Ian Campbell ian.campb...@citrix.com:
...
 First we (I'll try when I reach home) can check if memset in glibc (or
 the version called from talloc_zero) can use SSE. A possible dmesg
 output and /proc/cpuinfo content could help too but I think SSE are
 now quite common.
 
 I have access to some core dumps. glibc memset is using SSE,
 specifically xmm0 register.
 
 Unfortunately is seems that core dumps contains only standard
 registers, so all register appears zeroed. If you try with a newer gdb
 version is shows that registers are not available.

I had another look myself and I'm confused now:

Using info float or info vector with gdb-7.0.1 shows the FP and MMX
registers to be all zero.
A newer gdb-7.2 shows the registers as unavailable.

eu-readelf --notes core doesn't show an NT_FPREGSET note, so to me it
looks like at least the FP registers were not dumped.
But is that also used for the MMX registers? If my memory is right, the
FP and MMX registers are shared in the CPU, but that might be old
knowledge.

I wrote a small SSE-using program, which dumps core. If I run that
locally and do a readelf --notes core, I get:
  CORE  0x0200  NT_FPREGSET (floating point registers)
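
A test program of that kind might look roughly like the sketch below. It is not the program actually used; the function names are made up, and the scalar fallback exists only so the code still builds on non-SSE targets.

```c
#include <stdlib.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Sum four floats, going through an xmm register when SSE is available. */
float sse_sum4(const float v[4])
{
    float out[4];
#if defined(__SSE__)
    __m128 x = _mm_loadu_ps(v);           /* load into an xmm register */
    x = _mm_add_ps(x, _mm_setzero_ps());  /* a trivial SSE operation */
    _mm_storeu_ps(out, x);
#else
    for (int i = 0; i < 4; i++)           /* fallback without SSE */
        out[i] = v[i];
#endif
    return out[0] + out[1] + out[2] + out[3];
}

/* Touch the SSE state, then raise SIGABRT so the kernel writes a core. */
void crash_with_sse_state(void)
{
    const float v[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    volatile float s = sse_sum4(v);
    (void)s;
    abort();
}
```

Calling crash_with_sse_state() from main() with core dumps enabled (ulimit -c unlimited) should leave a core file whose notes can then be inspected with readelf --notes. Whether the xmm registers still hold the loaded values at the time of the abort is not guaranteed.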

If I do the same in dom0, I don't get that note and gdb doesn't show the
register content.
SSE seems to be available in the dom0, as the program would crash with
SIGILL otherwise:
# grep ^flags /proc/cpuinfo
flags   : fpu de tsc msr pae mce cx8 apic sep mca cmov pat
clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good
nopl nonstop_tsc pni est ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor
lahf_lm ida dtherm

Looks like that got fixed with a newer 3.10.61 kernel, so I'll urge our
admins to update to a later kernel (again), so we'll get more useful
core dumps for future crashes.

I'm still investigating the core files of the other programs, but it
takes some time. I don't know if I will be able to finish that in time,
as the Christmas holiday season starts tomorrow and I will be
unavailable for nearly two weeks.

So happy Christmas to everybody and thanks again for your help.

Philipp



Re: [Xen-devel] xenstored crashes with SIGSEGV

2014-12-15 Thread Philipp Hahn
Hello Ian,

On 15.12.2014 14:17, Ian Campbell wrote:
 On Fri, 2014-12-12 at 17:58 +, Ian Campbell wrote:
  On Fri, 2014-12-12 at 18:20 +0100, Philipp Hahn wrote:
 On 12.12.2014 17:56, Ian Campbell wrote:
 On Fri, 2014-12-12 at 17:45 +0100, Philipp Hahn wrote:
 On 12.12.2014 17:32, Ian Campbell wrote:
 On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
...
 The 1st and 2nd trace look like this: ptr in frame #2 looks very bogus.

 (gdb) bt full
 #0  talloc_chunk_from_ptr (ptr=0xff) at talloc.c:116
 tc = <value optimized out>
 #1  0x00407edf in talloc_free (ptr=0xff) at talloc.c:551
 tc = <value optimized out>
 #2  0x0040a348 in tdb_open_ex (name=0x1941fb0
 "/var/lib/xenstored/tdb.0x1935bb0",

I just noticed something strange:

 #3  0x0040a684 in tdb_open (name=0xff <Address
 0xff out of bounds>, hash_size=0,
 tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
 #4  0x0040a70b in tdb_copy (tdb=0x192e540, outfile=0x1941fb0
 "/var/lib/xenstored/tdb.0x1935bb0")

Why does gdb-7.0.1 print name=0xff00 here for frame 3, but for
frame 2 and 4 the pointers are correct again?
Verifying the values with an explicit print shows them as correct.

 I've timed out for tonight will try and have another look next week.
 
 I've had another dig, and have instrumented all of the error paths from
 this function and I can't see any way for an invalid pointer to be
 produced, let alone freed. I've been running under valgrind which should
 have caught any uninitialised memory type errors.

Thank you for testing that.

 hash_size=<value optimized out>, tdb_flags=0, open_flags=<value
 optimized out>, mode=<value optimized out>,
 log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at
 tdb.c:1958
 
 Please can you confirm what is at line 1958 of your copy of tdb.c. I
 think it will be tdb->locked, but I'd like to be sure.

Yes, that's the line:
# sed -ne 1958p tdb.c
SAFE_FREE(tdb->locked);

 You are running a 64-bit dom0, correct?

yes: x86_64

 I've only just noticed that
 0xff is 32bits. My testing so far was 32-bit, I don't think it
 should matter wrt use of uninitialised data etc.
 
 I can't help feeling that 0xff must be some sort of magic
 sentinel value to someone. I can't figure out what though.

0xff is too much for bit flip errors, and two crashes on different
machines in the same location pretty much rule out any HW error for me.

My 2nd idea was that someone decremented 0 one time too many, but then
that would have to be an 8 bit value - reading the code I didn't see
anything like that.
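
To spell out that second idea: only an 8-bit unsigned counter wraps from 0 to exactly 0xff, while a wider counter would produce more set bits. A minimal illustration (the function names are made up for the example):

```c
#include <stdint.h>

/* Decrement an 8-bit counter; unsigned arithmetic wraps modulo 256. */
uint8_t dec8(uint8_t v)
{
    return (uint8_t)(v - 1u);
}

/* The same operation on a 32-bit counter wraps modulo 2^32 instead. */
uint32_t dec32(uint32_t v)
{
    return v - 1u;
}
```

dec8(0) yields 0xff, whereas dec32(0) yields 0xffffffff, which would not match a single 0xff byte surrounded by zeros.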

 Have you observed the xenstored processes growing especially large
 before this happens? I'm wondering if there might be a leak somewhere
 which after a time is resulting a 

I have no monitoring of the memory usage for the crashed systems, but
the core files look reasonably sane.
Looking at the test-system running
/usr/share/pyshared/xen/xend/xenstore/tests/stress_xs.py the memory
usage stays constant since last Friday.

 I'm about to send out a patch which plumbs tdb's logging into
 xenstored's logging, in the hopes that next time you see this it might
 say something as it dies.

Thank you for the patch: I'll try to incorporate it and will continue
trying to reproduce the crash.


One more thing we noticed: /var/lib/xenstored/ contained the tdb file
and two bit-identical copies after the crash, so I would read that as two
transactions being in progress at the time of the crash. This might be
important.
But /usr/share/pyshared/xen/xend/xenstore/tests/stress_xs.py seems to
create more transactions in parallel and my test system so far has
survived this since Friday.

Sincerely
Philipp Hahn



Re: [Xen-devel] xenstored crashes with SIGSEGV

2014-12-15 Thread Philipp Hahn
Hello Ian,

On 15.12.2014 18:45, Ian Campbell wrote:
 On Mon, 2014-12-15 at 14:50 +, Ian Campbell wrote:
 On Mon, 2014-12-15 at 15:19 +0100, Philipp Hahn wrote:
 I just noticed something strange:

 #3  0x0040a684 in tdb_open (name=0xff <Address
 0xff out of bounds>, hash_size=0,
 tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
...
 I'm reasonably convinced now that this is just a weird artefact of
 running gdb on an optimised binary, probably a shortcoming in the debug
 info leading to gdb getting confused.
 
 Unfortunately this also calls into doubt the parameter to talloc_free,
 perhaps in that context 0xff000 is a similar artefact.
 
 Please can you print the entire contents of tdb in the second frame
 (print *tdb ought to do it). I'm curious whether it is all sane or
 not.

(gdb) print *tdb
$1 = {name = 0x0, map_ptr = 0x0, fd = 47, map_size = 65280,
  read_only = 16711680, locked = 0xff00, ecode = 16711680, header = {
    magic_food = "\000\000\000\000\000\000\000\000\000\377\000\000\000\000\377\000\000\000\000\000\000\000\000\000\000\377\000\000\000\000\377",
    version = 0, hash_size = 0, rwlocks = 65280, reserved = {16711680, 0, 0,
      65280, 16711680, 0, 0, 65280, 16711680, 0, 0, 65280, 16711680, 0, 0,
      65280, 16711680, 0, 0, 65280, 16711680, 0, 0, 65280, 16711680, 0, 0,
      65280, 16711680, 0, 0}}, flags = 0, travlocks = {next = 0xff, off = 0,
    hash = 65280}, next = 0xff, device = 280375465082880, inode = 16711680,
  log_fn = 0x4093b0 <null_log_fn>, hash_fn = 0x4092f0 <default_tdb_hash>,
  open_flags = 2}
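
The decimal values in this dump are easier to compare once written out as bytes in memory order. The small helper below only re-expresses numbers from the dump and assumes a little-endian layout, which holds for x86_64; the function name is made up for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 32-bit word as four hex bytes in little-endian (memory) order. */
void word_to_le_hex(uint32_t w, char out[12])
{
    snprintf(out, 12, "%02x %02x %02x %02x",
             (unsigned)(w & 0xff),
             (unsigned)((w >> 8) & 0xff),
             (unsigned)((w >> 16) & 0xff),
             (unsigned)((w >> 24) & 0xff));
}
```

For 65280 this gives "00 ff 00 00" and for 16711680 it gives "00 00 ff 00". The sequence 16711680, 0, 0, 65280 in the reserved array is therefore a 16-byte run containing two 0xff bytes and otherwise zeros, and the run appears to repeat with the same 16-byte period as the 0xff bytes in magic_food.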

 Please can you also print info regs at the point of the segv (in frame
 0) as well as disas at that point.

(gdb) info registers
rax            0x0                 0
rbx            0x16bff70           23854960
rcx            0x                  -1
rdx            0x40ecd0            4254928
rsi            0x0                 0
rdi            0xff00              280375465082880
rbp            0x7fcaed6c96a8      0x7fcaed6c96a8
rsp            0x7fff9dc86330      0x7fff9dc86330
r8             0x7fcaece54c08      140509534571528
r9             0xff00              -72057594037927936
r10            0x7fcaed08c14c      140509536895308
r11            0x246               582
r12            0xd                 13
r13            0xff00              280375465082880
r14            0x4093b0            4232112
r15            0x167d620           23582240
rip            0x4075c4            0x4075c4 <talloc_chunk_from_ptr+4>
eflags         0x10206             [ PF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fctrl          0x0                 0
fstat          0x0                 0
ftag           0x0                 0
fiseg          0x0                 0
fioff          0x0                 0
foseg          0x0                 0
fooff          0x0                 0
fop            0x0                 0
mxcsr          0x0                 [ ]

(gdb) disassemble
Dump of assembler code for function talloc_chunk_from_ptr:
0x004075c0 <talloc_chunk_from_ptr+0>:   sub    $0x8,%rsp
0x004075c4 <talloc_chunk_from_ptr+4>:   mov    -0x8(%rdi),%edx
0x004075c7 <talloc_chunk_from_ptr+7>:   lea    -0x50(%rdi),%rax
0x004075cb <talloc_chunk_from_ptr+11>:  mov    %edx,%ecx
0x004075cd <talloc_chunk_from_ptr+13>:  and    $0xfff0,%ecx
0x004075d0 <talloc_chunk_from_ptr+16>:  cmp    $0xe814ec70,%ecx
0x004075d6 <talloc_chunk_from_ptr+22>:  jne    0x4075e2 <talloc_chunk_from_ptr+34>
0x004075d8 <talloc_chunk_from_ptr+24>:  and    $0x1,%edx
0x004075db <talloc_chunk_from_ptr+27>:  jne    0x4075e2 <talloc_chunk_from_ptr+34>
0x004075dd <talloc_chunk_from_ptr+29>:  add    $0x8,%rsp
0x004075e1 <talloc_chunk_from_ptr+33>:  retq
0x004075e2 <talloc_chunk_from_ptr+34>:  nopw   0x0(%rax,%rax,1)
0x004075e8 <talloc_chunk_from_ptr+40>:  callq  0x401b98 <abort@plt>
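
Read back into C, the disassembly corresponds roughly to the sketch below. The offsets and the magic value are taken from the instructions above; the macro and function names are made up, the mask is assumed to clear the low four bits (the immediate in the listing looks truncated), and NULL is returned where the real function calls abort().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE  0x50u        /* from: lea -0x50(%rdi),%rax */
#define FLAGS_OFF 0x48u        /* ptr - 0x8, i.e. HDR_SIZE - 8 */
#define MAGIC     0xe814ec70u  /* from: cmp $0xe814ec70,%ecx */
#define FLAG_FREE 0x1u         /* from: and $0x1,%edx */

/* Hypothetical reconstruction: map a user pointer back to its header. */
void *chunk_from_ptr(const void *ptr)
{
    const unsigned char *hdr = (const unsigned char *)ptr - HDR_SIZE;
    uint32_t flags;

    /* Read of ptr - 8: the instruction at +4, where rip points. */
    memcpy(&flags, hdr + FLAGS_OFF, sizeof(flags));

    if ((flags & ~0xFu) != MAGIC)  /* assumed mask: low four bits cleared */
        return NULL;               /* the real code calls abort() here */
    if (flags & FLAG_FREE)
        return NULL;               /* chunk already freed: abort() as well */
    return (void *)hdr;
}
```

Since rip is at talloc_chunk_from_ptr+4, the fault happens on the very first read through ptr, before any magic check, which fits rdi itself holding a bogus address.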

 Can you also p $_siginfo._sifields._sigfault.si_addr (in frame 0).
 This ought to be the actual faulting address, which ought to give a hint
 on how much we can trust the parameters in the stack trace.

Hmm, my gdb refused to access $_siginfo:
(gdb) show convenience
$_siginfo = Unable to read siginfo

 Since I'm asking for the world I may as well ask you to dump the raw
 stack too x/64x $sp ought to be a good starting point.

(gdb) x/64x $sp
0x7fff9dc86330: 0xed6c96a8  0x7fca  0x00407edf  0x
0x7fff9dc86340: 0x  0x  0x016bff70  0x
0x7fff9dc86350: 0xed6c96a8  0x7fca  0x000d  0x
0x7fff9dc86360: 0x  0x  0x004093b0  0x
0x7fff9dc86370: 0x0167d620  0x  0x0040a348  0x
0x7fff9dc86380: 0x  0x  0x  0x
0x7fff9dc86390: 0x  0x  0x  0x
0x7fff9dc863a0: 0x0011  0x  0x411d4816  0x
0x7fff9dc863b0: 0x0001  0x  0x81a0  0x
0x7fff9dc863c0

Re: [Xen-devel] xenstored crashes with SIGSEGV

2014-12-12 Thread Philipp Hahn
Hello,

On 13.11.2014 10:12, Ian Campbell wrote:
 On Thu, 2014-11-13 at 08:45 +0100, Philipp Hahn wrote:
 To me this looks like some memory corruption by some unknown code
 writing into some random memory space, which happens to be the tdb here.
 
 I wonder if running xenstored under valgrind would be useful. I think
 you'd want to stop xenstored from starting during normal boot and then
 launch it with:
 valgrind /usr/local/sbin/xenstored -N
 -N is to stay in the foreground, you might want to do this in a screen
 session or something, alternatively you could investigate the --log-*
 options in the valgrind manpage, together with the various
 --trace-children* in order to follow the processes over its
 daemonization.

We did enable tracing and now have the xenstored-trace.log of one crash:
It contains 1.6 billion lines and is 83 GiB.
It just shows xenstored crashing on TRANSACTION_START.

Is there some tool to feed that trace back into a newly launched xenstored?

My hope would be that xenstored crashes again, because then we could use
all those other tools like valgrind more easily.

 3. the crash happens rarely and the host run fine most of the time. The
 crash mostly happens around midnight and seem to be guest-triggered, as
 the logs on the host don't show any activity like starting new or
 destroying running VMs. So far the problem only showed on host running
 Linux VMs. Other host running Windows VMs so far never showed that crash.

Now we also observed a crash on a host running Windows VMs.

 If it is really mostly happening around midnight then it might be worth
 digging into the host and guest configs for cronjobs and the like, e.g.
 log rotation stuff like that which might be tweaking things somehow.
 
 Does this happen on multiple hosts, or just the one?

Multiple hosts in two different data centers.

 Do you rm the xenstore db on boot? It might have a persistent
 corruption, aiui most folks using C xenstored are doing so or even
 placing it on a tmpfs for performance reasons.

We're using a tmpfs for /var/lib/xenstored/, as we had a severe
performance problem with something updating
/local/domain/0/backend/console/*/0/uuid too often, which put xenstored
in permanent D state.

 If you are running 4.1.x then I think oxenstored isn't an option, but it
 might be something to consider when you upgrade.

Thank you for the hint, I'll have another look at the Ocaml version.

Thank you again.
Philipp Hahn
