Re: [Xen-devel] [BUG] Error applying XSA240 update 5 on 4.8 and 4.9 (patch 3 references CONFIG_PV_LINEAR_PT, 3285e75dea89, x86/mm: Make PV linear pagetables optional)
On Friday, 17 November 2017 2:09:09 AM AEDT Ian Jackson wrote: > George Dunlap writes ("Re: [BUG] Error applying XSA240 update 5 on 4.8 and 4.9 (patch 3 references CONFIG_PV_LINEAR_PT, 3285e75dea89, x86/mm: Make PV linear pagetables optional)"): > > These are two different things. Steve's reluctance to backport a > > potentially arbitrary number of non-security-related patches is > > completely reasonable. > > I think the right thing to do is this: > > If the patch(es) in an XSA require commits from staging-N which are > not contained in previous XSAs, the prerequisite commits should be > listed in the advisory. > > That way someone who is following the XSAs (and by implication does > not want to take the other stuff from staging-N/stable-N or even our > point releases) will be able to take the minimum set necessary. Hi Ian, I think that would be a great idea. That way, if a non-XSA, non-release commit is required, at least it is documented as such - and therefore correctable. On a theoretical note, though, what would be the chances of opening up other vulnerabilities like this? I would think somewhat minimal, but worthy of thought - even in passing... -- Steven Haigh net...@crc.id.au http://www.crc.id.au +61 (3) 9001 6090 0412 935 897 signature.asc Description: This is a digitally signed message part. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
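Ian's proposal - prerequisites listed explicitly in the advisory - lends itself to a simple scripted workflow. The sketch below is purely illustrative: the prereq-*.patch and xsa240-*.patch file names are assumptions, not the contents of any real advisory, and in a real run the `patch` commands would be executed inside the unpacked Xen release tree rather than echoed.

```shell
#!/bin/bash
# Sketch: apply advisory-listed prerequisite patches before the XSA
# patches themselves, in lexical order. All file names are hypothetical.
set -e

workdir=$(mktemp -d)
# Stand-in patch files for the demonstration only.
touch "$workdir"/prereq-0001.patch
touch "$workdir"/xsa240-0001.patch "$workdir"/xsa240-0002.patch "$workdir"/xsa240-0003.patch

apply_order() {
    # Prerequisites first, then the XSA patch series.
    ls "$1"/prereq-*.patch 2>/dev/null
    ls "$1"/xsa*.patch 2>/dev/null
}

# Dry run: show the order the patches would be applied in
# (prints the prereq patch first, then the three xsa240 patches).
apply_order "$workdir" | while read -r p; do
    echo "patch -p1 < $p"
done
```

Applied against the base tarball plus the already-issued XSA patches, this keeps the "minimum set necessary" auditable rather than relying on whatever happens to be in staging-N.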
Re: [Xen-devel] [BUG] Error applying XSA240 update 5 on 4.8 and 4.9 (patch 3 references CONFIG_PV_LINEAR_PT, 3285e75dea89, x86/mm: Make PV linear pagetables optional)
On Thursday, 16 November 2017 8:30:39 PM AEDT Jan Beulich wrote: > >>> On 15.11.17 at 23:48, <li...@johnthomson.fastmail.com.au> wrote: > > Hi, > > > > I am having trouble applying the patch 3 from XSA240 update 5 for xen > > stable 4.8 and 4.9 > > xsa240 0003 contains: > > > > CONFIG_PV_LINEAR_PT > > > > from: > > > > x86/mm: Make PV linear pagetables optional > > https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=3285e75dea89afb0ef5b3ee39bd15194bd7cc110 > > > > I cannot find this string in an XSA, nor is an XSA referenced in the > > commit. > > Am I missing a patch, or doing something wrong? > > Well, you're expected to apply all patches which haven't been > applied so far. In particular, in the stable version trees, the 2nd > patch hasn't gone in yet (I'm intending to do this later today), > largely because it (a) wasn't ready at the time the first patch > went in and (b) it is more a courtesy patch than an actual part of > the security fix. I'm not quite sure this is a great idea... XSA patches should work on the released versions - hence the xsa240 patchset should apply to the base tarball + current XSA patches. If there is something in the git that *isn't* in the latest release, it should be included in the XSA patchset - otherwise the set is incomplete. I don't see anything in the written XSA that mentions a separate patch is required outside of the patches included with it. Could I suggest that we re-do v6 of these patches with the complete required set? These should be included in 4.9.1 - which makes most things irrelevant - but I'm not aware of what the release window is for 4.9.1.
Re: [Xen-devel] [PATCH v3 0/2] XSA-226
On Tuesday, 15 August 2017 11:43:59 PM AEST Jan Beulich wrote: > XSA-226 went out with just a workaround patch. The pair of patches > here became ready too late to be reasonably included in the XSA. > Nevertheless they aim at fixing the underlying issues, ideally making > the workaround unnecessary. > > 1: gnttab: don't use possibly unbounded tail calls > 2: gnttab: fix transitive grant handling > > Signed-off-by: Jan Beulich <jbeul...@suse.com> If this turns out to be all good and accepted, is it possible to reissue xsa226 with the proper fixes?
Re: [Xen-devel] [PATCH v2 0/3] xen/blkback: several fixes of resource management
On Wednesday, 7 June 2017 11:52:34 PM AEST Konrad Rzeszutek Wilk wrote: > On Wed, Jun 07, 2017 at 10:36:58PM +1000, Steven Haigh wrote: > > On Friday, 19 May 2017 1:28:46 AM AEST Juergen Gross wrote: > > > Destroying a Xen guest domain while it was doing I/Os via xen-blkback > > > leaked several resources, including references of the guest's memory > > > pages. > > > > > > This patch series addresses those leaks by correcting usage of > > > reference counts and the sequence when to free which resource. > > > > > > The series applies on top of commit 2d4456c73a487abe ("block: > > > xen-blkback: add null check to avoid null pointer dereference") in > > > Jens Axboe's tree kernel/git/axboe/linux-block.git > > > > > > V2: changed flag to type bool in patch 1 (Dietmar Hahn) > > > > > > Juergen Gross (3): > > > xen/blkback: fix disconnect while I/Os in flight > > > xen/blkback: don't free be structure too early > > > xen/blkback: don't use xen_blkif_get() in xen-blkback kthread > > > > > > drivers/block/xen-blkback/blkback.c | 3 --- > > > drivers/block/xen-blkback/common.h | 1 + > > > drivers/block/xen-blkback/xenbus.c | 15 --- > > > 3 files changed, 9 insertions(+), 10 deletions(-) > > > > Just wanted to give this a bit of a prod. > > Ouch! > > > Are there any plans to have this hit the kernel.org kernels? > > Yes. > > > My testing was purely on kernel 4.9 branch - but it doesn't look like this > > has shown up there yet? > > Correct. I am thinking to send these to Jens around June 20th or so. Ok, all understood. Thanks for the clarifications. At the moment, I'm just including them in my kernel builds - then expecting them to stop applying at some point in the future once the fixes land upstream. I'll just keep doing this.
Re: [Xen-devel] [PATCH v2 0/3] xen/blkback: several fixes of resource management
On Friday, 19 May 2017 1:28:46 AM AEST Juergen Gross wrote: > Destroying a Xen guest domain while it was doing I/Os via xen-blkback > leaked several resources, including references of the guest's memory > pages. > > This patch series addresses those leaks by correcting usage of > reference counts and the sequence when to free which resource. > > The series applies on top of commit 2d4456c73a487abe ("block: > xen-blkback: add null check to avoid null pointer dereference") in > Jens Axboe's tree kernel/git/axboe/linux-block.git > > V2: changed flag to type bool in patch 1 (Dietmar Hahn) > > Juergen Gross (3): > xen/blkback: fix disconnect while I/Os in flight > xen/blkback: don't free be structure too early > xen/blkback: don't use xen_blkif_get() in xen-blkback kthread > > drivers/block/xen-blkback/blkback.c | 3 --- > drivers/block/xen-blkback/common.h | 1 + > drivers/block/xen-blkback/xenbus.c | 15 --- > 3 files changed, 9 insertions(+), 10 deletions(-) Just wanted to give this a bit of a prod. Are there any plans to have this hit the kernel.org kernels? My testing was purely on the kernel 4.9 branch - but it doesn't look like this has shown up there yet?
Re: [Xen-devel] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x217/0x220
On Thursday, 1 June 2017 11:56:28 PM AEST Boris Ostrovsky wrote: > On 05/31/2017 10:25 PM, Steven Haigh wrote: > > On 2017-05-31 00:37, Steven Haigh wrote: > >> On 31/05/17 00:18, Boris Ostrovsky wrote: > >>> On 05/30/2017 06:27 AM, Steven Haigh wrote: > >>>> Just wanted to give this a nudge to try and get some suggestions on > >>>> where to go / what to do about this. > >>>> > >>>> On 28/05/17 09:44, Steven Haigh wrote: > >>>>> The last couple of days running on kernel 4.9.29 and 4.9.30 with Xen > >>>>> 4.9.0-rc6 I've had a number of ethernet lock ups that have taken my > >>>>> system off the network. > >>>>> > >>>>> This is a new development - but I'm not sure if its kernel or xen > >>>>> related. > >>> > >>> Since noone seems to have seen this it would be useful to narrow it > >>> down > >>> a bit. > >>> > >>> Do you observe this on rc5? Or with 4.9.28 kernel? Any particular load > >>> that you are using? Do you see this on a specific NIC? > >> > >> This install is currently using xen 4.9-rc7 and kernel 4.9.30. I would > >> say that there may be a connection between occurrences between disk > >> activity and the ethernet adapter locking up - but I haven't been able > >> to prove this in any valid way yet. > >> > >> I am currently running this script on the server in question to try and > >> get a log of how often the adapter locks up. I only added the logger > >> line tonight - so I don't have a great deal of historical data to add as > >> yet. > >> > >> #!/bin/bash > >> while true; do > >> > >> ping -c1 10.1.1.2 >& /dev/null > >> if [ $? != 0 ]; then > >> > >> logger 'No response. Resetting enp5s0' > >> mii-tool -R enp5s0 > >> > >> fi > >> sleep 5 > >> > >> done > > > > Just to keep kicking this along a little bit, my logs so far have shown: > > messages:May 31 00:20:10 No response. Resetting enp5s0 > > messages:May 31 04:20:08 No response. Resetting enp5s0 > > messages:May 31 12:21:37 No response. 
Resetting enp5s0 > > > > It's almost spooky that it's nearly 20 minutes past the hour on each reset. > > > > I've checked against the cron logs, but I can't find anything that > > would be scheduled on the Dom0 at that time. > > > > The logs also show that after running mii-tool to reset the ethernet > > adapter, connectivity has returned straight away. > > > > The network adapter uses the r8169 kernel module, and shows as: > > 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. > > RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06) > > > > I have a DomU backup script that runs *in* a DomU at 01:00 each night > > - that causes a lot of disk activity - but alas, that time hasn't > > lined up with anything as yet... > > > > Still seem to be fidgeting in the dark :( > > Since you've already observed this problem with rc6 and 4.9.29, wouldn't > it be more useful to go backwards to narrow down where the problem first > occurred? I am not sure how moving to rc7 and 4.9.30 is going to help > unless you think this is a temporary regression. I'm not 100% sure of the cause at the moment. I moved to kernel 4.9 from 4.4 a few weeks before I started to test Xen 4.9. My only thoughts were that bringing up to the latest version would at least test against other fixes that are known going into the Xen 4.9rc releases. I have also been updating to the latest 4.9 kernel in case I come across a fix - or at least a kernel version where this no longer occurs. At this stage, I don't have any information to give any major hint on whether this is Xen or kernel related, other than that I had never observed this using: * Xen 4.7 + kernel 4.4 * Xen 4.7 + kernel 4.9 I am assuming, however, that because the network dies in this manner and stays dead until manual intervention, I would have noticed it in a different combination of Xen / kernel. 
One observation I have made since putting in the extra logging via the ethernet reset script posted earlier - the WARNING is not printed for every ethernet controller hang. As such, this may actually be a side-effect of having the controller stay dead - rather than a cause. A second observation is that I don't seem to see as many hangs of the ethern
Re: [Xen-devel] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x217/0x220
On 2017-05-31 00:37, Steven Haigh wrote: On 31/05/17 00:18, Boris Ostrovsky wrote: On 05/30/2017 06:27 AM, Steven Haigh wrote: Just wanted to give this a nudge to try and get some suggestions on where to go / what to do about this. On 28/05/17 09:44, Steven Haigh wrote: The last couple of days running on kernel 4.9.29 and 4.9.30 with Xen 4.9.0-rc6 I've had a number of ethernet lock ups that have taken my system off the network. This is a new development - but I'm not sure if it's kernel or xen related. Since no one seems to have seen this it would be useful to narrow it down a bit. Do you observe this on rc5? Or with 4.9.28 kernel? Any particular load that you are using? Do you see this on a specific NIC? This install is currently using xen 4.9-rc7 and kernel 4.9.30. I would say that there may be a connection between disk activity and the ethernet adapter locking up - but I haven't been able to prove this in any valid way yet. I am currently running this script on the server in question to try and get a log of how often the adapter locks up. I only added the logger line tonight - so I don't have a great deal of historical data to add as yet.

#!/bin/bash
while true; do
  ping -c1 10.1.1.2 >& /dev/null
  if [ $? != 0 ]; then
    logger 'No response. Resetting enp5s0'
    mii-tool -R enp5s0
  fi
  sleep 5
done

Just to keep kicking this along a little bit, my logs so far have shown:

messages:May 31 00:20:10 No response. Resetting enp5s0
messages:May 31 04:20:08 No response. Resetting enp5s0
messages:May 31 12:21:37 No response. Resetting enp5s0

It's almost spooky that it's nearly 20 minutes past the hour on each reset. I've checked against the cron logs, but I can't find anything that would be scheduled on the Dom0 at that time. The logs also show that after running mii-tool to reset the ethernet adapter, connectivity has returned straight away. The network adapter uses the r8169 kernel module, and shows as: 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06) I have a DomU backup script that runs *in* a DomU at 01:00 each night - that causes a lot of disk activity - but alas, that time hasn't lined up with anything as yet... Still seem to be fidgeting in the dark :(
Re: [Xen-devel] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x217/0x220
On 31/05/17 00:18, Boris Ostrovsky wrote: > On 05/30/2017 06:27 AM, Steven Haigh wrote: >> Just wanted to give this a nudge to try and get some suggestions on >> where to go / what to do about this. >> >> On 28/05/17 09:44, Steven Haigh wrote: >>> The last couple of days running on kernel 4.9.29 and 4.9.30 with Xen >>> 4.9.0-rc6 I've had a number of ethernet lock ups that have taken my >>> system off the network. >>> >>> This is a new development - but I'm not sure if it's kernel or xen related. > > Since no one seems to have seen this it would be useful to narrow it down > a bit. > > Do you observe this on rc5? Or with 4.9.28 kernel? Any particular load > that you are using? Do you see this on a specific NIC? This install is currently using xen 4.9-rc7 and kernel 4.9.30. I would say that there may be a connection between disk activity and the ethernet adapter locking up - but I haven't been able to prove this in any valid way yet. I am currently running this script on the server in question to try and get a log of how often the adapter locks up. I only added the logger line tonight - so I don't have a great deal of historical data to add as yet.

#!/bin/bash
while true; do
  ping -c1 10.1.1.2 >& /dev/null
  if [ $? != 0 ]; then
    logger 'No response. Resetting enp5s0'
    mii-tool -R enp5s0
  fi
  sleep 5
done

What I have right now in dmesg + journalctl is:

# dmesg
[221834.898685] r8169 0000:05:00.0 enp5s0: link down
[221834.898768] br10: port 1(vlan10) entered disabled state
[221834.898827] br203: port 1(vlan203) entered disabled state
[221834.905380] r8169 0000:05:00.0 enp5s0: link up
[221834.905748] br10: port 1(vlan10) entered blocking state
[221834.905749] br10: port 1(vlan10) entered forwarding state
[221834.906162] br203: port 1(vlan203) entered blocking state
[221834.906162] br203: port 1(vlan203) entered forwarding state
[221834.906176] r8169 0000:05:00.0 enp5s0: link down
[221835.949483] br10: port 1(vlan10) entered disabled state
[221835.949515] br203: port 1(vlan203) entered disabled state
[221838.069998] r8169 0000:05:00.0 enp5s0: link up
[221838.070538] br10: port 1(vlan10) entered blocking state
[221838.070540] br10: port 1(vlan10) entered forwarding state
[221838.071055] br203: port 1(vlan203) entered blocking state
[221838.071057] br203: port 1(vlan203) entered forwarding state

# journalctl | grep Resetting
May 31 00:20:10 xenhost: No response. Resetting enp5s0

> Have you checked hypervisor log (xl dmesg)? The last lines I see in 'xl dmesg' are:

(XEN) Scrubbing Free RAM on 1 nodes using 4 CPUs
(XEN) .done.
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 456kB init memory

This would indicate that nothing additional is being logged here. 
If it matters, the xl info follows:

# xl info
host : xenhost
release : 4.9.30-1.el7xen.x86_64
version : #1 SMP Fri May 26 06:16:37 AEST 2017
machine : x86_64
nr_cpus : 4
max_cpu_id : 3
nr_nodes : 1
cores_per_socket : 4
threads_per_core : 1
cpu_mhz : 3303
hw_caps : bfebfbff:179ae3bf:28100800:0001:0001:::0100
virt_caps : hvm
total_memory : 16308
free_memory : 1785
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 9
xen_extra : -rc
xen_version : 4.9-rc
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit2
xen_pagesize : 4096
platform_params : virt_start=0x8000
xen_changeset :
xen_commandline : placeholder dom0_mem=2048M cpufreq=xen dom0_max_vcpus=1 dom0_vcpus_pin sched=credit2 console=tty0 console=com1 com1=115200,8n1
cc_compiler : gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
cc_compile_by : mockbuild
cc_compile_domain : crc.id.au
cc_compile_date : Sun May 28 10:08:40 AEST 2017
build_id : 0848a8631a9064b3de53cdfe71c996e929ce2539
xend_config_format : 4
Re: [Xen-devel] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x217/0x220
Just wanted to give this a nudge to try and get some suggestions on where to go / what to do about this. On 28/05/17 09:44, Steven Haigh wrote: > The last couple of days running on kernel 4.9.29 and 4.9.30 with Xen > 4.9.0-rc6 I've had a number of ethernet lock ups that have taken my > system off the network. > > This is a new development - but I'm not sure if its kernel or xen related. > > in dmesg, I see the following: > WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 > dev_watchdog+0x217/0x220 > NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out > Modules linked in: bridge 8021q garp stp llc btrfs dm_mod > crct10dif_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul > glue_helper ablk_helper cryptd raid456 async_raid6_recov async_memcpy > async_pq ppdev iTCO_wdt async_xor xor iTCO_vendor_support async_tx > raid6_pq pcspkr i2c_i801 i2c_smbus pl2303 usbserial sg lpc_ich mfd_core > tpm_infineon parport_pc parport shpchp mei_me mei xenfs xen_privcmd > ip_tables xfs libcrc32c raid1 sd_mod i915 i2c_algo_bit drm_kms_helper > drm crc32c_intel serio_raw ahci libahci i2c_core r8169 mii sata_mv video > xen_acpi_processor xen_pciback xen_netback xen_gntalloc xen_gntdev > xen_evtchn ipv6 crc_ccitt autofs4 > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.30-1.el7xen.x86_64 #1 > Hardware name: Gigabyte Technology Co., Ltd. To be filled by > O.E.M./Z68M-D2H, BIOS U1f 06/13/2012 > 880080203dd8 81348dc5 880080203e28 > 880080203e18 81081711 013c80203e10 > 88000526a000 0001 88000526a000 > Call Trace: > > [] dump_stack+0x63/0x8e > [] __warn+0xd1/0xf0 > [] warn_slowpath_fmt+0x4f/0x60 > [] dev_watchdog+0x217/0x220 > [] ? dev_deactivate_queue.constprop.27+0x60/0x60 > [] call_timer_fn+0x3a/0x130 > [] run_timer_softirq+0x191/0x420 > [] ? handle_percpu_irq+0x3a/0x50 > [] ? generic_handle_irq+0x22/0x30 > [] __do_softirq+0xd6/0x287 > [] irq_exit+0xa5/0xb0 > [] xen_evtchn_do_upcall+0x35/0x50 > [] xen_do_hypervisor_callback+0x1e/0x40 > > [] ? 
xen_hypercall_sched_op+0xa/0x20 > [] ? xen_hypercall_sched_op+0xa/0x20 > [] ? __tick_nohz_idle_enter+0x2c9/0x3c0 > [] ? xen_safe_halt+0x10/0x20 > [] ? default_idle+0x23/0xd0 > [] ? arch_cpu_idle+0xf/0x20 > [] ? default_idle_call+0x2c/0x40 > [] ? cpu_startup_entry+0x17a/0x210 > [] ? rest_init+0x77/0x80 > [] ? start_kernel+0x435/0x442 > [] ? set_init_arg+0x55/0x55 > [] ? x86_64_start_reservations+0x2a/0x2c > [] ? xen_start_kernel+0x547/0x553 > ---[ end trace 2f33c440640c78e5 ]--- > > All network activity out that ethernet port dies until either: > 1) The ethernet cable is unplugged & replugged, or > 2) I run: mii-tool -R enp5s0 > > Either causes the ethernet adapter to be reset. > > Any suggestions if this is Xen or kernel related?
[Xen-devel] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x217/0x220
The last couple of days running on kernel 4.9.29 and 4.9.30 with Xen 4.9.0-rc6 I've had a number of ethernet lock ups that have taken my system off the network. This is a new development - but I'm not sure if its kernel or xen related. in dmesg, I see the following: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x217/0x220 NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out Modules linked in: bridge 8021q garp stp llc btrfs dm_mod crct10dif_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd raid456 async_raid6_recov async_memcpy async_pq ppdev iTCO_wdt async_xor xor iTCO_vendor_support async_tx raid6_pq pcspkr i2c_i801 i2c_smbus pl2303 usbserial sg lpc_ich mfd_core tpm_infineon parport_pc parport shpchp mei_me mei xenfs xen_privcmd ip_tables xfs libcrc32c raid1 sd_mod i915 i2c_algo_bit drm_kms_helper drm crc32c_intel serio_raw ahci libahci i2c_core r8169 mii sata_mv video xen_acpi_processor xen_pciback xen_netback xen_gntalloc xen_gntdev xen_evtchn ipv6 crc_ccitt autofs4 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.30-1.el7xen.x86_64 #1 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z68M-D2H, BIOS U1f 06/13/2012 880080203dd8 81348dc5 880080203e28 880080203e18 81081711 013c80203e10 88000526a000 0001 88000526a000 Call Trace: [] dump_stack+0x63/0x8e [] __warn+0xd1/0xf0 [] warn_slowpath_fmt+0x4f/0x60 [] dev_watchdog+0x217/0x220 [] ? dev_deactivate_queue.constprop.27+0x60/0x60 [] call_timer_fn+0x3a/0x130 [] run_timer_softirq+0x191/0x420 [] ? handle_percpu_irq+0x3a/0x50 [] ? generic_handle_irq+0x22/0x30 [] __do_softirq+0xd6/0x287 [] irq_exit+0xa5/0xb0 [] xen_evtchn_do_upcall+0x35/0x50 [] xen_do_hypervisor_callback+0x1e/0x40 [] ? xen_hypercall_sched_op+0xa/0x20 [] ? xen_hypercall_sched_op+0xa/0x20 [] ? __tick_nohz_idle_enter+0x2c9/0x3c0 [] ? xen_safe_halt+0x10/0x20 [] ? default_idle+0x23/0xd0 [] ? arch_cpu_idle+0xf/0x20 [] ? default_idle_call+0x2c/0x40 [] ? 
cpu_startup_entry+0x17a/0x210 [] ? rest_init+0x77/0x80 [] ? start_kernel+0x435/0x442 [] ? set_init_arg+0x55/0x55 [] ? x86_64_start_reservations+0x2a/0x2c [] ? xen_start_kernel+0x547/0x553 ---[ end trace 2f33c440640c78e5 ]--- All network activity out that ethernet port dies until either: 1) The ethernet cable is unplugged & replugged, or 2) I run: mii-tool -R enp5s0 Either causes the ethernet adapter to be reset. Any suggestions if this is Xen or kernel related?
Re: [Xen-devel] [PATCH 0/3] xen/blkback: several fixes of resource management
On 2017-05-16 16:23, Juergen Gross wrote: Destroying a Xen guest domain while it was doing I/Os via xen-blkback leaked several resources, including references of the guest's memory pages. This patch series addresses those leaks by correcting usage of reference counts and the sequence when to free which resource. The series applies on top of commit 2d4456c73a487abe ("block: xen-blkback: add null check to avoid null pointer dereference") in Jens Axboe's tree kernel/git/axboe/linux-block.git Juergen Gross (3): xen/blkback: fix disconnect while I/Os in flight xen/blkback: don't free be structure too early xen/blkback: don't use xen_blkif_get() in xen-blkback kthread drivers/block/xen-blkback/blkback.c | 3 --- drivers/block/xen-blkback/common.h | 1 + drivers/block/xen-blkback/xenbus.c | 15 --- 3 files changed, 9 insertions(+), 10 deletions(-) Tested-by: Steven Haigh <net...@crc.id.au> I've had a report that a new message is logged on destroy sometimes: vif vif-1-0 vif1.0: Guest Rx stalled This may be a different issue - however the main fix of this patch set is fully functional.
Re: [Xen-devel] [PATCH for-4.9 v2] build: stubdom and tools should depend on public header target
Can confirm this fixes the problems I was seeing. Tested with multiple builds on both RHEL6 and RHEL7. No further issues found. Tested-by: Steven Haigh <net...@crc.id.au> On 18/05/17 00:26, Wei Liu wrote: > Build can fail if stubdom build is run before tools build because: > > 1. tools/include build uses relative path and depends on XEN_OS > 2. stubdom needs tools/include to be built, at which time XEN_OS is >mini-os and corresponding symlinks are created > 3. libraries inside tools needs tools/include to be built, at which >time XEN_OS is the host os name, but symlinks won't be created >because they are already there > 4. libraries get the wrong headers and fail to build > > Since both tools and stubdom build need the public headers, we build > tools/include before stubdom and tools. Remove runes in stubdom and > tools to avoid building tools/include more than once. > > Provide a new dist target for tools/include. Hook up the install, > clean, dist and distclean targets for tools/include. > > The new arrangement ensures tools build gets the correct headers > because XEN_OS is set to host os when building tools/include. As for > stubdom, it explicitly links to the mini-os directory without relying > on XEN_OS so it should fine. 
> > Reported-by: Steven Haigh <net...@crc.id.au> > Signed-off-by: Wei Liu <wei.l...@citrix.com> > --- > Cc: Steven Haigh <net...@crc.id.au> > Cc: Ian Jackson <ian.jack...@eu.citrix.com> > Cc: Samuel Thibault <samuel.thiba...@ens-lyon.org> > Cc: Julien Grall <julien.gr...@arm.com> > --- > Makefile | 14 +++--- > stubdom/Makefile | 1 - > tools/Makefile | 3 +-- > tools/include/Makefile | 2 ++ > 4 files changed, 14 insertions(+), 6 deletions(-) > > diff --git a/Makefile b/Makefile > index 084588e11e..3e1e065537 100644 > --- a/Makefile > +++ b/Makefile > @@ -38,9 +38,14 @@ mini-os-dir-force-update: mini-os-dir > export XEN_TARGET_ARCH > export DESTDIR > > +.PHONY: build-tools-public-headers > +build-tools-public-headers: > + $(MAKE) -C tools/include > + > # build and install everything into the standard system directories > .PHONY: install > install: $(TARGS_INSTALL) > + $(MAKE) -C tools/include install > > .PHONY: build > build: $(TARGS_BUILD) > @@ -50,11 +55,11 @@ build-xen: > $(MAKE) -C xen build > > .PHONY: build-tools > -build-tools: > +build-tools: build-tools-public-headers > $(MAKE) -C tools build > > .PHONY: build-stubdom > -build-stubdom: mini-os-dir > +build-stubdom: mini-os-dir build-tools-public-headers > $(MAKE) -C stubdom build > ifeq (x86_64,$(XEN_TARGET_ARCH)) > XEN_TARGET_ARCH=x86_32 $(MAKE) -C stubdom pv-grub > @@ -75,6 +80,7 @@ test: > .PHONY: dist > dist: DESTDIR=$(DISTDIR)/install > dist: $(TARGS_DIST) dist-misc > + make -C tools/include dist > > dist-misc: > $(INSTALL_DIR) $(DISTDIR)/ > @@ -101,7 +107,7 @@ install-tools: > $(MAKE) -C tools install > > .PHONY: install-stubdom > -install-stubdom: mini-os-dir > +install-stubdom: mini-os-dir build-tools-public-headers > $(MAKE) -C stubdom install > ifeq (x86_64,$(XEN_TARGET_ARCH)) > XEN_TARGET_ARCH=x86_32 $(MAKE) -C stubdom install-grub > @@ -168,6 +174,7 @@ src-tarball: subtree-force-update-all > > .PHONY: clean > clean: $(TARGS_CLEAN) > + $(MAKE) -C tools/include clean > > .PHONY: clean-xen > 
clean-xen: > @@ -191,6 +198,7 @@ clean-docs: > # clean, but blow away tarballs > .PHONY: distclean > distclean: $(TARGS_DISTCLEAN) > + $(MAKE) -C tools/include distclean > rm -f config/Toplevel.mk > rm -rf dist > rm -rf config.log config.status config.cache autom4te.cache > diff --git a/stubdom/Makefile b/stubdom/Makefile > index aef705dd1e..db01827070 100644 > --- a/stubdom/Makefile > +++ b/stubdom/Makefile > @@ -355,7 +355,6 @@ LINK_DIRS := libxc-$(XEN_TARGET_ARCH) xenstore $(foreach > dir,$(LINK_LIBS_DIRS),l > LINK_STAMPS := $(foreach dir,$(LINK_DIRS),$(dir)/stamp) > > mk-headers-$(XEN_TARGET_ARCH): $(IOEMU_LINKFARM_TARGET) $(LINK_STAMPS) > - $(MAKE) -C $(XEN_ROOT)/tools/include > mkdir -p include/xen && \ >ln -sf $(wildcard $(XEN_ROOT)/xen/include/public/*.h) include/xen > && \ >ln -sf $(addprefix $(XEN_ROOT)/xen/include/public/,arch-x86 hvm io > xsm) include/xen && \ > diff --git a/tools/Makefile b/tools/Makefile > index 1396d95b50..496428e3a9 100644 > --- a/tools/Makefile > +++ b/tools/Makefile > @@ -5,7 +5,6 @@ export PKG_CONFIG_DIR = $(CURDIR)/pkg-config > i
Re: [Xen-devel] [PATCH for-4.9 0/2] build: fix tools and stubdom build
On 16/05/17 20:47, Wei Liu wrote: > Wei Liu (2): > tools/Rules.mk: honour CPPFLAGS in header check > build: fix tools/include and stubdom build > > stubdom/Makefile | 13 +++-- > tools/Rules.mk | 2 +- > tools/include/Makefile | 34 ++ > 3 files changed, 22 insertions(+), 27 deletions(-) I have been seeing mixed results with these patches. I can confirm that they seem to fix the problem with building on RHEL7 - however on RHEL6, the packages still fail to build. I have copied the build log to: https://cloud.crc.id.au/index.php/s/iTWJE3A1TQBhgDq So far: EL7 - Successful builds: 4/4 EL6 - Successful builds: 0/4
Re: [Xen-devel] null domains after xl destroy
On 2017-05-16 10:49, Glenn Enright wrote: On 15/05/17 21:57, Juergen Gross wrote: On 13/05/17 06:02, Glenn Enright wrote: On 09/05/17 21:24, Roger Pau Monné wrote: On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote: On 04/05/17 00:17, Glenn Enright wrote: On 04/05/17 04:58, Steven Haigh wrote: On 04/05/17 01:53, Juergen Gross wrote: On 03/05/17 12:45, Steven Haigh wrote: Just wanted to give this a little nudge now people seem to be back on deck... Glenn, could you please give the attached patch a try? It should be applied on top of the other correction, the old debug patch should not be applied. I have added some debug output to make sure we see what is happening. This patch is included in kernel-xen-4.9.26-1 It should be in the repos now. Still seeing the same issue. Without the extra debug patch all I see in the logs after destroy is this... xen-blkback: xen_blkif_disconnect: busy xen-blkback: xen_blkif_free: delayed = 0 Hmm, to me it seems as if some grant isn't being unmapped. Looking at gnttab_unmap_refs_async() I wonder how this is supposed to work: I don't see how a grant would ever be unmapped in case of page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it does is deferring the call to the unmap operation again and again. Or am I missing something here? No, I don't think you are missing anything, but I cannot see how this can be solved in a better way, unmapping a page that's still referenced is certainly not the best option, or else we risk triggering a page-fault elsewhere. IMHO, gnttab_unmap_refs_async should have a timeout, and return an error at some point. Also, I'm wondering whether there's a way to keep track of who has references on a specific page, but so far I haven't been able to figure out how to get this information from Linux. Also, I've noticed that __gnttab_unmap_refs_async uses page_count, shouldn't it use page_ref_count instead? Roger. In case it helps, I have continued to work on this. 
I notices processed left behind (under 4.9.27). The same issue is ongoing. # ps auxf | grep [x]vda root 2983 0.0 0.0 0 0 ?S01:44 0:00 \_ [1.xvda1-1] root 5457 0.0 0.0 0 0 ?S02:06 0:00 \_ [3.xvda1-1] root 7382 0.0 0.0 0 0 ?S02:36 0:00 \_ [4.xvda1-1] root 9668 0.0 0.0 0 0 ?S02:51 0:00 \_ [6.xvda1-1] root 11080 0.0 0.0 0 0 ?S02:57 0:00 \_ [7.xvda1-1] # xl list Name ID Mem VCPUs State Time(s) Domain-0 0 1512 2 r- 118.5 (null)1 8 4 --p--d 43.8 (null)3 8 4 --p--d 6.3 (null)4 8 4 --p--d 73.4 (null)6 8 4 --p--d 14.7 (null)7 8 4 --p--d 30 Those all have... [root 11080]# cat wchan xen_blkif_schedule [root 11080]# cat stack [] xen_blkif_schedule+0x418/0xb40 [] kthread+0xe5/0x100 [] ret_from_fork+0x25/0x30 [] 0x And found another reference count bug. Would you like to give the attached patch (to be applied additionally to the previous ones) a try? Juergen This seems to have solved the issue in 4.9.28, with all three patches applied. Awesome! On my main test machine I can no longer replicate what I was originally seeing, and in dmesg I now see this flow... xen-blkback: xen_blkif_disconnect: busy xen-blkback: xen_blkif_free: delayed = 1 xen-blkback: xen_blkif_free: delayed = 0 xl list is clean, xenstore looks right. No extraneous processes left over. Thankyou Juergen, so much. Really appreciate your persistence with this. Anything I can do to help push this upstream please let me know. Feel free to add a reported-by line with my name if you think it appropriate. This is good news. Juergen, Can I request a full patch set posted to the list (plz CC me) - and I'll ensure we can build the kernel with all 3 (?) patches applied and test properly. I'll build up a complete kernel with those patches and give a tested-by if all goes well. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
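The concern Juergen raises above — that `__gnttab_unmap_refs_async()` just re-defers forever while `page_count(item->pages[pc]) > 1` — can be illustrated with a toy model. This is plain shell, not kernel code: a stuck reference count makes an unbounded defer-and-retry loop spin forever, whereas a bounded retry (as Roger suggests with a timeout) at least surfaces the leak as an error.

```shell
# Toy model of the deferred-unmap loop discussed above (illustrative only).
page_count=3          # stuck: some other user still holds a reference
tries=0 max_tries=5   # the bound is the part the real code lacks
while [ "$page_count" -gt 1 ] && [ "$tries" -lt "$max_tries" ]; do
    tries=$((tries + 1))
    echo "deferring unmap: attempt $tries, page_count=$page_count"
done
if [ "$page_count" -gt 1 ]; then
    echo "unmap timed out after $tries attempts: page still referenced"
fi
```

Without `max_tries`, the loop condition never changes and the unmap never completes — which matches the observed symptom of domains stuck in the `(null)` state.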
Re: [Xen-devel] 4.9rc4: Cannot build with higher than -j4 - was: linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory
On 10/05/17 23:02, Steven Haigh wrote: > On 10/05/17 01:20, M A Young wrote: >> On Tue, 9 May 2017, Steven Haigh wrote: >> >>> I'm trying to use the same build procedure I had for working correctly >>> for Xen 4.7 & 4.8.1 - but am coming across this error: >>> >>> gcc -DPIC -m64 -DBUILD_ID -fno-strict-aliasing -std=gnu99 -Wall >>> -Wstrict-prototypes -Wdeclaration-after-statement >>> -Wno-unused-but-set-variable -Wno-unused-local-typedefs -g3 -O0 >>> -fno-omit-frame-pointer -D__XEN_INTERFACE_VERSION__=__XE >>> N_LATEST_INTERFACE_VERSION__ -MMD -MF .linux.opic.d -D_LARGEFILE_SOURCE >>> -D_LARGEFILE64_SOURCE -Werror -Wmissing-prototypes -I./include >>> -I/builddir/build/BUILD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/include >>> -I/builddir/build/BUI >>> LD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/libs/toollog/include >>> -I/builddir/build/BUILD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/include >>> -fPIC -c -o linux.opic linux.c >>> mv headers.chk.new headers.chk >>> linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory >>> #include >>> ^ >>> compilation terminated. >>> linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory >>> #include >>> ^ >>> compilation terminated. >>> >>> Any clues as to what to start pulling apart that changed between 4.8.1 >>> and 4.9.0-rc4 that could cause this? >> >> It worked for me in a test build, eg. see one of the builds at >> https://copr.fedorainfracloud.org/coprs/myoung/xentest/build/549124/ > > Ok, after lots of debugging, when I run 'make dist', I usually use the > macro for smp building, so I end up with: > make %{?_smp_mflags} dist > > It seems this is hit and miss as to it actually working. > > I have had a 100% success rate (but slow builds) with: > make dist > > Trying with 'make -j4 dist' seems to work the couple of times I've tried it. 
> > This seems to be a new problem that I haven't come across before in 4.4, > 4.5, 4.6, 4.7 or my initial 4.8.1 builds - so its new to 4.9.0 rc's. > > The consensus on #xen seems to be that there is a race between libs & > include - and that these are supposed to be built in sequence and not > parallel. > > I'm a little over my depth now - as I assume this heads into Makefile land. > > If it helps, there is a full build log available at: > https://cloud.crc.id.au/index.php/s/iTWJE3A1TQBhgDq > > I've committed my current progress in my git tree: > https://xen.crc.id.au/git/?p=xen49;a=tree > > Right now, we're looking at lines 304 / 305 of SPECS/xen49.spec Just wanted to give this a nudge. It seems if you build with above -j4 (on a machine with suitable number of cores), the build will fail. This is a degradation from any version previous to 4.9. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
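The suspected race — libs compiling before the header install has finished — can be reduced to a minimal shell model (these are not the real xen build targets, just an illustration). Under `make -jN`, two targets with no dependency edge between them may run concurrently; the fix is to declare the ordering so the header step always completes first, which is what serialising with `wait` does here:

```shell
# Minimal model of the libs-vs-include race (illustrative, not the xen tree).
tmp=$(mktemp -d)
( echo 'evtchn.h' > "$tmp/include.done" ) &    # "install headers" job
wait                                           # the serialisation -jN was missing
if [ -f "$tmp/include.done" ]; then
    status="libs: headers present, build proceeds"
else
    status="libs: fatal error: xen/sys/evtchn.h: No such file or directory"
fi
echo "$status"
rm -rf "$tmp"
```

In Makefile terms the equivalent fix would be a (possibly order-only) prerequisite from the libs target onto the header-install target, so the ordering holds even at high `-j` values.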
Re: [Xen-devel] 4.9rc4: linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory
On 10/05/17 01:20, M A Young wrote: > On Tue, 9 May 2017, Steven Haigh wrote: > >> I'm trying to use the same build procedure I had for working correctly >> for Xen 4.7 & 4.8.1 - but am coming across this error: >> >> gcc -DPIC -m64 -DBUILD_ID -fno-strict-aliasing -std=gnu99 -Wall >> -Wstrict-prototypes -Wdeclaration-after-statement >> -Wno-unused-but-set-variable -Wno-unused-local-typedefs -g3 -O0 >> -fno-omit-frame-pointer -D__XEN_INTERFACE_VERSION__=__XE >> N_LATEST_INTERFACE_VERSION__ -MMD -MF .linux.opic.d -D_LARGEFILE_SOURCE >> -D_LARGEFILE64_SOURCE -Werror -Wmissing-prototypes -I./include >> -I/builddir/build/BUILD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/include >> -I/builddir/build/BUI >> LD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/libs/toollog/include >> -I/builddir/build/BUILD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/include >> -fPIC -c -o linux.opic linux.c >> mv headers.chk.new headers.chk >> linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory >> #include >> ^ >> compilation terminated. >> linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory >> #include >> ^ >> compilation terminated. >> >> Any clues as to what to start pulling apart that changed between 4.8.1 >> and 4.9.0-rc4 that could cause this? > > It worked for me in a test build, eg. see one of the builds at > https://copr.fedorainfracloud.org/coprs/myoung/xentest/build/549124/ Ok, after lots of debugging, when I run 'make dist', I usually use the macro for smp building, so I end up with: make %{?_smp_mflags} dist It seems this is hit and miss as to it actually working. I have had a 100% success rate (but slow builds) with: make dist Trying with 'make -j4 dist' seems to work the couple of times I've tried it. This seems to be a new problem that I haven't come across before in 4.4, 4.5, 4.6, 4.7 or my initial 4.8.1 builds - so its new to 4.9.0 rc's. 
The consensus on #xen seems to be that there is a race between libs & include - and that these are supposed to be built in sequence and not parallel. I'm a little over my depth now - as I assume this heads into Makefile land. If it helps, there is a full build log available at: https://cloud.crc.id.au/index.php/s/iTWJE3A1TQBhgDq I've committed my current progress in my git tree: https://xen.crc.id.au/git/?p=xen49;a=tree Right now, we're looking at lines 304 / 305 of SPECS/xen49.spec -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] 4.9rc4: linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory
I'm trying to use the same build procedure I had for working correctly for Xen 4.7 & 4.8.1 - but am coming across this error: gcc -DPIC -m64 -DBUILD_ID -fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes -Wdeclaration-after-statement -Wno-unused-but-set-variable -Wno-unused-local-typedefs -g3 -O0 -fno-omit-frame-pointer -D__XEN_INTERFACE_VERSION__=__XE N_LATEST_INTERFACE_VERSION__ -MMD -MF .linux.opic.d -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -Werror -Wmissing-prototypes -I./include -I/builddir/build/BUILD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/include -I/builddir/build/BUI LD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/libs/toollog/include -I/builddir/build/BUILD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/include -fPIC -c -o linux.opic linux.c mv headers.chk.new headers.chk linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory #include ^ compilation terminated. linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory #include ^ compilation terminated. Any clues as to what to start pulling apart that changed between 4.8.1 and 4.9.0-rc4 that could cause this? -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] null domains after xl destroy
On 04/05/17 01:53, Juergen Gross wrote: > On 03/05/17 12:45, Steven Haigh wrote: >> Just wanted to give this a little nudge now people seem to be back on >> deck... > > Glenn, could you please give the attached patch a try? > > It should be applied on top of the other correction, the old debug > patch should not be applied. > > I have added some debug output to make sure we see what is happening. This patch is included in kernel-xen-4.9.26-1 It should be in the repos now. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] null domains after xl destroy
n_blkif_disconnect with a call to >>> xen_blk_drain_io instead, and making xen_blkif_disconnect return void >>> (to >>> prevent further issues like this one). >> >> Glenn, >> >> can you please try the attached patch (in dom0)? >> >> >> Juergen >> > > (resending with full CC list) > > I'm back. After testing unfortunately I'm still seeing the leak. The > below trace is with the debug patch applied as well under 4.9.25. It > looks very similar to me. I am still able to replicate this reliably. > > Regards, Glenn > http://rimuhosting.com > > [ cut here ] > WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:511 > xen_blkbk_remove+0x138/0x140 > Modules linked in: ebt_ip xen_pciback xen_netback xen_gntalloc > xen_gntdev xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4 > ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security > iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 > nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc > ipv6 crc_ccitt ppdev parport_pc parport serio_raw i2c_i801 i2c_smbus > i2c_core sg e1000e ptp pps_core i3000_edac edac_core raid1 sd_mod ahci > libahci floppy dm_mirror dm_region_hash dm_log dm_mod > CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.25-1.el6xen.x86_64 #1 > Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007 > c90040cfbb98 8136b76f 0013 > c90040cfbbe8 8108007d > ea141720 01ff41334434 8801 88004d3aedc0 > Call Trace: > [] dump_stack+0x67/0x98 > [] __warn+0xfd/0x120 > [] warn_slowpath_null+0x1d/0x20 > [] xen_blkbk_remove+0x138/0x140 > [] xenbus_dev_remove+0x47/0xa0 > [] __device_release_driver+0xb4/0x160 > [] device_release_driver+0x2d/0x40 > [] bus_remove_device+0x124/0x190 > [] device_del+0x112/0x210 > [] ? xenbus_read+0x53/0x70 > [] device_unregister+0x22/0x60 > [] frontend_changed+0xad/0x4c0 > [] xenbus_otherend_changed+0xc7/0x140 > [] ? 
_raw_spin_unlock_irqrestore+0x16/0x20 > [] frontend_changed+0x10/0x20 > [] xenwatch_thread+0x9c/0x140 > [] ? woken_wake_function+0x20/0x20 > [] ? schedule+0x3a/0xa0 > [] ? _raw_spin_unlock_irqrestore+0x16/0x20 > [] ? complete+0x4d/0x60 > [] ? split+0xf0/0xf0 > [] kthread+0xe5/0x100 > [] ? kthread+0xcd/0x100 > [] ? __kthread_init_worker+0x40/0x40 > [] ? __kthread_init_worker+0x40/0x40 > [] ? __kthread_init_worker+0x40/0x40 > [] ret_from_fork+0x25/0x30 > ---[ end trace ea3a48c80e4ad79d ]--- > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
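A quick way to check whether destroyed domains are leaking as `(null)` entries — the symptom this thread is chasing — is to parse `xl list` output for the `(null)` name. A minimal sketch, run here against a captured sample of the output shown earlier in the thread:

```shell
# Flag leaked (null) domains in xl list output (sample data from this thread).
xl_output='Name                            ID   Mem VCPUs State   Time(s)
Domain-0                         0  1512     2 r-        118.5
(null)                           1     8     4 --p--d     43.8
(null)                           3     8     4 --p--d      6.3'
leaked=$(printf '%s\n' "$xl_output" | awk '$1 == "(null)" { print $2 }')
echo "leaked domain ids:" $leaked
```

On a live dom0 the first line would instead be `xl_output=$(xl list)`; an empty result after a destroy indicates the fix is holding.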
Re: [Xen-devel] null domains after xl destroy
On 20/04/17 02:22, Steven Haigh wrote: > On 19/04/17 20:09, Juergen Gross wrote: >> On 19/04/17 09:16, Roger Pau Monné wrote: >>> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote: >>>> On 19/04/17 03:02, Glenn Enright wrote: >>>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still >>>>> shows the issue. When replicating the leak I now see this trace (via >>>>> dmesg). Hopefully that is useful. >>>>> >>>>> Please note, I'm going to be offline next week, but am keen to keep on >>>>> with this, it may just be a while before I followup is all. >>>>> >>>>> Regards, Glenn >>>>> http://rimuhosting.com >>>>> >>>>> >>>>> [ cut here ] >>>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508 >>>>> xen_blkbk_remove+0x138/0x140 >>>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev >>>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4 >>>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security >>>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 >>>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc >>>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus >>>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy >>>>> dm_mirror dm_region_hash dm_log dm_mod >>>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1 >>>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007 >>>>> c90040cfbba8 8136b61f 0013 >>>>> c90040cfbbf8 8108007d >>>>> ea0001373fe0 01fc33394434 8801 88004d93fac0 >>>>> Call Trace: >>>>> [] dump_stack+0x67/0x98 >>>>> [] __warn+0xfd/0x120 >>>>> [] warn_slowpath_null+0x1d/0x20 >>>>> [] xen_blkbk_remove+0x138/0x140 >>>>> [] xenbus_dev_remove+0x47/0xa0 >>>>> [] __device_release_driver+0xb4/0x160 >>>>> [] device_release_driver+0x2d/0x40 >>>>> [] bus_remove_device+0x124/0x190 >>>>> [] device_del+0x112/0x210 >>>>> [] ? 
xenbus_read+0x53/0x70 >>>>> [] device_unregister+0x22/0x60 >>>>> [] frontend_changed+0xad/0x4c0 >>>>> [] ? schedule_tail+0x1e/0xc0 >>>>> [] xenbus_otherend_changed+0xc7/0x140 >>>>> [] ? _raw_spin_unlock_irqrestore+0x16/0x20 >>>>> [] ? schedule_tail+0x1e/0xc0 >>>>> [] frontend_changed+0x10/0x20 >>>>> [] xenwatch_thread+0x9c/0x140 >>>>> [] ? woken_wake_function+0x20/0x20 >>>>> [] ? schedule+0x3a/0xa0 >>>>> [] ? _raw_spin_unlock_irqrestore+0x16/0x20 >>>>> [] ? complete+0x4d/0x60 >>>>> [] ? split+0xf0/0xf0 >>>>> [] kthread+0xcd/0xf0 >>>>> [] ? schedule_tail+0x1e/0xc0 >>>>> [] ? __kthread_init_worker+0x40/0x40 >>>>> [] ? __kthread_init_worker+0x40/0x40 >>>>> [] ret_from_fork+0x25/0x30 >>>>> ---[ end trace ee097287c9865a62 ]--- >>>> >>>> Konrad, Roger, >>>> >>>> this was triggered by a debug patch in xen_blkbk_remove(): >>>> >>>>if (be->blkif) >>>> - xen_blkif_disconnect(be->blkif); >>>> + WARN_ON(xen_blkif_disconnect(be->blkif)); >>>> >>>> So I guess we need something like xen_blk_drain_io() in case of calls to >>>> xen_blkif_disconnect() which are not allowed to fail (either at the call >>>> sites of xen_blkif_disconnect() or in this function depending on a new >>>> boolean parameter indicating it should wait for outstanding I/Os). >>>> >>>> I can try a patch, but I'd appreciate if you could confirm this wouldn't >>>> add further problems... >>> >>> Hello, >>> >>> Thanks for debugging this, the easiest solution seems to be to replace the >>> ring->inflight atomic_read check in xen_blkif_disconnect with a call to >>> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to >>> prevent further issues like this one). >> >> Glenn, >> >> can you please try the attached patch (in dom0)? Tested-by: Steven Haigh <net...@crc.id.au> I've tried specifically with 4.9.23 and can no long make this occur in my scenario. Also built with 4.9.24 and expecting similar results. 
I'm aware Glenn has a much wider test schedule and number of systems than me, however my testing is successful. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Xen Project Security Team considering becoming a CNA
On 21/04/17 01:57, Ian Jackson wrote: > (Resending with the correct CC (!)) > > We are in discussions with MITRE with a view to potentially becoming a > CVE Numbering Authority. This would probably smooth the process of > getting CVE numbers for XSAs. > > If anyone has any opinions/representations/concerns/whatever about > this, please do share them (here in this thread, or privately to > security@). YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. YES. Yes. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] null domains after xl destroy
On 19/04/17 20:09, Juergen Gross wrote: > On 19/04/17 09:16, Roger Pau Monné wrote: >> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote: >>> On 19/04/17 03:02, Glenn Enright wrote: >>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still >>>> shows the issue. When replicating the leak I now see this trace (via >>>> dmesg). Hopefully that is useful. >>>> >>>> Please note, I'm going to be offline next week, but am keen to keep on >>>> with this, it may just be a while before I followup is all. >>>> >>>> Regards, Glenn >>>> http://rimuhosting.com >>>> >>>> >>>> [ cut here ] >>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508 >>>> xen_blkbk_remove+0x138/0x140 >>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev >>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4 >>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security >>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 >>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc >>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus >>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy >>>> dm_mirror dm_region_hash dm_log dm_mod >>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1 >>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007 >>>> c90040cfbba8 8136b61f 0013 >>>> c90040cfbbf8 8108007d >>>> ea0001373fe0 01fc33394434 8801 88004d93fac0 >>>> Call Trace: >>>> [] dump_stack+0x67/0x98 >>>> [] __warn+0xfd/0x120 >>>> [] warn_slowpath_null+0x1d/0x20 >>>> [] xen_blkbk_remove+0x138/0x140 >>>> [] xenbus_dev_remove+0x47/0xa0 >>>> [] __device_release_driver+0xb4/0x160 >>>> [] device_release_driver+0x2d/0x40 >>>> [] bus_remove_device+0x124/0x190 >>>> [] device_del+0x112/0x210 >>>> [] ? xenbus_read+0x53/0x70 >>>> [] device_unregister+0x22/0x60 >>>> [] frontend_changed+0xad/0x4c0 >>>> [] ? 
schedule_tail+0x1e/0xc0 >>>> [] xenbus_otherend_changed+0xc7/0x140 >>>> [] ? _raw_spin_unlock_irqrestore+0x16/0x20 >>>> [] ? schedule_tail+0x1e/0xc0 >>>> [] frontend_changed+0x10/0x20 >>>> [] xenwatch_thread+0x9c/0x140 >>>> [] ? woken_wake_function+0x20/0x20 >>>> [] ? schedule+0x3a/0xa0 >>>> [] ? _raw_spin_unlock_irqrestore+0x16/0x20 >>>> [] ? complete+0x4d/0x60 >>>> [] ? split+0xf0/0xf0 >>>> [] kthread+0xcd/0xf0 >>>> [] ? schedule_tail+0x1e/0xc0 >>>> [] ? __kthread_init_worker+0x40/0x40 >>>> [] ? __kthread_init_worker+0x40/0x40 >>>> [] ret_from_fork+0x25/0x30 >>>> ---[ end trace ee097287c9865a62 ]--- >>> >>> Konrad, Roger, >>> >>> this was triggered by a debug patch in xen_blkbk_remove(): >>> >>> if (be->blkif) >>> - xen_blkif_disconnect(be->blkif); >>> + WARN_ON(xen_blkif_disconnect(be->blkif)); >>> >>> So I guess we need something like xen_blk_drain_io() in case of calls to >>> xen_blkif_disconnect() which are not allowed to fail (either at the call >>> sites of xen_blkif_disconnect() or in this function depending on a new >>> boolean parameter indicating it should wait for outstanding I/Os). >>> >>> I can try a patch, but I'd appreciate if you could confirm this wouldn't >>> add further problems... >> >> Hello, >> >> Thanks for debugging this, the easiest solution seems to be to replace the >> ring->inflight atomic_read check in xen_blkif_disconnect with a call to >> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to >> prevent further issues like this one). > > Glenn, > > can you please try the attached patch (in dom0)? For what its worth, I have applied this in kernel package 4.9.23-2 as follows: * Wed Apr 19 2017 Steven Haigh <net...@crc.id.au> - 4.9.23-2 - xen/blkback: fix disconnect while I/Os in flight Its available from any 'in sync' mirror: https://xen.crc.id.au/downloads/ Feedback welcome for both mine and Juergen's sake. 
-- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.8.0 dom0pvh=1 causes ext4 filesystem corruption
On 10/03/17 02:13, Valtteri Kiviniemi wrote: > Hi, > > Yesterday I decided to upgrade my Xen version from 4.6.0 to 4.8.0. I > compiled it from source and at the same time I compiled the latest Linux > kernel (4.10.1). > > When rebooting I decided to try if dom0 PVH would work (with previous > Xen version it just caused kernel panic). Seemed to boot fine until > systemd started mounting the root filesystem and then the console was > filled with ext4 errors. Couldn't even log in. > > Booting with a systemrescuecd and running fsck just caused the whole > filesystem to be re-attached in thousands of small pieces under > lost+found. I was sure that this was a some kind of hardware failure, so > I switched my hard drives and did a clean reinstall for dom0 and tried > again. Again, after a reboot the whole rootfs was completely corrupted. > > Second reinstall and this time I disabled dom0 PVH and the system booted > just fine, and no ext4 errors. My root filesystem is just a simple Linux > software raid1 with ext4 on top of it. > > Now that I started thinking I have also had strange ext4 errors > happening inside my guests, so I also disabled PVH from all the guests. > With guests the ext4 error is always the same: "EXT4-fs error (device > xvda1): ext4_iget:4665: inode #317: comm find: bogus i_mode (135206)" > > Unfortunately I don't have any logs from the dom0 corruption as I can't > even log in to the system when dom0 PVH is enabled. The corruption > happens instantly during system bootup. I have this happen a lot using pvh mode in previous Xen versions. Is it supposed to be 'working' yet or is it still not recommended for use? -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
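For anyone hitting the same corruption, the workaround both posters converge on is simply not booting dom0 in PVH mode. A hypothetical grub fragment (the variable name and layout vary by distro — check your grub2 packaging — and the memory setting is only a placeholder):

```
# /etc/default/grub — illustrative fragment: boot dom0 as plain PV
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=2048M,max:2048M"   # note: no dom0pvh=1
```

After editing, regenerate the grub config (e.g. `grub2-mkconfig -o /boot/grub2/grub.cfg` on EL-style systems) and reboot.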
Re: [Xen-devel] Xen Security Advisory 209 (CVE-2017-2620) - cirrus_bitblt_cputovideo does not check if memory region is safe
On 23/02/17 20:43, Roger Pau Monné wrote: > On Tue, Feb 21, 2017 at 12:00:03PM +, Xen.org security team wrote: >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA1 >> >> Xen Security Advisory CVE-2017-2620 / XSA-209 >> version 3 >> >>cirrus_bitblt_cputovideo does not check if memory region is safe >> >> UPDATES IN VERSION 3 >> >> >> Public release. >> >> ISSUE DESCRIPTION >> = >> >> In CIRRUS_BLTMODE_MEMSYSSRC mode the bitblit copy routine >> cirrus_bitblt_cputovideo fails to check wethehr the specified memory >> region is safe. >> >> IMPACT >> == >> >> A malicious guest administrator can cause an out of bounds memory >> write, very likely exploitable as a privilege escalation. >> >> VULNERABLE SYSTEMS >> == >> >> Versions of qemu shipped with all Xen versions are vulnerable. >> >> Xen systems running on x86 with HVM guests, with the qemu process >> running in dom0 are vulnerable. >> >> Only guests provided with the "cirrus" emulated video card can exploit >> the vulnerability. The non-default "stdvga" emulated video card is >> not vulnerable. (With xl the emulated video card is controlled by the >> "stdvga=" and "vga=" domain configuration options.) >> >> ARM systems are not vulnerable. Systems using only PV guests are not >> vulnerable. >> >> For VMs whose qemu process is running in a stub domain, a successful >> attacker will only gain the privileges of that stubdom, which should >> be only over the guest itself. >> >> Both upstream-based versions of qemu (device_model_version="qemu-xen") >> and `traditional' qemu (device_model_version="qemu-xen-traditional") >> are vulnerable. >> >> MITIGATION >> == >> >> Running only PV guests will avoid the issue. >> >> Running HVM guests with the device model in a stubdomain will mitigate >> the issue. >> >> Changing the video card emulation to stdvga (stdvga=1, vga="stdvga", >> in the xl domain configuration) will avoid the vulnerability. >> >> CREDITS >> === >> >> This issue was discovered by Gerd Hoffmann of Red Hat. 
>> >> RESOLUTION >> == >> >> Applying the appropriate attached patch resolves this issue. >> >> xsa209-qemuu.patch qemu-xen, qemu upstream >> (no backport yet)qemu-xen-traditional > > It would be nice to mention that (at least on QEMU shipped with 4.7) the > following patch is also needed for the XSA-209 fix to build correctly: > > 52b7f43c8fa185ab856bcaacda7abc9a6fc07f84 > display: cirrus: ignore source pitch value as needed in blit_is_unsafe I did request that an updated XSA be issued with this patch - as at the moment, nobody will be able to apply the XSA only patch to any other version of Xen. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
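For reference, the mitigation the advisory describes — switching the emulated video card from cirrus to the non-vulnerable stdvga — is a one-line change in the xl domain configuration. A sketch (other settings elided):

```
# guest.cfg — stdvga mitigation per XSA-209's MITIGATION section
builder = "hvm"
stdvga  = 1
vga     = "stdvga"
```

This only applies to HVM guests; PV guests are not affected in the first place.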
Re: [Xen-devel] dom0pvh issue with XEN 4.8.0
On Sunday, 5 February 2017 4:05:32 PM AEDT G.R. wrote: > Hi all, > dom0pvh=1 is not working well for me with XEN 4.8.0 + linux kernel 4.9.2. > > The system boots with no obvious issue. > But many user mode application are suffering from segfault, which > makes the dom0 not useable: The segfault always come from libc-2.24.so > while it works just fine in PV dom0. > I have no idea why, but those segfault would kill my ssh connection > while sshd is not showing up in the victim list. > > Some examples: > Feb 5 14:25:28 gaia kernel: [ 123.446346] getty[3044]: segfault at 0 > ip 7f5e769e6c60 sp 7ffc57bc0a98 error 6 in > libc-2.24.so[7f5e769b7000+195000] > Feb 5 14:29:04 gaia kernel: [ 339.671742] grep[4195]: segfault at 0 > ip 7f5d3b95ac60 sp 7ffcc1620bb8 error 6 in > libc-2.24.so[7f5d3b92b000+195000] > Feb 5 14:29:23 gaia kernel: [ 358.495888] tail[4203]: segfault at 0 > ip 7f751314bc60 sp 7fffe5ce5e48 error 6 in > libc-2.24.so[7f751311c000+195000] > Feb 5 14:35:06 gaia kernel: [ 701.314247] bash[4323]: segfault at 0 > ip 7f3fef30ec60 sp 7ffd48cc2058 error 6 in > libc-2.24.so[7f3fef2df000+195000] > Feb 5 14:48:43 gaia kernel: [ 1518.809924] ls[4910]: segfault at 0 ip > 7f29e9bc1c60 sp 7ffd712752b8 error 6 in > libc-2.24.so[7f29e9b92000+195000] > > Any suggestion on how to get this fixed? > I don't think I can do live debug since the userspace is quite unstable. > On the other hand, dmesg from both dom0 && XEN looks just fine. > > PS: I'm using a custom compiled dom0 kernel. Is there any specific > kernel config is required to get dom0pvh=1 work? I've been down this path before - and the only thing that gave me stability back was to disable the pvh options. I had everything from disk corruption to what you mention with apps while trying this option. -- Steven Haigh Email: net...@crc.id.au Web: http://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: This is a digitally signed message part. 
Re: [Xen-devel] Xen and Shared Clipboard.
Save yourself a *ton* of hassle and use GNU Screen. On 31/12/16 22:12, Jason Long wrote: > I write it? I'm a noob in developing and just start learning it :( > > > On Thursday, December 29, 2016 6:43 AM, Konrad Rzeszutek Wilk > <konrad.w...@oracle.com> wrote: > > > On Thu, Dec 29, 2016 at 10:11:42AM +, Jason Long wrote: >> I guess it is a good feature for work and Xen must have it. > > Looking forward for the patch from you on that! > > Thanks! > >> >> On Tue, 12/20/16, Jason Long <hack3r...@yahoo.com > <mailto:hack3r...@yahoo.com>> wrote: >> >> Subject: Xen and Shared Clipboard. >> To: "Xen-devel" <xen-de...@lists.xenproject.org > <mailto:xen-de...@lists.xenproject.org>> >> Date: Tuesday, December 20, 2016, 10:58 PM >> >> Hello.How can I enable Shared >> Clipboard in Xen? I like to copy and paste text from Host to >> Guest or vice versa. >> Thank you. > >> >> ___ >> Xen-devel mailing list >> Xen-devel@lists.xen.org <mailto:Xen-devel@lists.xen.org> >> https://lists.xen.org/xen-devel > > > > > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel > -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Proposed plan and URL name for new VM to download xen tarballs (ftp.xenproject.org)
On 12/10/16 21:38, Ian Jackson wrote: > Ian Jackson writes ("Re: Proposed plan and URL name for new VM to download > xen tarballs (ftp.xenproject.org)"): >> Sure, I don't have an opinion. I have changed this, so it's now >> under: >> https://downloads.xenproject.org/release/xen/ > > No-one has objected, so we are now committing to this. The new URLs > will be primary for the forthcoming RC (Wei will send an announcement > when it's ready). I missed this previously in the rest of the list happenings. I'm actually glad this is happening. Having predictable naming / pathing of the xen tarballs is fantastic. I lothe going via the web site to download the file.html which ends up being the filename on the system. Would be much nicer to have a direct download link automatically generated that works. As such, +1 from me :) -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
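The point about predictable naming is that the tarball URL can now be generated mechanically from the version number alone, instead of scraping the website. A sketch, assuming the per-release path layout under the new prefix discussed above:

```shell
# Build a direct release-tarball URL from a version string (layout assumed
# from the downloads.xenproject.org prefix mentioned in this thread).
version=4.9.0
url="https://downloads.xenproject.org/release/xen/${version}/xen-${version}.tar.gz"
echo "$url"
# curl -fLO "$url"    # actual fetch omitted here; needs network access
```

This is exactly the kind of automation (packaging scripts, CI fetch steps) that the `file.html` redirect pages used to break.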
[Xen-devel] Resend: [PATCH] v3 - Add exclusive locking option to block-iscsi
On 2016-05-09 14:22, Steven Haigh wrote: On 2016-05-05 15:52, Steven Haigh wrote: On 2016-05-05 12:32, Steven Haigh wrote: Overview If you're using iSCSI, you can mount a target by multiple Dom0 machines on the same target. For non-cluster aware filesystems, this can lead to disk corruption and general bad times by all. The iSCSI protocol allows the use of persistent reservations as per the SCSI disk spec. Low level SCSI commands for locking are handled by the sg_persist program (bundled with sg3_utils package in EL). The aim of this patch is to create a 'locktarget=y' option specified within the disk 'target' command for iSCSI to lock the target in exclusive mode on VM start with a key generated from the local systems IP, and release this lock on the shutdown of the DomU. Example Config: disk= ['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:mytarget,portal=iscsi.example.com,locktarget=y'] In writing this, I have also re-factored parts of the script to put some things in what I believe to be a better place to make expansion easier. This is mainly in removing functions that purely call other functions with no actual code execution. Signed-off-by: Steven Haigh <net...@crc.id.au> (on a side note, first time I've submitted a patch to the list and I'm currently stuck on a webmail client, so apologies in advance if this all goes wrong ;) Changes in v2: Bugfix: Call find_device to locate the /dev/sdX component of the iSCSI target before trying to run unlock_device(). Apologies for this oversight. Changes in v3: * Split the block-iscsi cleanup into a seperate patch (block-iscsi-locking-v3_01_simplify_block-iscsi.patch). * Add locking in second patch file (block-iscsi-locking-v3_02_add_locking.patch) Resend of patches. There was a mention of having to add further documentation to xl-disk-configuration.txt - however there are no mentions of block-iscsi script within the documentation to add. As such, it probably would be out of place to add things here. 
The locktarget option is presented directly to the block-iscsi script and is not evaluated anywhere outside this script.

-- 
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

--- block-iscsi.orig	2016-05-09 15:12:02.489495212 +1000
+++ block-iscsi	2016-05-09 15:16:35.447480532 +1000
@@ -31,16 +31,6 @@
     echo $1 | sed "s/^\("$2"\)//"
 }
 
-check_tools()
-{
-    if ! command -v iscsiadm > /dev/null 2>&1; then
-        fatal "Unable to find iscsiadm tool"
-    fi
-    if [ "$multipath" = "y" ] && ! command -v multipath > /dev/null 2>&1; then
-        fatal "Unable to find multipath"
-    fi
-}
-
 # Sets the following global variables based on the params field passed in as
 # a parameter: iqn, portal, auth_method, user, multipath, password
 parse_target()
@@ -52,12 +42,18 @@
         case $param in
         iqn=*)
             iqn=$(remove_label $param "iqn=")
+            if ! command -v iscsiadm > /dev/null 2>&1; then
+                fatal "Could not find iscsiadm tool."
+            fi
             ;;
         portal=*)
             portal=$(remove_label $param "portal=")
             ;;
         multipath=*)
             multipath=$(remove_label $param "multipath=")
+            if ! command -v multipath > /dev/null 2>&1; then
+                fatal "Multipath selected, but no multipath tools found"
+            fi
             ;;
         esac
     done
@@ -96,40 +92,6 @@
     fi
 }
 
-# Attaches the target $iqn in $portal and sets $dev to point to the
-# multipath device
-attach()
-{
-    do_or_die iscsiadm -m node --targetname "$iqn" -p "$portal" --login > /dev/null
-    find_device
-}
-
-# Discovers targets in $portal and checks that $iqn is one of those targets
-# Also sets the auth parameters to attach the device
-prepare()
-{
-    # Check if target is already opened
-    iscsiadm -m session 2>&1 | grep -q "$iqn" && fatal "Device already opened"
-    # Discover portal targets
-    iscsiadm -m discovery -t st -p $portal 2>&1 | grep -q "$iqn" || \
-        fatal "No matching target iqn found"
-}
-
-# Attaches the device and writes xenstore backend entries to connect
-# the device
-add()
-{
-    attach
-    write_dev $dev
-}
-
-# Disconnects the device
-remove()
-{
-    find_device
-    do_or_die iscsiadm -m node --targetname "$iqn" -p "$portal" --logout > /dev/null
-}
-
 command=$1
 target=$(xenstore-read $XENBUS_PATH/params || true)
 if [ -z "$target" ]; then
@@ -138,15 +100,21 @@
 parse_target "$target"
 
-check_tools || exit 1
-
 case $command in
 add)
-    prepare
-    add
+    # Check if target is already opened
+    iscsiadm -m session 2>&1 | grep -q "$iqn" && fatal "Device already opened"
+    # Discover portal targets
+    iscsiadm -m discovery -t st -p $portal 2>&1 | grep -q "$iqn" || \
+        fatal "No matching target iqn found"
+
+    ## Login to the iSCSI target.
+    do_or_die iscsiadm -m node --targetname "$iqn" -p "$portal" --login > /dev/null
+
+    write_dev $dev
     ;;
 remove)
Re: [Xen-devel] [PATCH] v3 - Add exclusive locking option to block-iscsi
On 2016-05-12 21:02, Wei Liu wrote:

Hi Steven

On Mon, May 09, 2016 at 02:22:48PM +1000, Steven Haigh wrote:
On 2016-05-05 15:52, Steven Haigh wrote:
>On 2016-05-05 12:32, Steven Haigh wrote:
>>Overview
>>
>>If you're using iSCSI, you can mount a target by multiple Dom0
>>machines on the same target. For non-cluster aware filesystems, this
>>can lead to disk corruption and general bad times by all. The iSCSI
>>protocol allows the use of persistent reservations as per the SCSI
>>disk spec. Low level SCSI commands for locking are handled by the
>>sg_persist program (bundled with sg3_utils package in EL).
>>
>>The aim of this patch is to create a 'locktarget=y' option specified
>>within the disk 'target' command for iSCSI to lock the target in
>>exclusive mode on VM start with a key generated from the local systems
>>IP, and release this lock on the shutdown of the DomU.
>>
>>Example Config:
>>disk=
>>['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:mytarget,portal=iscsi.example.com,locktarget=y']

You seem to suggest an extension (locktarget) to the disk spec as well, but your patch doesn't contain a modification to docs/txt/misc/xl-disk-configuration.txt.

Correct. There is no documentation for the existing block-iscsi script within xl-disk-configuration.txt. In fact, there is no mention at all regarding block-iscsi in any of the documentation that I can see.

-- 
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] v3 - Add exclusive locking option to block-iscsi
On 2016-05-05 15:52, Steven Haigh wrote:
On 2016-05-05 12:32, Steven Haigh wrote:

Overview

If you're using iSCSI, multiple Dom0 machines can mount the same target at the same time. For non-cluster-aware filesystems, this can lead to disk corruption and general bad times for all. The iSCSI protocol allows the use of persistent reservations as per the SCSI disk spec. Low-level SCSI commands for locking are handled by the sg_persist program (bundled with the sg3_utils package in EL).

The aim of this patch is to create a 'locktarget=y' option, specified within the disk 'target' command for iSCSI, to lock the target in exclusive mode on VM start with a key generated from the local system's IP, and release this lock on the shutdown of the DomU.

Example Config:
disk=['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:mytarget,portal=iscsi.example.com,locktarget=y']

In writing this, I have also re-factored parts of the script to put some things in what I believe to be a better place to make expansion easier. This is mainly in removing functions that purely call other functions with no actual code execution.

Signed-off-by: Steven Haigh <net...@crc.id.au>

(on a side note, first time I've submitted a patch to the list and I'm currently stuck on a webmail client, so apologies in advance if this all goes wrong ;)

Changes in v2:
Bugfix: Call find_device to locate the /dev/sdX component of the iSCSI target before trying to run unlock_device(). Apologies for this oversight.

Changes in v3:
* Split the block-iscsi cleanup into a separate patch (block-iscsi-locking-v3_01_simplify_block-iscsi.patch).
* Add locking in a second patch file (block-iscsi-locking-v3_02_add_locking.patch)

-- 
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

--- block-iscsi.orig	2016-05-09 15:12:02.489495212 +1000
+++ block-iscsi	2016-05-09 15:16:35.447480532 +1000
@@ -31,16 +31,6 @@
     echo $1 | sed "s/^\("$2"\)//"
 }
 
-check_tools()
-{
-    if ! command -v iscsiadm > /dev/null 2>&1; then
-        fatal "Unable to find iscsiadm tool"
-    fi
-    if [ "$multipath" = "y" ] && ! command -v multipath > /dev/null 2>&1; then
-        fatal "Unable to find multipath"
-    fi
-}
-
 # Sets the following global variables based on the params field passed in as
 # a parameter: iqn, portal, auth_method, user, multipath, password
 parse_target()
@@ -52,12 +42,18 @@
         case $param in
         iqn=*)
             iqn=$(remove_label $param "iqn=")
+            if ! command -v iscsiadm > /dev/null 2>&1; then
+                fatal "Could not find iscsiadm tool."
+            fi
             ;;
         portal=*)
             portal=$(remove_label $param "portal=")
             ;;
         multipath=*)
             multipath=$(remove_label $param "multipath=")
+            if ! command -v multipath > /dev/null 2>&1; then
+                fatal "Multipath selected, but no multipath tools found"
+            fi
             ;;
         esac
     done
@@ -96,40 +92,6 @@
     fi
 }
 
-# Attaches the target $iqn in $portal and sets $dev to point to the
-# multipath device
-attach()
-{
-    do_or_die iscsiadm -m node --targetname "$iqn" -p "$portal" --login > /dev/null
-    find_device
-}
-
-# Discovers targets in $portal and checks that $iqn is one of those targets
-# Also sets the auth parameters to attach the device
-prepare()
-{
-    # Check if target is already opened
-    iscsiadm -m session 2>&1 | grep -q "$iqn" && fatal "Device already opened"
-    # Discover portal targets
-    iscsiadm -m discovery -t st -p $portal 2>&1 | grep -q "$iqn" || \
-        fatal "No matching target iqn found"
-}
-
-# Attaches the device and writes xenstore backend entries to connect
-# the device
-add()
-{
-    attach
-    write_dev $dev
-}
-
-# Disconnects the device
-remove()
-{
-    find_device
-    do_or_die iscsiadm -m node --targetname "$iqn" -p "$portal" --logout > /dev/null
-}
-
 command=$1
 target=$(xenstore-read $XENBUS_PATH/params || true)
 if [ -z "$target" ]; then
@@ -138,15 +100,21 @@
 parse_target "$target"
 
-check_tools || exit 1
-
 case $command in
 add)
-    prepare
-    add
+    # Check if target is already opened
+    iscsiadm -m session 2>&1 | grep -q "$iqn" && fatal "Device already opened"
+    # Discover portal targets
+    iscsiadm -m discovery -t st -p $portal 2>&1 | grep -q "$iqn" || \
+        fatal "No matching target iqn found"
+
+    ## Login to the iSCSI target.
+    do_or_die iscsiadm -m node --targetname "$iqn" -p "$portal" --login > /dev/null
+
+    write_dev $dev
     ;;
 remove)
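For reference, the parse_target parameter loop that the patch extends can be exercised on its own. The sketch below is a stand-alone extract: the target string and the default of locktarget="n" mirror the patch above, and no iscsiadm or sg_persist is needed just to parse.

```shell
#!/bin/sh
# Stand-alone sketch of the parse_target parameter loop:
# pull locktarget= out of a disk "target" string.
target="iqn=iqn.1986-03.com.sun:02:mytarget,portal=iscsi.example.com,locktarget=y"

locktarget="n"   # default: no locking requested
for param in $(echo "$target" | tr "," "\n"); do
    case $param in
        locktarget=*)
            # strip the "locktarget=" label, as remove_label does
            locktarget=${param#locktarget=}
            ;;
    esac
done
echo "locktarget=$locktarget"
```

Running this prints locktarget=y, which is what block-iscsi would then use to decide whether to take the persistent reservation.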
Re: [Xen-devel] [PATCH] v2 - Add exclusive locking option to block-iscsi
On 6/05/2016 7:09 PM, Roger Pau Monné wrote:
> On Thu, May 05, 2016 at 03:52:30PM +1000, Steven Haigh wrote:
>> On 2016-05-05 12:32, Steven Haigh wrote:
>>> Overview
>>>
>>> If you're using iSCSI, you can mount a target by multiple Dom0
>>> machines on the same target. For non-cluster aware filesystems, this
>>> can lead to disk corruption and general bad times by all. The iSCSI
>>> protocol allows the use of persistent reservations as per the SCSI
>>> disk spec. Low level SCSI commands for locking are handled by the
>>> sg_persist program (bundled with sg3_utils package in EL).
>>>
>>> The aim of this patch is to create a 'locktarget=y' option specified
>>> within the disk 'target' command for iSCSI to lock the target in
>>> exclusive mode on VM start with a key generated from the local systems
>>> IP, and release this lock on the shutdown of the DomU.
>>>
>>> Example Config:
>>> disk=
>>> ['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:mytarget,portal=iscsi.example.com,locktarget=y']
>>>
>>> In writing this, I have also re-factored parts of the script to put
>>> some things in what I believe to be a better place to make expansion
>>> easier. This is mainly in removing functions that purely call other
>>> functions with no actual code execution.
>>>
>>> Signed-off-by: Steven Haigh <net...@crc.id.au>
>>>
>>> (on a side note, first time I've submitted a patch to the list and I'm
>>> currently stuck on a webmail client, so apologies in advance if this
>>> all goes wrong ;)
>>
>> Changes in v2:
>> Bugfix: Call find_device to locate the /dev/sdX component of the iSCSI
>> target before trying to run unlock_device().
>>
>> Apologies for this oversight.
>
> Thanks for the patch! A couple of comments below.
>
>> --
>> Steven Haigh
>>
>> Email: net...@crc.id.au
>> Web: https://www.crc.id.au
>> Phone: (03) 9001 6090 - 0412 935 897
>
>> --- block-iscsi 2016-02-10 01:44:19.0 +1100
>> +++ block-iscsi-lock	2016-05-05 15:42:09.557191235 +1000
>> @@ -31,33 +31,37 @@
>>      echo $1 | sed "s/^\("$2"\)//"
>>  }
>>  
>> -check_tools()
>> -{
>> -    if ! command -v iscsiadm > /dev/null 2>&1; then
>> -        fatal "Unable to find iscsiadm tool"
>> -    fi
>> -    if [ "$multipath" = "y" ] && ! command -v multipath > /dev/null 2>&1; then
>> -        fatal "Unable to find multipath"
>> -    fi
>> -}
>> -
>>  # Sets the following global variables based on the params field passed in as
>>  # a parameter: iqn, portal, auth_method, user, multipath, password
>>  parse_target()
>>  {
>>      # set multipath default value
>>      multipath="n"
>> -    for param in $(echo "$1" | tr "," "\n")
>> -    do
>> +    for param in $(echo "$1" | tr "," "\n"); do
>>          case $param in
>>          iqn=*)
>>              iqn=$(remove_label $param "iqn=")
>> +            if ! command -v iscsiadm > /dev/null 2>&1; then
>> +                fatal "Could not find iscsiadm tool."
>> +            fi
>>              ;;
>>          portal=*)
>>              portal=$(remove_label $param "portal=")
>>              ;;
>>          multipath=*)
>>              multipath=$(remove_label $param "multipath=")
>> +            if ! command -v multipath > /dev/null 2>&1; then
>> +                fatal "Multipath selected, but no multipath tools found"
>> +            fi
>> +            ;;
>> +        locktarget=*)
>> +            locktarget=$(remove_label $param "locktarget=")
>> +            if ! command -v sg_persist > /dev/null 2>&1; then
>> +                fatal "Locking requested but no sg_persist found"
>> +            fi
>> +            if ! command -v gethostip > /dev/null 2>&1; then
>> +                fatal "Locking requested but no gethostip found for key generation"
>> +            fi
>>              ;;
>>          esac
>>      done
>
> Why don't you just add this to check_tools? In any case, if you want to fold
> check_tools functionality into parse_target I think it should be done in a
> separate patch in order for it to be easier to review.
>
> IMHO, I prefer to have both functions separated, because it's
[Xen-devel] [PATCH] v2 - Add exclusive locking option to block-iscsi
On 2016-05-05 12:32, Steven Haigh wrote:

Overview

If you're using iSCSI, multiple Dom0 machines can mount the same target at the same time. For non-cluster-aware filesystems, this can lead to disk corruption and general bad times for all. The iSCSI protocol allows the use of persistent reservations as per the SCSI disk spec. Low-level SCSI commands for locking are handled by the sg_persist program (bundled with the sg3_utils package in EL).

The aim of this patch is to create a 'locktarget=y' option, specified within the disk 'target' command for iSCSI, to lock the target in exclusive mode on VM start with a key generated from the local system's IP, and release this lock on the shutdown of the DomU.

Example Config:
disk=['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:mytarget,portal=iscsi.example.com,locktarget=y']

In writing this, I have also re-factored parts of the script to put some things in what I believe to be a better place to make expansion easier. This is mainly in removing functions that purely call other functions with no actual code execution.

Signed-off-by: Steven Haigh <net...@crc.id.au>

(on a side note, first time I've submitted a patch to the list and I'm currently stuck on a webmail client, so apologies in advance if this all goes wrong ;)

Changes in v2:
Bugfix: Call find_device to locate the /dev/sdX component of the iSCSI target before trying to run unlock_device(). Apologies for this oversight.

-- 
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

--- block-iscsi 2016-02-10 01:44:19.0 +1100
+++ block-iscsi-lock	2016-05-05 15:42:09.557191235 +1000
@@ -31,33 +31,37 @@
     echo $1 | sed "s/^\("$2"\)//"
 }
 
-check_tools()
-{
-    if ! command -v iscsiadm > /dev/null 2>&1; then
-        fatal "Unable to find iscsiadm tool"
-    fi
-    if [ "$multipath" = "y" ] && ! command -v multipath > /dev/null 2>&1; then
-        fatal "Unable to find multipath"
-    fi
-}
-
 # Sets the following global variables based on the params field passed in as
 # a parameter: iqn, portal, auth_method, user, multipath, password
 parse_target()
 {
     # set multipath default value
     multipath="n"
-    for param in $(echo "$1" | tr "," "\n")
-    do
+    for param in $(echo "$1" | tr "," "\n"); do
         case $param in
         iqn=*)
             iqn=$(remove_label $param "iqn=")
+            if ! command -v iscsiadm > /dev/null 2>&1; then
+                fatal "Could not find iscsiadm tool."
+            fi
             ;;
         portal=*)
             portal=$(remove_label $param "portal=")
             ;;
         multipath=*)
             multipath=$(remove_label $param "multipath=")
+            if ! command -v multipath > /dev/null 2>&1; then
+                fatal "Multipath selected, but no multipath tools found"
+            fi
+            ;;
+        locktarget=*)
+            locktarget=$(remove_label $param "locktarget=")
+            if ! command -v sg_persist > /dev/null 2>&1; then
+                fatal "Locking requested but no sg_persist found"
+            fi
+            if ! command -v gethostip > /dev/null 2>&1; then
+                fatal "Locking requested but no gethostip found for key generation"
+            fi
             ;;
         esac
     done
@@ -96,38 +100,29 @@
     fi
 }
 
-# Attaches the target $iqn in $portal and sets $dev to point to the
-# multipath device
-attach()
-{
-    do_or_die iscsiadm -m node --targetname "$iqn" -p "$portal" --login > /dev/null
-    find_device
-}
-# Discovers targets in $portal and checks that $iqn is one of those targets
-# Also sets the auth parameters to attach the device
-prepare()
+lock_device()
 {
-    # Check if target is already opened
-    iscsiadm -m session 2>&1 | grep -q "$iqn" && fatal "Device already opened"
-    # Discover portal targets
-    iscsiadm -m discovery -t st -p $portal 2>&1 | grep -q "$iqn" || \
-        fatal "No matching target iqn found"
-}
-
-# Attaches the device and writes xenstore backend entries to connect
-# the device
-add()
-{
-    attach
-    write_dev $dev
+    ## Lock the iSCSI target as Exclusive Access.
+    key=$(gethostip -x $(uname -n))
+    if ! sg_persist -d ${dev} -o -G -S ${key} > /dev/null; then
+        unlock_device
+        iscsiadm -m node --targetname "$iqn" -p "$portal" --logout > /dev/null
+        fatal "iSCSI LOCK: Failed to register with target"
+    fi
+    if ! sg_persist -d ${dev} -o -R -K ${key} -T 6 > /dev/null; then
+        unlock_device
+        iscsiadm -m node --targetname "$iqn" -p "$portal" --logout > /dev/null
+        fatal "iSCSI LOCK: Failed to set persistent reservation"
+    fi
 }
[Xen-devel] [PATCH] v1 - Add exclusive locking option to block-iscsi
Overview

If you're using iSCSI, multiple Dom0 machines can mount the same target at the same time. For non-cluster-aware filesystems, this can lead to disk corruption and general bad times for all. The iSCSI protocol allows the use of persistent reservations as per the SCSI disk spec. Low-level SCSI commands for locking are handled by the sg_persist program (bundled with the sg3_utils package in EL).

The aim of this patch is to create a 'locktarget=y' option, specified within the disk 'target' command for iSCSI, to lock the target in exclusive mode on VM start with a key generated from the local system's IP, and release this lock on the shutdown of the DomU.

Example Config:
disk=['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:mytarget,portal=iscsi.example.com,locktarget=y']

In writing this, I have also re-factored parts of the script to put some things in what I believe to be a better place to make expansion easier. This is mainly in removing functions that purely call other functions with no actual code execution.

Signed-off-by: Steven Haigh <net...@crc.id.au>

(on a side note, first time I've submitted a patch to the list and I'm currently stuck on a webmail client, so apologies in advance if this all goes wrong ;)

-- 
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

--- block-iscsi 2016-02-10 01:44:19.0 +1100
+++ block-iscsi-lock	2016-05-05 12:30:24.831903983 +1000
@@ -31,33 +31,37 @@
     echo $1 | sed "s/^\("$2"\)//"
 }
 
-check_tools()
-{
-    if ! command -v iscsiadm > /dev/null 2>&1; then
-        fatal "Unable to find iscsiadm tool"
-    fi
-    if [ "$multipath" = "y" ] && ! command -v multipath > /dev/null 2>&1; then
-        fatal "Unable to find multipath"
-    fi
-}
-
 # Sets the following global variables based on the params field passed in as
 # a parameter: iqn, portal, auth_method, user, multipath, password
 parse_target()
 {
     # set multipath default value
     multipath="n"
-    for param in $(echo "$1" | tr "," "\n")
-    do
+    for param in $(echo "$1" | tr "," "\n"); do
         case $param in
         iqn=*)
             iqn=$(remove_label $param "iqn=")
+            if ! command -v iscsiadm > /dev/null 2>&1; then
+                fatal "Could not find iscsiadm tool."
+            fi
             ;;
         portal=*)
             portal=$(remove_label $param "portal=")
             ;;
         multipath=*)
             multipath=$(remove_label $param "multipath=")
+            if ! command -v multipath > /dev/null 2>&1; then
+                fatal "Multipath selected, but no multipath tools found"
+            fi
+            ;;
+        locktarget=*)
+            locktarget=$(remove_label $param "locktarget=")
+            if ! command -v sg_persist > /dev/null 2>&1; then
+                fatal "Locking requested but no sg_persist found"
+            fi
+            if ! command -v gethostip > /dev/null 2>&1; then
+                fatal "Locking requested but no gethostip found for key generation"
+            fi
             ;;
         esac
     done
@@ -96,38 +100,29 @@
     fi
 }
 
-# Attaches the target $iqn in $portal and sets $dev to point to the
-# multipath device
-attach()
-{
-    do_or_die iscsiadm -m node --targetname "$iqn" -p "$portal" --login > /dev/null
-    find_device
-}
-
-# Discovers targets in $portal and checks that $iqn is one of those targets
-# Also sets the auth parameters to attach the device
-prepare()
-{
-    # Check if target is already opened
-    iscsiadm -m session 2>&1 | grep -q "$iqn" && fatal "Device already opened"
-    # Discover portal targets
-    iscsiadm -m discovery -t st -p $portal 2>&1 | grep -q "$iqn" || \
-        fatal "No matching target iqn found"
-}
-# Attaches the device and writes xenstore backend entries to connect
-# the device
-add()
+lock_device()
 {
-    attach
-    write_dev $dev
+    ## Lock the iSCSI target as Exclusive Access.
+    key=$(gethostip -x $(uname -n))
+    if ! sg_persist -d ${dev} -o -G -S ${key} > /dev/null; then
+        unlock_device
+        iscsiadm -m node --targetname "$iqn" -p "$portal" --logout > /dev/null
+        fatal "iSCSI LOCK: Failed to register with target"
+    fi
+    if ! sg_persist -d ${dev} -o -R -K ${key} -T 6 > /dev/null; then
+        unlock_device
+        iscsiadm -m node --targetname "$iqn" -p "$portal" --logout > /dev/null
+        fatal "iSCSI LOCK: Failed to set persistent reservation"
+    fi
 }
-# Disconnects the device
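The reservation key above comes from gethostip -x (part of syslinux), which prints the host's IPv4 address as eight hex digits. On a system without gethostip, the same key can be derived with printf alone. This is a sketch under that assumption; ip_to_hex_key is a hypothetical helper and not part of the patch:

```shell
# Derive the 8-hex-digit reservation key that `gethostip -x` would emit
# for a dotted-quad IPv4 address, using only printf.
# ip_to_hex_key is a hypothetical helper, not part of block-iscsi.
ip_to_hex_key() {
    local IFS=.
    set -- $1                # split the address on '.'
    printf '%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
}

key=$(ip_to_hex_key 192.168.133.250)
echo "$key"   # prints c0a885fa
```

Both commands yield the same key for the same address, so either could seed the sg_persist registration.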
Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6
On 4/05/2016 7:37 PM, Roger Pau Monné wrote:
> On Wed, May 04, 2016 at 06:41:26PM +1000, Steven Haigh wrote:
>> On 4/05/2016 5:34 PM, Roger Pau Monné wrote:
>>> On Wed, May 04, 2016 at 03:06:23PM +1000, Steven Haigh wrote:
>>> It is important for us to use the '-e' in order to make sure all the failure
>>> points are correctly handled, without the '-e' some command might fail and
>>> the script wouldn't realize.
>>
>> I honestly think this is pretty nasty. While it may not be true of all
>> scripts, the block-iscsi script can only really fail in a couple of
>> places - yet we have this set of procedures called:
>>
>> parse_target -> check_tools -> prepare -> add -> attach -> find_device
>> -> write_dev.
>>
>> At least check_tools, prepare, add, attach, find_device could all be
>> rolled into a single function - as the majority of the rest is 1-4 lines
>> of code.
>
> No, check_tools is used by both the attach and the detach path, so it cannot
> be rolled into a single function together with the other ones, and the same
> applies to mostly all other functions (find_device is also shared between
> the add and remove functions).
>
> IMHO, I think the current code is fine because each function has a small
> logical task to accomplish, so it's easy to make sure each function does
> what it's supposed to do, nothing more and nothing less. Batching everything
> into one big function would make this harder.
>
> That doesn't mean that I'm not open to improving it, so if you think it
> would be better/easier using some other logical organization patches are
> welcome :).

Right now, my changes are here:
http://paste.fedoraproject.org/362462/62356799/

It works perfectly well if you're the ONLY device connecting to the specified iSCSI target, but falls apart when something else has the lock and doesn't clean up after itself. From using set -x in there, I can see this when it runs:

++ dirname /etc/xen/scripts/block-iscsi-lock
+ dir=/etc/xen/scripts
+ . /etc/xen/scripts/block-common.sh
+++ dirname /etc/xen/scripts/block-iscsi-lock
++ dir=/etc/xen/scripts
++ . /etc/xen/scripts/xen-hotplug-common.sh
dirname /etc/xen/scripts/block-iscsi-lock
+++ dir=/etc/xen/scripts
+++ . /etc/xen/scripts/hotplugpath.sh
sbindir=/usr/sbin
bindir=/usr/bin
LIBEXEC=/usr/lib/xen
LIBEXEC_BIN=/usr/lib/xen/bin
libdir=/usr/lib64
SHAREDIR=/usr/share
XENFIRMWAREDIR=/usr/lib/xen/boot
XEN_CONFIG_DIR=/etc/xen
XEN_SCRIPT_DIR=/etc/xen/scripts
XEN_LOCK_DIR=/var/lock
XEN_RUN_DIR=/var/run/xen
XEN_PAGING_DIR=/var/lib/xen/xenpaging
XEN_DUMP_DIR=/var/lib/xen/dump
+++ . /etc/xen/scripts/logging.sh
+++ . /etc/xen/scripts/xen-script-common.sh
set -e
+++ . /etc/xen/scripts/locking.sh
LOCK_BASEDIR=/var/run/xen-hotplug
+++ exec
Entered lock_device
SUN COMSTAR 1.0
Peripheral device type: disk
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block-iscsi-lock add [22016] exited with error status 99
libxl: error: libxl.c:3127:local_device_attach_cb: unable to add vbd with id 268446976: No such file or directory
libxl: error: libxl_bootloader.c:408:bootloader_disk_attached_cb: failed to attach local disk for bootloader execution
libxl: error: libxl_bootloader.c:279:bootloader_local_detached_cb: unable to detach locally attached disk
libxl: error: libxl_create.c:1142:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1591:libxl__destroy_domid: non-existant domain 57
libxl: error: libxl.c:1549:domain_destroy_callback: unable to destroy guest with domid 57
libxl: error: libxl.c:1476:domain_destroy_cb: destruction of domain 57 failed

On a side note, it looks like the -e is not very uniformly used:

block:#!/bin/bash
block-enbd:#!/bin/bash
block-iscsi:#!/bin/bash -e
block-iscsi-lock:#!/bin/bash -x
block-nbd:#!/bin/bash
block-tap:#!/bin/bash -e
external-device-migrate:#!/bin/bash
vif2:#!/bin/bash
vif-bridge:#!/bin/bash
vif-nat:#!/bin/bash
vif-openvswitch:#!/bin/bash
vif-route:#!/bin/bash
vif-setup:#!/bin/bash

but it probably gets set everywhere via xen-script-common.sh - hence things dying easily.

-- 
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6
On 4/05/2016 6:25 PM, George Dunlap wrote:
> On Wed, May 4, 2016 at 8:34 AM, Roger Pau Monné <roger@citrix.com> wrote:
>> Hello,
>>
>> I'm re-adding xen-devel in case someone else also wants to provide feedback.
>>
>> On Wed, May 04, 2016 at 03:06:23PM +1000, Steven Haigh wrote:
>>> Hi Roger,
>>>
>>> I've been getting some good progress with iSCSI thanks to your insights.
>>>
>>> I'm now trying to add support for locking via Persistent Reservations to
>>> ensure that only one Dom0 can attach / use a single iSCSI target at once.
>>
>> This might be problematic with migrations. IIRC there's a point during the
>> migration where both the sending and the receiving side have the disk open
>> at the same time. However Xen always makes sure that only one guest is
>> actually accessing the disk, either the one on the receiving side (if
>> everything has gone OK) or the one on the senders side (if migration has
>> failed).
>>
>>> In a nutshell, my thoughts are to use the following to 'lock' a device:
>>> ## Create a hex key for the lock from the systems IP.
>>> key=$(gethostip -x $(uname -n))
>>> sg_persist -d ${dev} -o -G -S ${key}
>>> sg_persist -d ${dev} -o -R -K ${key} -T 6
>>>
>>> This registers the device, and sets an Exclusive Access (-T 6) flag on
>>> the iSCSI device which means nothing else will be able to open the
>>> device until the lock is removed.
>>>
>>> To unlock the device, on remove, we should do something like:
>>> key=$(gethostip -x $(uname -n))
>>> sg_persist -d ${dev} -o -L -K ${key} -T 6
>>> sg_persist -d ${dev} -o -G -K ${key} -S 0
>>>
>>> This releases the device for other things to use.
>>>
>>> I've tried putting these in block-iscsi - by using a lock_device and
>>> unlock_device function and calling it after find_device in both attach()
>>> and remove().
>>>
>>> My problems:
>>> 1) -e is set on the script - and maybe elsewhere - so any time something
>>> returns non-zero, you can't clean up. For example, if you can't get a
>>> lock, you should make sure all locks are removed from the host in
>>> question and then detach the iSCSI target.
>>
>> You can avoid this by adding something like:
>>
>> sg_persist ... || true
>>
>> Of course you can replace the "true" command with something else, like a
>> fatal message or some cleanup code. You can also place the command inside of
>> a conditional if you know it might fail:
>>
>> if ! sg_persist ...; then
>>     fatal ...
>> fi
>>
>> It is important for us to use the '-e' in order to make sure all the failure
>> points are correctly handled, without the '-e' some command might fail and
>> the script wouldn't realize.
>
> I realize I'm a bit in the minority here, but I've always thought this
> was rather a strange habit of bash scripts. In every other language,
> you check the error codes of things that can fail and you handle them
> appropriately. If you're just hacking something together, then "set
> -e" is probably OK, but wouldn't it make more sense in an
> infrastructure script like this to actually go through and handle all
> the errors? Worst case you could just if ! [command] ; then exit 1 ; fi.

Then I'm in the minority too ;)

> Regarding the "maybe elsewhere" -- AFAIK the block-scsi script itself
> is run directly from libxl, so nothing "above" it should be setting
> -e; and it only includes block-common, which does not seem to be
> setting it.

Right now, even removing -e from line 1 causes something like this to exit the script:

if [ $? != 0 ]; then

So, if I run the following, the script will always fail:

key=$(gethostip -x $(uname -n))
sg_persist -d ${dev} -o -G -S ${key} > /dev/null
if [ $? != 0 ]; then
    iscsiadm -m node -T ${iqn} --logout > /dev/null
    fatal "Could not obtain lock on $iqn"
fi

man 8 sg3_utils (http://linux.die.net/man/8/sg3_utils) shows the possible list of exit codes - it would be useful to handle at least *some* of them with useful errors, such as:

Exit status 3: the DEVICE reports that it is not ready for the operation requested. The device may be in the process of becoming ready (e.g. spinning up but not at speed) so the utility may work after a wait. Also possibly a case for locking not being supported.

> That said, in theory as Roger said, if you actually are checking the
> error code of all your commands (which you should be if you need to do
> clean-up on failure), then 'set -e' shouldn't actually be causing an
> exit.
>
> -George

-- 
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
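Translating those sg3_utils exit statuses into readable diagnostics before calling fatal could look like the sketch below. Only status 3 ("device not ready") is taken from the man page excerpt quoted above; the helper name describe_sg_exit is ours, not part of any patch:

```shell
#!/bin/sh
# Sketch: map an sg_persist exit status to a readable message.
# Status 3 is the "device not ready" case quoted from the sg3_utils
# man page; anything else falls through to a generic error.
# describe_sg_exit is a hypothetical helper, not part of block-iscsi.
describe_sg_exit() {
    case "$1" in
        0) echo "ok" ;;
        3) echo "device not ready - may still be spinning up, retry later" ;;
        *) echo "sg_persist failed with exit status $1" ;;
    esac
}

describe_sg_exit 3
```

A caller would then do something like `fatal "$(describe_sg_exit $?)"` instead of a bare "Could not obtain lock" message.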
Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6
On 4/05/2016 5:34 PM, Roger Pau Monné wrote: > Hello, > > I'm re-adding xen-devel in case someone else also wants to provide feedback. > > On Wed, May 04, 2016 at 03:06:23PM +1000, Steven Haigh wrote: >> Hi Roger, >> >> I've been getting some good progress with iSCSI thanks to your insights. >> >> I'm now trying to add support for locking via Persistent Reservations to >> ensure that only one Dom0 can attach / use a single iSCSI target at once. > > This might be problematic with migrations. IIRC there's a point during the > migration where both the sending and the receiving side have the disk open > at the same time. However Xen always makes sure that only one guest is > actually accessing the disk, either the one on the receiving side (if > everything has gone OK) or the one on the senders side (if migration has > failed). True - however I'd like to eventually attempt to commit changes to the project and allow locking to be done as an option - just like iqn / portal / multipath. In my specific use case, its to stop someone accidentally starting the same VM on multiple Dom0's at the same time - which from what I've seen causes disk corruption and all kinds of issues. It leads to people not having a good time. The iSCSI system has a limit to the max connections - however it seems that only applies *per host* meaning max connections = 1 will allow one connection per Dom0. >> In a nutshell, my thoughts are to use the following to 'lock' a device: >> ## Create a hex key for the lock from the systems IP. >> key=$(gethostip -x $(uname -n)) >> sg_persist -d ${dev} -o -G -S ${key} >> sg_persist -d ${dev} -o -R -K ${key} -T 6 >> >> This registers the device, and sets an Exclusive Access (-T 6) flag on >> the iSCSI device which means nothing else will be able to open the >> device until the lock is removed. 
>> >> To unlock the device, on remove, we should do something like: >> key=$(gethostip -x $(uname -n)) >> sg_persist -d ${dev} -o -L -K ${key} -T 6 >> sg_persist -d ${dev} -o -G -K ${key} -S 0 >> >> This releases the device for other things to use. >> >> I've tried putting these in block-iscsi - by using a lock_device and >> unlock_device function and calling it after find_device in both attach() >> and remove(). >> >> My problems: >> 1) -e is set on the script - and maybe elsewhere - so any time something >> returns non-zero, you can't clean up. For example, if you can't get a >> lock, you should make sure all locks are removed from the host in >> question and then detach the iSCSI target. > > You can avoid this by adding something like: > > sg_persist ... || true > > Of course you can replace the "true" command with something else, like a > fatal message or some cleanup code. You can also place the command inside of > a conditional if you know it might fail: > > if ! sg_persist ...; then > fatal ... > fi > > It is important for us to use the '-e' in order to make sure all the failure > points are correctly handled, without the '-e' some command might fail and > the script wouldn't realize. I honestly think this is pretty nasty. While it may not be true of all scripts, the block-iscsi script can only really fail in a couple of places - yet we have this set of procedures called: parse_target -> check_tools -> prepare -> add -> attach -> find_device -> write_dev. At least check_tools, prepare, add, attach, find_device could all be rolled into a single function - as the majority of the rest is 1-4 lines of code. There are situations where you may want to evaluate the result of sg_persist beyond a simple "worked or failed" - and that seems to be the idea of fatal "The reason that I died is X". >> 2) I can't find an easy way to clean up by doing an iscsiadm --logout if >> the locking fails. 
> > I'm not really following here, maybe because I don't know that much about > iSCSI. Can you just put whatever code is needed in order to unlock before > doing the logout? Or that's not how it works? Yes, but if one of the two unlocks fails, the script terminates. It makes different error checking *VERY* difficult. If I remove the -e from line #1, the script still acts as if -e is still set - so something else is enforcing that. >> I'm wondering if there is a reason that the script is currently in the >> structure that it is - or if it just evolved like this? It may be a good >> candidate for a complete re-write :\ > > TBH, I thought this was one of the most clean and well structured block > scripts that Xen has ;). Please don't scare me ;) -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
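The locking commands quoted earlier in the thread, Roger's `|| rc=$?` suggestion, and the wish to keep cleanup running even when one of the two unlock steps fails can be combined into one sketch. This is illustrative only, not the real block-iscsi code: `SG_PERSIST` and `KEY` are stand-ins made overridable for testing, and a real script would derive the key from the host (e.g. via `gethostip -x`) as in the commands quoted above.

```bash
#!/bin/bash
set -e

# Illustrative sketch only -- not the actual block-iscsi script.
# SG_PERSIST and KEY are overridable stand-ins for testing.
SG_PERSIST=${SG_PERSIST:-sg_persist}
KEY=${KEY:-0x1234abcd}

unlock_device() {
    # Release the Exclusive Access reservation, then unregister the key.
    # Run both steps even if the first fails, and report the first failure,
    # so `set -e` cannot abort the script between the two unlock commands.
    local dev=$1 rc=0 step=0
    "$SG_PERSIST" -d "$dev" -o -L -K "$KEY" -T 6 || rc=$?
    "$SG_PERSIST" -d "$dev" -o -G -K "$KEY" -S 0 || step=$?
    if [ "$rc" -eq 0 ]; then rc=$step; fi
    if [ "$rc" -ne 0 ]; then
        echo "unlock of $dev failed with status $rc" >&2
    fi
    return "$rc"
}

lock_device() {
    # Register the key, then take an Exclusive Access (-T 6) reservation.
    # On any failure, do a best-effort unlock so a half-taken lock is not
    # left behind on the target.
    local dev=$1 rc=0
    "$SG_PERSIST" -d "$dev" -o -G -S "$KEY" || rc=$?
    if [ "$rc" -eq 0 ]; then
        "$SG_PERSIST" -d "$dev" -o -R -K "$KEY" -T 6 || rc=$?
    fi
    if [ "$rc" -ne 0 ]; then
        unlock_device "$dev" || true
    fi
    return "$rc"
}
```

The `cmd || rc=$?` pattern is the key trick: it captures the exit status for later evaluation (so the caller can report *why* it died, not just that it did) without tripping `-e`.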
Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6
On 16/04/2016 12:30 AM, George Dunlap wrote: > On Fri, Apr 15, 2016 at 7:59 AM, Steven Haigh <net...@crc.id.au> wrote: >> Hi all, >> >> I'm wading through the somewhat confusing world of documentation regarding >> storing DomU disk images on an iSCSI target. >> >> I'm getting an error when using pygrub of: >> OSError: [Errno 2] No such file or directory: >> 'iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250' >> >> After much hunting, I came across this post: >> http://lists.xen.org/archives/html/xen-devel/2013-04/msg02796.html >> >> As such, I'm wondering if it is still required to *NOT* use pygrub for >> booting iSCSI DomUs? >> >> If so, what are the alternatives? Using pv-grub / pv-grub2? Something else? >> >> As I'm running EL7.2, I figure if I have to use a pv-grub based solution, it >> would have to be pv-grub2? > > I see you've got a fix already. But even so, if using pv-grub2 is a > possibility for you, it might be worth pursuing anyway: > - it will be much more secure than pygrub > - it will probably be more reliable, since it's actually a native grub > binary ported to Xen, while pygrub is just a python script that > attempts to duplicate some of grub's functionality. Hi George, I kind of agree - it's on my todo list, but I haven't managed to see much about including it in a .spec build yet. If you've done any work on this so far, I'd be happy to discuss off-list. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6
On 15/04/2016 7:01 PM, Roger Pau Monné wrote: > On Fri, Apr 15, 2016 at 06:20:56PM +1000, Steven Haigh wrote: > [...] >> I might have spoken too soon here... I updated this system to 4.6.1 and >> created the DomU again - still seems to fail - although it does actually >> call the block-iscsi script this time: >> >> # xl -vvv create /etc/xen/test1.vm >> Parsing config from /etc/xen/test1.vm >> libxl: debug: libxl_create.c:1560:do_domain_create: ao 0x24ad330: create: >> how=(nil) callback=(nil) poller=0x24b7070 >> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk >> vdev=xvda spec.backend=unknown >> libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=xvda, uses >> script=... assuming phy backend >> libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk >> vdev=xvda, using backend phy >> libxl: debug: libxl_create.c:945:initiate_domain_create: running bootloader >> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk >> vdev=(null) spec.backend=phy >> libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=(null), uses >> script=... assuming phy backend >> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk >> vdev=xvde spec.backend=phy >> libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=xvde, uses >> script=... 
assuming phy backend >> libxl: debug: libxl_event.c:639:libxl__ev_xswatch_register: watch >> w=0x24ada00 wpath=/local/domain/0/backend/vbd/0/51776/state token=3/0: >> register slotnum=3 >> libxl: debug: libxl_create.c:1583:do_domain_create: ao 0x24ad330: >> inprogress: poller=0x24b7070, flags=i >> libxl: debug: libxl_event.c:576:watchfd_callback: watch w=0x24ada00 >> wpath=/local/domain/0/backend/vbd/0/51776/state token=3/0: event >> epath=/local/domain/0/backend/vbd/0/51776/state >> libxl: debug: libxl_event.c:884:devstate_callback: backend >> /local/domain/0/backend/vbd/0/51776/state wanted state 2 still waiting state >> 1 >> libxl: debug: libxl_event.c:576:watchfd_callback: watch w=0x24ada00 >> wpath=/local/domain/0/backend/vbd/0/51776/state token=3/0: event >> epath=/local/domain/0/backend/vbd/0/51776/state >> libxl: debug: libxl_event.c:880:devstate_callback: backend >> /local/domain/0/backend/vbd/0/51776/state wanted state 2 ok >> libxl: debug: libxl_event.c:677:libxl__ev_xswatch_deregister: watch >> w=0x24ada00 wpath=/local/domain/0/backend/vbd/0/51776/state token=3/0: >> deregister slotnum=3 >> libxl: debug: libxl_device.c:937:device_backend_callback: calling >> device_backend_cleanup >> libxl: debug: libxl_event.c:691:libxl__ev_xswatch_deregister: watch >> w=0x24ada00: deregister unregistered >> libxl: debug: libxl_linux.c:229:libxl__hotplug_disk: Args and environment >> ready >> libxl: debug: libxl_device.c:1034:device_hotplug: calling hotplug script: >> /etc/xen/scripts/block-iscsi add >> libxl: debug: libxl_aoutils.c:593:libxl__async_exec_start: forking to >> execute: /etc/xen/scripts/block-iscsi add >> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: >> /etc/xen/scripts/block-iscsi add [2126] exited with error status 1 >> libxl: debug: libxl_event.c:691:libxl__ev_xswatch_deregister: watch >> w=0x24adb00: deregister unregistered >> libxl: error: libxl_device.c:1084:device_hotplug_child_death_cb: script: >> Device already opened > > The 
message indicates that you have this device already opened in this > system, this is detected by the following check, that you can also run from > a shell: > > # iscsiadm -m session 2>&1 | grep -q "$iqn" && fatal "Device already opened" > > You will have to perform a logout in order for the hotplug script to > correctly attach it. How right you are :) # iscsiadm -m session tcp: [1] 192.168.133.250:3260,1 iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5 (non-flash) # iscsiadm -m node --targetname iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5 -p 192.168.133.250 --logout Logging out of session [sid: 1, target: iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5, portal: 192.168.133.250,3260] Logout of [sid: 1, target: iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5, portal: 192.168.133.250,3260] successful. # iscsiadm -m session iscsiadm: No active sessions. The DomU then started successfully. Thanks for your help. I'll try the previously mentioned patch on 4.5 and see how I go with that next week. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
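The check-then-logout recovery shown above can be folded into a single helper. A sketch under the assumption that this flow is wanted as a pre-attach step; `ISCSIADM` is an overridable stand-in so the logic can be exercised without a live target, and the invocations mirror the commands in the thread:

```bash
#!/bin/bash
# Sketch of the recovery above: if a session for this IQN is already open,
# log it out so the hotplug script can attach cleanly. ISCSIADM is an
# overridable stand-in for testing; real use would keep the default.
ISCSIADM=${ISCSIADM:-iscsiadm}

release_if_open() {
    local iqn=$1 portal=$2
    # Mirrors the "Device already opened" check quoted above.
    if "$ISCSIADM" -m session 2>&1 | grep -q "$iqn"; then
        "$ISCSIADM" -m node --targetname "$iqn" -p "$portal" --logout
    fi
}
```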
Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6
On 2016-04-15 18:11, Steven Haigh wrote: On 2016-04-15 18:03, Roger Pau Monné wrote: On Fri, Apr 15, 2016 at 05:48:24PM +1000, Steven Haigh wrote: On 2016-04-15 17:46, Roger Pau Monné wrote: > On Fri, Apr 15, 2016 at 05:28:12PM +1000, Steven Haigh wrote: > > On 2016-04-15 17:23, Roger Pau Monné wrote: > > > On Fri, Apr 15, 2016 at 04:59:11PM +1000, Steven Haigh wrote: > > > > Hi all, > > > > > > > > I'm wading through the somewhat confusing world of documentation > > > > regarding > > > > storing DomU disk images on an iSCSI target. > > > > > > > > I'm getting an error when using pygrub of: > > > > OSError: [Errno 2] No such file or directory: 'iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250' > > > > > > Hello, > > > > > > It should work. Can you please paste your guest configuration file and > > > the > > > output of the create command with "-vvv"? > > > > DomU config file: > > bootloader = "pygrub" > > name= "test1.vm" > > memory = 2048 > > vcpus = 2 > > cpus= "1-7" > > vif = ['bridge=br-151, vifname=vm.test1'] > > disk= ['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250'] > > boot= "c" > > > > # xl create /etc/xen/test1.vm -d -c > > Please post the output of xl -vvv create /etc/xen/test1.vm. Whoops - apologies: # xl -vvv create /etc/xen/test1.vm Parsing config from /etc/xen/test1.vm libxl: debug: libxl_create.c:1507:do_domain_create: ao 0x20b7260: create: how=(nil) callback=(nil) poller=0x20b6b30 libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=xvda, uses script=... 
assuming phy backend libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy libxl: debug: libxl_create.c:907:initiate_domain_create: running bootloader libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=(null) spec.backend=phy libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=(null), uses script=... assuming phy backend libxl: debug: libxl.c:3064:libxl__device_disk_local_initiate_attach: locally attaching PHY disk iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250 Now I remember, this was fixed not long ago, you will need to apply b1882a424ae098d722b19086b16e64b9aeccc7ca to your source tree/package in order to get pygrub working with hotplug scripts [0]. I guess you are using Xen 4.5, because this commit is already present in Xen 4.6, and it should fix your issue. [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=b1882a424ae098d722b19086b16e64b9aeccc7ca Ahhh - thanks for the pointer. As this is a dev system, its probably easier for me to upgrade it to Xen 4.6 - however I'll take that commit and look at adding it to my Xen 4.5 packages for public consumption. I might have spoken too soon here... I updated this system to 4.6.1 and created the DomU again - still seems to fail - although it does actually call the block-iscsi script this time: # xl -vvv create /etc/xen/test1.vm Parsing config from /etc/xen/test1.vm libxl: debug: libxl_create.c:1560:do_domain_create: ao 0x24ad330: create: how=(nil) callback=(nil) poller=0x24b7070 libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=xvda, uses script=... 
assuming phy backend libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy libxl: debug: libxl_create.c:945:initiate_domain_create: running bootloader libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=(null) spec.backend=phy libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=(null), uses script=... assuming phy backend libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvde spec.backend=phy libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=xvde, uses script=... assuming phy backend libxl: debug: libxl_event.c:639:libxl__ev_xswatch_register: watch w=0x24ada00 wpath=/local/domain/0/backend/vbd/0/51776/state token=3/0: register slotnum=3 libxl: debug: libxl_create.c:1583:do_domain_create: ao 0x24ad330: inprogress: poller=0x24b7070, flags=i libxl: debug: libxl_event.c:576:watchfd_callback: watch w=0x24ada00 wpath=/local/domain/0/backend/vbd/0/51776/state token=3/0: event epath=/local/domain/0/backend/vbd/0/51776/state libxl: debug: libxl_event.c:884:devstate_callback: bac
Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6
On 2016-04-15 18:03, Roger Pau Monné wrote: On Fri, Apr 15, 2016 at 05:48:24PM +1000, Steven Haigh wrote: On 2016-04-15 17:46, Roger Pau Monné wrote: > On Fri, Apr 15, 2016 at 05:28:12PM +1000, Steven Haigh wrote: > > On 2016-04-15 17:23, Roger Pau Monné wrote: > > > On Fri, Apr 15, 2016 at 04:59:11PM +1000, Steven Haigh wrote: > > > > Hi all, > > > > > > > > I'm wading through the somewhat confusing world of documentation > > > > regarding > > > > storing DomU disk images on an iSCSI target. > > > > > > > > I'm getting an error when using pygrub of: > > > > OSError: [Errno 2] No such file or directory: 'iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250' > > > > > > Hello, > > > > > > It should work. Can you please paste your guest configuration file and > > > the > > > output of the create command with "-vvv"? > > > > DomU config file: > > bootloader = "pygrub" > > name= "test1.vm" > > memory = 2048 > > vcpus = 2 > > cpus= "1-7" > > vif = ['bridge=br-151, vifname=vm.test1'] > > disk= ['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250'] > > boot= "c" > > > > # xl create /etc/xen/test1.vm -d -c > > Please post the output of xl -vvv create /etc/xen/test1.vm. Whoops - apologies: # xl -vvv create /etc/xen/test1.vm Parsing config from /etc/xen/test1.vm libxl: debug: libxl_create.c:1507:do_domain_create: ao 0x20b7260: create: how=(nil) callback=(nil) poller=0x20b6b30 libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=xvda, uses script=... 
assuming phy backend libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy libxl: debug: libxl_create.c:907:initiate_domain_create: running bootloader libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=(null) spec.backend=phy libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=(null), uses script=... assuming phy backend libxl: debug: libxl.c:3064:libxl__device_disk_local_initiate_attach: locally attaching PHY disk iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250 Now I remember, this was fixed not long ago, you will need to apply b1882a424ae098d722b19086b16e64b9aeccc7ca to your source tree/package in order to get pygrub working with hotplug scripts [0]. I guess you are using Xen 4.5, because this commit is already present in Xen 4.6, and it should fix your issue. [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=b1882a424ae098d722b19086b16e64b9aeccc7ca Ahhh - thanks for the pointer. As this is a dev system, it's probably easier for me to upgrade it to Xen 4.6 - however I'll take that commit and look at adding it to my Xen 4.5 packages for public consumption. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6
On 2016-04-15 17:46, Roger Pau Monné wrote: On Fri, Apr 15, 2016 at 05:28:12PM +1000, Steven Haigh wrote: On 2016-04-15 17:23, Roger Pau Monné wrote: > On Fri, Apr 15, 2016 at 04:59:11PM +1000, Steven Haigh wrote: > > Hi all, > > > > I'm wading through the somewhat confusing world of documentation > > regarding > > storing DomU disk images on an iSCSI target. > > > > I'm getting an error when using pygrub of: > > OSError: [Errno 2] No such file or directory: 'iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250' > > Hello, > > It should work. Can you please paste your guest configuration file and > the > output of the create command with "-vvv"? DomU config file: bootloader = "pygrub" name= "test1.vm" memory = 2048 vcpus = 2 cpus= "1-7" vif = ['bridge=br-151, vifname=vm.test1'] disk= ['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250'] boot= "c" # xl create /etc/xen/test1.vm -d -c Please post the output of xl -vvv create /etc/xen/test1.vm. Whoops - apologies: # xl -vvv create /etc/xen/test1.vm Parsing config from /etc/xen/test1.vm libxl: debug: libxl_create.c:1507:do_domain_create: ao 0x20b7260: create: how=(nil) callback=(nil) poller=0x20b6b30 libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=xvda, uses script=... assuming phy backend libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy libxl: debug: libxl_create.c:907:initiate_domain_create: running bootloader libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=(null) spec.backend=phy libxl: debug: libxl_device.c:207:disk_try_backend: Disk vdev=(null), uses script=... 
assuming phy backend libxl: debug: libxl.c:3064:libxl__device_disk_local_initiate_attach: locally attaching PHY disk iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250 libxl: debug: libxl_bootloader.c:411:bootloader_disk_attached_cb: Config bootloader value: pygrub libxl: debug: libxl_bootloader.c:427:bootloader_disk_attached_cb: Checking for bootloader in libexec path: /usr/lib/xen/bin/pygrub libxl: debug: libxl_create.c:1523:do_domain_create: ao 0x20b7260: inprogress: poller=0x20b6b30, flags=i libxl: debug: libxl_event.c:581:libxl__ev_xswatch_register: watch w=0x20b7a30 wpath=/local/domain/12 token=3/0: register slotnum=3 libxl: debug: libxl_event.c:1950:libxl__ao_progress_report: ao 0x20b7260: progress report: ignored libxl: debug: libxl_bootloader.c:537:bootloader_gotptys: executing bootloader: /usr/lib/xen/bin/pygrub libxl: debug: libxl_bootloader.c:541:bootloader_gotptys: bootloader arg: /usr/lib/xen/bin/pygrub libxl: debug: libxl_bootloader.c:541:bootloader_gotptys: bootloader arg: --output=/var/run/xen/bootloader.12.out libxl: debug: libxl_bootloader.c:541:bootloader_gotptys: bootloader arg: --output-format=simple0 libxl: debug: libxl_bootloader.c:541:bootloader_gotptys: bootloader arg: --output-directory=/var/run/xen/bootloader.12.d libxl: debug: libxl_bootloader.c:541:bootloader_gotptys: bootloader arg: iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250 libxl: debug: libxl_event.c:518:watchfd_callback: watch w=0x20b7a30 wpath=/local/domain/12 token=3/0: event epath=/local/domain/12 libxl: error: libxl_bootloader.c:630:bootloader_finished: bootloader failed - consult logfile /var/log/xen/bootloader.12.log libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: bootloader [-1] exited with error status 1 libxl: debug: libxl_event.c:619:libxl__ev_xswatch_deregister: watch w=0x20b7a30 wpath=/local/domain/12 token=3/0: deregister slotnum=3 libxl: error: 
libxl_create.c:1121:domcreate_rebuild_done: cannot (re-)build domain: -3 libxl: info: libxl.c:1698:devices_destroy_cb: forked pid 3003 for destroy of domain 12 libxl: debug: libxl_event.c:1774:libxl__ao_complete: ao 0x20b7260: complete, rc=-3 libxl: debug: libxl_event.c:1746:libxl__ao__destroy: ao 0x20b7260: destroy xc: debug: hypercall buffer: total allocations:41 total releases:41 xc: debug: hypercall buffer: current allocations:0 maximum allocations:2 xc: debug: hypercall buffer: cache current size:2 xc: debug: hypercall buffer: cache hits:30 misses:2 toobig:9 -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6
On 2016-04-15 17:23, Roger Pau Monné wrote: On Fri, Apr 15, 2016 at 04:59:11PM +1000, Steven Haigh wrote: Hi all, I'm wading through the somewhat confusing world of documentation regarding storing DomU disk images on an iSCSI target. I'm getting an error when using pygrub of: OSError: [Errno 2] No such file or directory: 'iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250' Hello, It should work. Can you please paste your guest configuration file and the output of the create command with "-vvv"? DomU config file: bootloader = "pygrub" name= "test1.vm" memory = 2048 vcpus = 2 cpus= "1-7" vif = ['bridge=br-151, vifname=vm.test1'] disk= ['script=block-iscsi,vdev=xvda,target=iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250'] boot= "c" # xl create /etc/xen/test1.vm -d -c Parsing config from /etc/xen/test1.vm { "domid": null, "config": { "c_info": { "type": "pv", "name": "test1.vm", "uuid": "a7134f81-4616-4cf6-99db-3d2bc90b2d58", "run_hotplug_scripts": "True" }, "b_info": { "max_vcpus": 2, "avail_vcpus": [ 0, 1 ], "vcpu_hard_affinity": [ [ 1, 2, 3, 4, 5, 6, 7 ], [ 1, 2, 3, 4, 5, 6, 7 ] ], "numa_placement": "False", "max_memkb": 2097152, "target_memkb": 2097152, "shadow_memkb": 18432, "sched_params": { }, "claim_mode": "True", "type.pv": { "bootloader": "pygrub" } }, "disks": [ { "pdev_path": "iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250", "vdev": "xvda", "format": "raw", "script": "block-iscsi", "readwrite": 1 } ], "nics": [ { "devid": 0, "bridge": "br-151", "ifname": "vm.test1" } ], "on_reboot": "restart" } } libxl: error: libxl_bootloader.c:630:bootloader_finished: bootloader failed - consult logfile /var/log/xen/bootloader.11.log libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: bootloader [-1] exited with error status 1 libxl: error: libxl_create.c:1121:domcreate_rebuild_done: cannot (re-)build domain: -3 libxl: info: 
libxl.c:1698:devices_destroy_cb: forked pid 2982 for destroy of domain 11 libxl: error: libxl_dom.c:36:libxl__domain_type: unable to get domain type for domid=11 xl: unable to exec console client: No such file or directory libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console child [2981] exited with error status 1 # cat /var/log/xen/bootloader.11.log Traceback (most recent call last): File "/usr/lib/xen/bin/pygrub", line 894, in part_offs = get_partition_offsets(file) File "/usr/lib/xen/bin/pygrub", line 114, in get_partition_offsets image_type = identify_disk_image(file) File "/usr/lib/xen/bin/pygrub", line 57, in identify_disk_image fd = os.open(file, os.O_RDONLY) OSError: [Errno 2] No such file or directory: 'iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250' -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] block-iscsi with Xen 4.5 / 4.6
Hi all, I'm wading through the somewhat confusing world of documentation regarding storing DomU disk images on an iSCSI target. I'm getting an error when using pygrub of: OSError: [Errno 2] No such file or directory: 'iqn=iqn.1986-03.com.sun:02:ff2d12c0-b709-4ec0-999d-976506c666f5,portal=192.168.133.250' After much hunting, I came across this post: http://lists.xen.org/archives/html/xen-devel/2013-04/msg02796.html As such, I'm wondering if it is still required to *NOT* use pygrub for booting iSCSI DomUs? If so, what are the alternatives? Using pv-grub / pv-grub2? Something else? As I'm running EL7.2, I figure if I have to use a pv-grub based solution, it would have to be pv-grub2? -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] 4.4: INFO: rcu_sched self-detected stall on CPU
On 30/03/2016 1:14 AM, Boris Ostrovsky wrote: > On 03/29/2016 04:56 AM, Steven Haigh wrote: >> >> Interestingly enough, this just happened again - but on a different >> virtual machine. I'm starting to wonder if this may have something to do >> with the uptime of the machine - as the system that this seems to happen >> to is always different. >> >> Destroying it and monitoring it again has so far come up blank. >> >> I've thrown the latest lot of kernel messages here: >> http://paste.fedoraproject.org/346802/59241532 > > Would be good to see full console log. The one that you posted starts > with an error so I wonder what was before that. Ok, so I had a virtual machine do this again today. Both vcpus went to 100% usage and essentially hung. I attached to the screen console that was connected via 'xl console' and copied the entire buffer to paste below: yum-cron[30740]: segfault at 1781ab8 ip 7f2a7fcd282f sp 7ffe8655fe90 error 5 in libpython2.7.so.1.0[7f2a7fbf5000+178000] swap_free: Bad swap file entry 2a2b7d5bb69515d8 BUG: Bad page map in process yum-cron pte:56fab76d2a2bb06a pmd:0309e067 addr:0178 vm_flags:00100073 anon_vma:88007b974c08 mapping: (null) index:1780 file: (null) fault: (null) mmap: (null) readpage: (null) CPU: 0 PID: 30740 Comm: yum-cron Tainted: GB 4.4.6-4.el7xen.x86_64 #1 88004176bac0 81323d17 0178 3000 88004176bb08 8117e574 81193d6e 1780 88000309ec00 0178 56fab76d2a2bb06a Call Trace: [] dump_stack+0x63/0x8c [] print_bad_pte+0x1e4/0x290 [] ? swap_info_get+0x7e/0xe0 [] unmap_single_vma+0x4ff/0x840 [] unmap_vmas+0x47/0x90 [] exit_mmap+0x98/0x150 [] mmput+0x47/0x100 [] do_exit+0x24e/0xad0 [] do_group_exit+0x3f/0xa0 [] get_signal+0x1c3/0x5e0 [] do_signal+0x28/0x630 [] ? printk+0x4d/0x4f [] ? vprintk_default+0x1f/0x30 [] ? bad_area_access_error+0x43/0x4a [] ? 
__do_page_fault+0x22c/0x3f0 [] exit_to_usermode_loop+0x4c/0x95 [] prepare_exit_to_usermode+0x18/0x20 [] retint_user+0x8/0x13 BUG: Bad page state in process yum-cron pfn:0f3bf page:ea3cefc0 count:0 mapcount:7 mapping:88000f3bf008 index:0x88000f3bf000 flags: 0x100094(referenced|dirty|slab) page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: flags: 0x80(slab) Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4 CPU: 0 PID: 30740 Comm: yum-cron Tainted: GB 4.4.6-4.el7xen.x86_64 #1 88004176b958 81323d17 ea3cefc0 817ab348 88004176b980 811c1ab5 ea3cefc0 0001 88004176b9d0 81159584 Call Trace: [] dump_stack+0x63/0x8c [] bad_page.part.69+0xdf/0xfc [] free_pages_prepare+0x294/0x2a0 [] free_hot_cold_page+0x31/0x160 [] free_hot_cold_page_list+0x49/0xb0 [] release_pages+0xc5/0x260 [] free_pages_and_swap_cache+0x7d/0x90 [] tlb_flush_mmu_free+0x36/0x60 [] unmap_single_vma+0x664/0x840 [] unmap_vmas+0x47/0x90 [] exit_mmap+0x98/0x150 [] mmput+0x47/0x100 [] do_exit+0x24e/0xad0 [] do_group_exit+0x3f/0xa0 [] get_signal+0x1c3/0x5e0 [] do_signal+0x28/0x630 [] ? printk+0x4d/0x4f [] ? vprintk_default+0x1f/0x30 [] ? bad_area_access_error+0x43/0x4a [] ? 
__do_page_fault+0x22c/0x3f0 [] exit_to_usermode_loop+0x4c/0x95 [] prepare_exit_to_usermode+0x18/0x20 [] retint_user+0x8/0x13 BUG: Bad rss-counter state mm:88007b99e4c0 idx:0 val:-1 BUG: Bad rss-counter state mm:88007b99e4c0 idx:1 val:2 BUG: Bad rss-counter state mm:88007b99e4c0 idx:2 val:-1 yum-cron[4197]: segfault at 32947fcb ip 7ff0fa1bf8bd sp 7ffdb1c54990 error 4 in libpython2.7.so.1.0[7ff0fa13a000+178000] BUG: unable to handle kernel paging request at 88010f3beffe IP: [] free_block+0x119/0x190 PGD 188b063 PUD 0 Oops: 0002 [#1] SMP Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4 CPU: 1 PID: 8519 Comm: kworker/1:2 Tainted: GB 4.4.6-4.el7xen.x86_64 #1 Workqueue: events cache_reap task: 8800346bf1c0 ti: 88005170 task.ti: 88005170 RIP: 0010:[] [] free_block+0x119/0x190 RSP: 0018:880051703d40 EFLAGS: 00010082 RAX: ea3cefc0 RBX: ea00 RCX: 88000f3bf000 RDX: fffe RSI: 88007fd19c40 RDI: 88007d012100 RBP: 880051703d68 R08: 880051703d88 R09: 0006 R10: 88007d01a9
Re: [Xen-devel] 4.4: INFO: rcu_sched self-detected stall on CPU
Greg, please see below - this is probably more for you... On 03/29/2016 04:56 AM, Steven Haigh wrote: > > Interestingly enough, this just happened again - but on a different > virtual machine. I'm starting to wonder if this may have something to do > with the uptime of the machine - as the system that this seems to happen > to is always different. > > Destroying it and monitoring it again has so far come up blank. > > I've thrown the latest lot of kernel messages here: > http://paste.fedoraproject.org/346802/59241532 So I just did a bit of digging via the almighty Google. I started hunting for these lines, as they happen just before the stall: BUG: Bad rss-counter state mm:88007b7db480 idx:2 val:-1 BUG: Bad rss-counter state mm:880079c638c0 idx:0 val:-1 BUG: Bad rss-counter state mm:880079c638c0 idx:2 val:-1 I stumbled across this post on the lkml: http://marc.info/?l=linux-kernel=145141546409607 The patch attached seems to reference the following change in unmap_mapping_range in mm/memory.c: > - struct zap_details details; > + struct zap_details details = { }; When I browse the GIT tree for 4.4.6: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/mm/memory.c?id=refs/tags/v4.4.6 I see at line 2411: struct zap_details details; Is this something that has been missed being merged into the 4.4 tree? I'll admit my kernel knowledge is not enough to understand what the code actually does - but the similarities here seem uncanny. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] 4.4: INFO: rcu_sched self-detected stall on CPU
On 30/03/2016 1:14 AM, Boris Ostrovsky wrote: > On 03/29/2016 04:56 AM, Steven Haigh wrote: >> >> Interestingly enough, this just happened again - but on a different >> virtual machine. I'm starting to wonder if this may have something to do >> with the uptime of the machine - as the system that this seems to happen >> to is always different. >> >> Destroying it and monitoring it again has so far come up blank. >> >> I've thrown the latest lot of kernel messages here: >> http://paste.fedoraproject.org/346802/59241532 > > Would be good to see full console log. The one that you posted starts > with an error so I wonder what was before that. Agreed. It started off with me observing this on one VM - but since trying to get details on that VM - others have started showing issues as well. It's frustrating, as it seems I've been playing whack-a-mole to get more debug on what is going on. So, I've changed the kernel command line to the following on ALL VMs on this system: enforcemodulesig=1 selinux=0 fsck.repair=yes loglevel=7 console=tty0 console=ttyS0,38400n8 In the Dom0 (which runs the same kernel package), I've started a screen session with a window for each of the running DomUs, attached to the console via 'xl console blah' - so hopefully the next one that goes down (whichever one that is) will get caught in the console. > Have you tried this on bare metal, BTW? And you said this is only > observed on 4.4, not 4.5, right? I use the same kernel package as the Dom0 kernel - and so far haven't seen any issues running this as the Dom0. I haven't used it on baremetal as a non-xen kernel as yet. The kernel package I'm currently running is for CentOS / Scientific Linux / RHEL at: http://au1.mirror.crc.id.au/repo/el7-testing/x86_64/ I'm using 4.4.6-3 at the moment - which has CONFIG_PREEMPT_VOLUNTARY set - which *MAY* have increased the time between this happening - or may have no effect at all. I'm not convinced either way as yet. 
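The console-capture setup described above (one `xl console` per DomU inside screen) can be scripted. A sketch under the assumption that one detached screen session per running domain is wanted; `XL` and `SCREEN` are overridable stand-ins so the loop can be exercised without a running hypervisor:

```bash
#!/bin/bash
# Hypothetical helper, not from the thread: start a detached screen session
# per running DomU, each running `xl console <domain>`. XL and SCREEN are
# overridable stand-ins for testing.
XL=${XL:-xl}
SCREEN=${SCREEN:-screen}

capture_consoles() {
    # Skip the header line and Domain-0; the first column is the domain name.
    "$XL" list | awk 'NR > 1 && $1 != "Domain-0" { print $1 }' |
    while read -r dom; do
        "$SCREEN" -dmS "console-$dom" "$XL" console "$dom"
    done
}
```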
With respect to 4.5, I have had reports from another user of my packages that they haven't seen the same crash using the same Xen packages but with kernel 4.5. I have not verified this myself as yet as I haven't gone down the path of making 4.5 packages for testing. As such, I wouldn't treat this as a conclusive test case as yet. I'm hoping that the steps I've taken above may give some more information in which we can drill down into exactly what is going on - or at least give more pointers into the root cause. >> >> Interestingly, around the same time, /var/log/messages on the remote >> syslog server shows: >> Mar 29 17:00:01 zeus systemd: Created slice user-0.slice. >> Mar 29 17:00:01 zeus systemd: Starting user-0.slice. >> Mar 29 17:00:01 zeus systemd: Started Session 1567 of user root. >> Mar 29 17:00:01 zeus systemd: Starting Session 1567 of user root. >> Mar 29 17:00:01 zeus systemd: Removed slice user-0.slice. >> Mar 29 17:00:01 zeus systemd: Stopping user-0.slice. >> Mar 29 17:01:01 zeus systemd: Created slice user-0.slice. >> Mar 29 17:01:01 zeus systemd: Starting user-0.slice. >> Mar 29 17:01:01 zeus systemd: Started Session 1568 of user root. >> Mar 29 17:01:01 zeus systemd: Starting Session 1568 of user root. >> Mar 29 17:08:34 zeus ntpdate[18569]: adjust time server 203.56.246.94 >> offset -0.002247 sec >> Mar 29 17:08:34 zeus systemd: Removed slice user-0.slice. >> Mar 29 17:08:34 zeus systemd: Stopping user-0.slice. >> Mar 29 17:10:01 zeus systemd: Created slice user-0.slice. >> Mar 29 17:10:01 zeus systemd: Starting user-0.slice. >> Mar 29 17:10:01 zeus systemd: Started Session 1569 of user root. >> Mar 29 17:10:01 zeus systemd: Starting Session 1569 of user root. >> Mar 29 17:10:01 zeus systemd: Removed slice user-0.slice. >> Mar 29 17:10:01 zeus systemd: Stopping user-0.slice. >> Mar 29 17:20:01 zeus systemd: Created slice user-0.slice. >> Mar 29 17:20:01 zeus systemd: Starting user-0.slice. 
>> Mar 29 17:20:01 zeus systemd: Started Session 1570 of user root. >> Mar 29 17:20:01 zeus systemd: Starting Session 1570 of user root. >> Mar 29 17:20:01 zeus systemd: Removed slice user-0.slice. >> Mar 29 17:20:01 zeus systemd: Stopping user-0.slice. >> Mar 29 17:30:55 zeus systemd: systemd-logind.service watchdog timeout >> (limit 1min)! >> Mar 29 17:32:25 zeus systemd: systemd-logind.service stop-sigabrt timed >> out. Terminating. >> Mar 29 17:33:56 zeus systemd: systemd-logind.service stop-sigterm timed >> out. Killing. >> Mar 29 17:35:26 zeus systemd: systemd-logind.service still around after >> SIGKILL. Ignoring. >> Mar 29 17:36:56 zeus systemd: systemd-logind.service stop-final-sigterm >> timed out. Killing. >> Mar 29 17:38:26 zeus systemd:
Re: [Xen-devel] 4.4: INFO: rcu_sched self-detected stall on CPU
On 26/03/2016 8:07 AM, Steven Haigh wrote: > On 26/03/2016 3:20 AM, Boris Ostrovsky wrote: >> On 03/25/2016 12:04 PM, Steven Haigh wrote: >>> It may not actually be the full logs. Once the system gets really upset, >>> you can't run anything - as such, grabbing anything from dmesg is not >>> possible. >>> >>> The logs provided above is all that gets spat out to the syslog server. >>> >>> I'll try tinkering with a few things to see if I can get more output - >>> but right now, that's all I've been able to achieve. So far, my only >>> ideas are to remove the 'quiet' options from the kernel command line - >>> but I'm not sure how much that would help. >>> >>> Suggestions gladly accepted on this front. >> >> You probably want to run connected to guest serial console (" >> serial='pty' " in guest config file and something like 'loglevel=7 >> console=tty0 console=ttyS0,38400n8' on guest kernel commandline). And >> start the guest with 'xl create -c ' or connect later with 'xl >> console '. > > Ok thanks, I've booted the DomU with: > > $ cat /proc/cmdline > root=UUID=63ade949-ee67-4afb-8fe7-ecd96faa15e2 ro enforcemodulesig=1 > selinux=0 fsck.repair=yes loglevel=7 console=tty0 console=ttyS0,38400n8 > > I've left a screen session attached to the console (via xl console) and > I'll see if that turns anything up. As this seems to be rather > unpredictable when it happens, it may take a day or two to get anything. > I just hope its more than the syslog output :) Interestingly enough, this just happened again - but on a different virtual machine. I'm starting to wonder if this may have something to do with the uptime of the machine - as the system that this seems to happen to is always different. Destroying it and monitoring it again has so far come up blank. 
I've thrown the latest lot of kernel messages here: http://paste.fedoraproject.org/346802/59241532 Interestingly, around the same time, /var/log/messages on the remote syslog server shows: Mar 29 17:00:01 zeus systemd: Created slice user-0.slice. Mar 29 17:00:01 zeus systemd: Starting user-0.slice. Mar 29 17:00:01 zeus systemd: Started Session 1567 of user root. Mar 29 17:00:01 zeus systemd: Starting Session 1567 of user root. Mar 29 17:00:01 zeus systemd: Removed slice user-0.slice. Mar 29 17:00:01 zeus systemd: Stopping user-0.slice. Mar 29 17:01:01 zeus systemd: Created slice user-0.slice. Mar 29 17:01:01 zeus systemd: Starting user-0.slice. Mar 29 17:01:01 zeus systemd: Started Session 1568 of user root. Mar 29 17:01:01 zeus systemd: Starting Session 1568 of user root. Mar 29 17:08:34 zeus ntpdate[18569]: adjust time server 203.56.246.94 offset -0.002247 sec Mar 29 17:08:34 zeus systemd: Removed slice user-0.slice. Mar 29 17:08:34 zeus systemd: Stopping user-0.slice. Mar 29 17:10:01 zeus systemd: Created slice user-0.slice. Mar 29 17:10:01 zeus systemd: Starting user-0.slice. Mar 29 17:10:01 zeus systemd: Started Session 1569 of user root. Mar 29 17:10:01 zeus systemd: Starting Session 1569 of user root. Mar 29 17:10:01 zeus systemd: Removed slice user-0.slice. Mar 29 17:10:01 zeus systemd: Stopping user-0.slice. Mar 29 17:20:01 zeus systemd: Created slice user-0.slice. Mar 29 17:20:01 zeus systemd: Starting user-0.slice. Mar 29 17:20:01 zeus systemd: Started Session 1570 of user root. Mar 29 17:20:01 zeus systemd: Starting Session 1570 of user root. Mar 29 17:20:01 zeus systemd: Removed slice user-0.slice. Mar 29 17:20:01 zeus systemd: Stopping user-0.slice. Mar 29 17:30:55 zeus systemd: systemd-logind.service watchdog timeout (limit 1min)! Mar 29 17:32:25 zeus systemd: systemd-logind.service stop-sigabrt timed out. Terminating. Mar 29 17:33:56 zeus systemd: systemd-logind.service stop-sigterm timed out. Killing. 
Mar 29 17:35:26 zeus systemd: systemd-logind.service still around after SIGKILL. Ignoring. Mar 29 17:36:56 zeus systemd: systemd-logind.service stop-final-sigterm timed out. Killing. Mar 29 17:38:26 zeus systemd: systemd-logind.service still around after final SIGKILL. Entering failed mode. Mar 29 17:38:26 zeus systemd: Unit systemd-logind.service entered failed state. Mar 29 17:38:26 zeus systemd: systemd-logind.service failed. -- Steven Haigh Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 signature.asc Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] 4.4: INFO: rcu_sched self-detected stall on CPU
On 26/03/2016 3:20 AM, Boris Ostrovsky wrote:
> On 03/25/2016 12:04 PM, Steven Haigh wrote:
>> It may not actually be the full logs. Once the system gets really upset,
>> you can't run anything - as such, grabbing anything from dmesg is not
>> possible.
>>
>> The logs provided above is all that gets spat out to the syslog server.
>>
>> I'll try tinkering with a few things to see if I can get more output -
>> but right now, that's all I've been able to achieve. So far, my only
>> ideas are to remove the 'quiet' options from the kernel command line -
>> but I'm not sure how much that would help.
>>
>> Suggestions gladly accepted on this front.
>
> You probably want to run connected to guest serial console ("
> serial='pty' " in guest config file and something like 'loglevel=7
> console=tty0 console=ttyS0,38400n8' on guest kernel commandline). And
> start the guest with 'xl create -c ' or connect later with 'xl
> console '.

Ok thanks, I've booted the DomU with:

$ cat /proc/cmdline
root=UUID=63ade949-ee67-4afb-8fe7-ecd96faa15e2 ro enforcemodulesig=1 selinux=0 fsck.repair=yes loglevel=7 console=tty0 console=ttyS0,38400n8

I've left a screen session attached to the console (via xl console) and I'll see if that turns anything up. As this seems to be rather unpredictable when it happens, it may take a day or two to get anything. I just hope it's more than the syslog output :)

-- 
Steven Haigh
Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897
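[Editor's sketch] Boris's suggestion above amounts to something like the following additions to the guest config file. `serial` and `extra` are standard xl config keys; the values here simply mirror what the thread uses, and the file path is the one shown later in the thread:

```
# /etc/xen/backup.vm - expose a serial console to the guest
serial = 'pty'
# Verbose guest kernel console output on that serial port
extra  = 'loglevel=7 console=tty0 console=ttyS0,38400n8'
```

After that, `xl create -c <cfg>` boots the guest already attached to the console, or `xl console <name>` attaches later.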
Re: [Xen-devel] 4.4: INFO: rcu_sched self-detected stall on CPU
On 26/03/2016 1:44 AM, Boris Ostrovsky wrote: > On 03/25/2016 10:05 AM, Steven Haigh wrote: >> On 25/03/2016 11:23 PM, Boris Ostrovsky wrote: >>> On 03/24/2016 10:53 PM, Steven Haigh wrote: >>>> Hi all, >>>> >>>> Firstly, I've cross-posted this to xen-devel and the lkml - as this >>>> problem seems to only exist when using kernel 4.4 as a Xen DomU kernel. >>>> I have also CC'ed Greg KH for his awesome insight as maintainer. >>>> >>>> Please CC myself into replies - as I'm not a member of the kernel >>>> mailing list - I may miss replies from monitoring the archives. >>>> >>>> I've noticed recently that heavy disk IO is causing rcu_sched to detect >>>> stalls. The process mentioned usually goes to 100% CPU usage, and >>>> eventually processes start segfaulting and dying. The only fix to >>>> recover the system is to use 'xl destroy' to force-kill the VM and to >>>> start it again. >>>> >>>> The majority of these issues seem to mention ext4 in the trace. This >>>> may >>>> indicate an issue there - or may be a red herring. >>>> >>>> The gritty details: >>>> INFO: rcu_sched self-detected stall on CPU >>>> #0110-...: (20999 ticks this GP) idle=327/141/0 >>>> softirq=1101493/1101493 fqs=6973 >>>> #011 (t=21000 jiffies g=827095 c=827094 q=524) >>>> Task dump for CPU 0: >>>> rsync R running task0 2446 2444 0x0088 >>>> 818d0c00 88007fc03c58 810a625f >>>> 818d0c00 88007fc03c70 810a8699 0001 >>>> 88007fc03ca0 810d0e5a 88007fc170c0 818d0c00 >>>> Call Trace: >>>> [] sched_show_task+0xaf/0x110 >>>> [] dump_cpu_task+0x39/0x40 >>>> [] rcu_dump_cpu_stacks+0x8a/0xc0 >>>> [] rcu_check_callbacks+0x424/0x7a0 >>>> [] ? account_system_time+0x81/0x110 >>>> [] ? account_process_tick+0x61/0x160 >>>> [] ? 
tick_sched_do_timer+0x30/0x30 >>>> [] update_process_times+0x39/0x60 >>>> [] tick_sched_handle.isra.15+0x36/0x50 >>>> [] tick_sched_timer+0x3d/0x70 >>>> [] __hrtimer_run_queues+0xf2/0x250 >>>> [] hrtimer_interrupt+0xa8/0x190 >>>> [] xen_timer_interrupt+0x2e/0x140 >>>> [] handle_irq_event_percpu+0x55/0x1e0 >>>> [] handle_percpu_irq+0x3a/0x50 >>>> [] generic_handle_irq+0x22/0x30 >>>> [] __evtchn_fifo_handle_events+0x15f/0x180 >>>> [] evtchn_fifo_handle_events+0x10/0x20 >>>> [] __xen_evtchn_do_upcall+0x43/0x80 >>>> [] xen_evtchn_do_upcall+0x30/0x50 >>>> [] xen_hvm_callback_vector+0x82/0x90 >>>> [] ? queued_write_lock_slowpath+0x3d/0x80 >>>> [] _raw_write_lock+0x1e/0x30 >>> This looks to me like ext4 failing to grab a lock. Everything above it >>> (in Xen code) is regular tick interrupt handling which detects the >>> stall. >>> >>> Your config does not have CONFIG_PARAVIRT_SPINLOCKS so that eliminates >>> any possible issues with pv locks. >>> >>> Do you see anything "interesting" in dom0? (e.g. dmesg, xl dmesg, >>> /var/log/xen/) Are you oversubscribing your guest (CPU-wise)? >> There is nothing special being logged anywhere that I can see. dmesg / >> xl dmesg on the Dom0 show nothing unusual. >> >> I do share CPUs - but I don't give any DomU more than 2 vcpus. The >> physical host has 4 cores - 1 pinned to the Dom0. >> >> I log to a remote syslog on this system - and I've uploaded the entire >> log to a pastebin (don't want to do a 45Kb attachment here): >> http://paste.fedoraproject.org/345095/58914452 > > That doesn't look like a full log. In any case, the RCU stall may be a > secondary problem --- there is a bunch of splats before the stall. It may not actually be the full logs. Once the system gets really upset, you can't run anything - as such, grabbing anything from dmesg is not possible. The logs provided above is all that gets spat out to the syslog server. 
I'll try tinkering with a few things to see if I can get more output - but right now, that's all I've been able to achieve. So far, my only ideas are to remove the 'quiet' options from the kernel command line - but I'm not sure how much that would help. Suggestions gladly accepted on this front. >> >> Not sure if it makes any difference at all, but my DomU config is: >> # cat /etc/xen/backup.vm >&g
Re: [Xen-devel] 4.4: INFO: rcu_sched self-detected stall on CPU
On 25/03/2016 11:23 PM, Boris Ostrovsky wrote: > On 03/24/2016 10:53 PM, Steven Haigh wrote: >> Hi all, >> >> Firstly, I've cross-posted this to xen-devel and the lkml - as this >> problem seems to only exist when using kernel 4.4 as a Xen DomU kernel. >> I have also CC'ed Greg KH for his awesome insight as maintainer. >> >> Please CC myself into replies - as I'm not a member of the kernel >> mailing list - I may miss replies from monitoring the archives. >> >> I've noticed recently that heavy disk IO is causing rcu_sched to detect >> stalls. The process mentioned usually goes to 100% CPU usage, and >> eventually processes start segfaulting and dying. The only fix to >> recover the system is to use 'xl destroy' to force-kill the VM and to >> start it again. >> >> The majority of these issues seem to mention ext4 in the trace. This may >> indicate an issue there - or may be a red herring. >> >> The gritty details: >> INFO: rcu_sched self-detected stall on CPU >> #0110-...: (20999 ticks this GP) idle=327/141/0 >> softirq=1101493/1101493 fqs=6973 >> #011 (t=21000 jiffies g=827095 c=827094 q=524) >> Task dump for CPU 0: >> rsync R running task0 2446 2444 0x0088 >> 818d0c00 88007fc03c58 810a625f >> 818d0c00 88007fc03c70 810a8699 0001 >> 88007fc03ca0 810d0e5a 88007fc170c0 818d0c00 >> Call Trace: >> [] sched_show_task+0xaf/0x110 >> [] dump_cpu_task+0x39/0x40 >> [] rcu_dump_cpu_stacks+0x8a/0xc0 >> [] rcu_check_callbacks+0x424/0x7a0 >> [] ? account_system_time+0x81/0x110 >> [] ? account_process_tick+0x61/0x160 >> [] ? 
tick_sched_do_timer+0x30/0x30 >> [] update_process_times+0x39/0x60 >> [] tick_sched_handle.isra.15+0x36/0x50 >> [] tick_sched_timer+0x3d/0x70 >> [] __hrtimer_run_queues+0xf2/0x250 >> [] hrtimer_interrupt+0xa8/0x190 >> [] xen_timer_interrupt+0x2e/0x140 >> [] handle_irq_event_percpu+0x55/0x1e0 >> [] handle_percpu_irq+0x3a/0x50 >> [] generic_handle_irq+0x22/0x30 >> [] __evtchn_fifo_handle_events+0x15f/0x180 >> [] evtchn_fifo_handle_events+0x10/0x20 >> [] __xen_evtchn_do_upcall+0x43/0x80 >> [] xen_evtchn_do_upcall+0x30/0x50 >> [] xen_hvm_callback_vector+0x82/0x90 >> [] ? queued_write_lock_slowpath+0x3d/0x80 >> [] _raw_write_lock+0x1e/0x30 > > This looks to me like ext4 failing to grab a lock. Everything above it > (in Xen code) is regular tick interrupt handling which detects the stall. > > Your config does not have CONFIG_PARAVIRT_SPINLOCKS so that eliminates > any possible issues with pv locks. > > Do you see anything "interesting" in dom0? (e.g. dmesg, xl dmesg, > /var/log/xen/) Are you oversubscribing your guest (CPU-wise)? There is nothing special being logged anywhere that I can see. dmesg / xl dmesg on the Dom0 show nothing unusual. I do share CPUs - but I don't give any DomU more than 2 vcpus. The physical host has 4 cores - 1 pinned to the Dom0. 
I log to a remote syslog on this system - and I've uploaded the entire log to a pastebin (don't want to do a 45Kb attachment here): http://paste.fedoraproject.org/345095/58914452 Not sure if it makes any difference at all, but my DomU config is: # cat /etc/xen/backup.vm name= "backup.vm" memory = 2048 vcpus = 2 cpus= "1-3" disk= [ 'phy:/dev/vg_raid1_new/backup.vm,xvda,w' ] vif = [ "mac=00:11:36:35:35:09, bridge=br203, vifname=vm.backup, script=vif-bridge" ] bootloader = 'pygrub' pvh = 1 on_poweroff = 'destroy' on_reboot = 'restart' on_crash= 'restart' cpu_weight = 64 I never had this problem when running kernel 4.1.x - it only started when I upgraded everything to 4.4 - not exactly a great help - but may help narrow things down? >> [] ext4_es_remove_extent+0x43/0xc0 >> [] ext4_clear_inode+0x39/0x80 >> [] ext4_evict_inode+0x8d/0x4e0 >> [] evict+0xb7/0x180 >> [] dispose_list+0x36/0x50 >> [] prune_icache_sb+0x4b/0x60 >> [] super_cache_scan+0x141/0x190 >> [] shrink_slab.part.37+0x1ee/0x390 >> [] shrink_zone+0x26c/0x280 >> [] do_try_to_free_pages+0x15c/0x410 >> [] try_to_free_pages+0xba/0x170 >> [] __alloc_pages_nodemask+0x525/0xa60 >> [] ? kmem_cache_free+0xcc/0x2c0 >> [] alloc_pages_current+0x8d/0x120 >> [] __page_cache_alloc+0x91/0xc0 >> [] pagecache_get_page+0x56/0x1e0 >> [] grab_cache_page_write_begin+0x26/0x40 >> [] ext4_da_write_begin+0xa1/0x300 >> [] ? ext4_da_write_end+0x124/0x2b0 >> [] generic_perfo
[Xen-devel] 4.4: INFO: rcu_sched self-detected stall on CPU
Hi all,

Firstly, I've cross-posted this to xen-devel and the lkml - as this problem seems to only exist when using kernel 4.4 as a Xen DomU kernel. I have also CC'ed Greg KH for his awesome insight as maintainer.

Please CC myself into replies - as I'm not a member of the kernel mailing list - I may miss replies from monitoring the archives.

I've noticed recently that heavy disk IO is causing rcu_sched to detect stalls. The process mentioned usually goes to 100% CPU usage, and eventually processes start segfaulting and dying. The only fix to recover the system is to use 'xl destroy' to force-kill the VM and to start it again.

The majority of these issues seem to mention ext4 in the trace. This may indicate an issue there - or may be a red herring.

The gritty details:
INFO: rcu_sched self-detected stall on CPU
#0110-...: (20999 ticks this GP) idle=327/141/0 softirq=1101493/1101493 fqs=6973
#011 (t=21000 jiffies g=827095 c=827094 q=524)
Task dump for CPU 0:
rsync R running task 0 2446 2444 0x0088
818d0c00 88007fc03c58 810a625f
818d0c00 88007fc03c70 810a8699 0001
88007fc03ca0 810d0e5a 88007fc170c0 818d0c00
Call Trace:
[] sched_show_task+0xaf/0x110
[] dump_cpu_task+0x39/0x40
[] rcu_dump_cpu_stacks+0x8a/0xc0
[] rcu_check_callbacks+0x424/0x7a0
[] ? account_system_time+0x81/0x110
[] ? account_process_tick+0x61/0x160
[] ? tick_sched_do_timer+0x30/0x30
[] update_process_times+0x39/0x60
[] tick_sched_handle.isra.15+0x36/0x50
[] tick_sched_timer+0x3d/0x70
[] __hrtimer_run_queues+0xf2/0x250
[] hrtimer_interrupt+0xa8/0x190
[] xen_timer_interrupt+0x2e/0x140
[] handle_irq_event_percpu+0x55/0x1e0
[] handle_percpu_irq+0x3a/0x50
[] generic_handle_irq+0x22/0x30
[] __evtchn_fifo_handle_events+0x15f/0x180
[] evtchn_fifo_handle_events+0x10/0x20
[] __xen_evtchn_do_upcall+0x43/0x80
[] xen_evtchn_do_upcall+0x30/0x50
[] xen_hvm_callback_vector+0x82/0x90
[] ? queued_write_lock_slowpath+0x3d/0x80
[] _raw_write_lock+0x1e/0x30
[] ext4_es_remove_extent+0x43/0xc0
[] ext4_clear_inode+0x39/0x80
[] ext4_evict_inode+0x8d/0x4e0
[] evict+0xb7/0x180
[] dispose_list+0x36/0x50
[] prune_icache_sb+0x4b/0x60
[] super_cache_scan+0x141/0x190
[] shrink_slab.part.37+0x1ee/0x390
[] shrink_zone+0x26c/0x280
[] do_try_to_free_pages+0x15c/0x410
[] try_to_free_pages+0xba/0x170
[] __alloc_pages_nodemask+0x525/0xa60
[] ? kmem_cache_free+0xcc/0x2c0
[] alloc_pages_current+0x8d/0x120
[] __page_cache_alloc+0x91/0xc0
[] pagecache_get_page+0x56/0x1e0
[] grab_cache_page_write_begin+0x26/0x40
[] ext4_da_write_begin+0xa1/0x300
[] ? ext4_da_write_end+0x124/0x2b0
[] generic_perform_write+0xc0/0x1a0
[] __generic_file_write_iter+0x188/0x1e0
[] ext4_file_write_iter+0xf0/0x340
[] __vfs_write+0xaa/0xe0
[] vfs_write+0xa2/0x1a0
[] SyS_write+0x46/0xa0
[] entry_SYSCALL_64_fastpath+0x12/0x71

Some 11 hours later:
sshd[785]: segfault at 1f0 ip 7f03bb94ae5c sp 7ffe9eb54470 error 4 in ld-2.17.so[7f03bb94+21000]
sh[787]: segfault at 1f0 ip 7f6b4a0dfe5c sp 7ffe3d4a71e0 error 4 in ld-2.17.so[7f6b4a0d5000+21000]
systemd-cgroups[788]: segfault at 1f0 ip 7f4baa82ce5c sp 7ffd28e4c4b0 error 4 in ld-2.17.so[7f4baa822000+21000]
sshd[791]: segfault at 1f0 ip 7ff8c8a8ce5c sp 7ffede9e1c20 error 4 in ld-2.17.so[7ff8c8a82000+21000]
sshd[792]: segfault at 1f0 ip 7f183cf75e5c sp 7ffc81ab7160 error 4 in ld-2.17.so[7f183cf6b000+21000]
sshd[793]: segfault at 1f0 ip 7f3c665ece5c sp 7ffd9a13c850 error 4 in ld-2.17.so[7f3c665e2000+21000]

From isolated testing, this does not occur on kernel 4.5.x - however I have not verified this myself.
The kernel config used can be found in the kernel-xen git repo if it assists in debugging: http://xen.crc.id.au/git/?p=kernel-xen;a=summary

-- 
Steven Haigh
Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] rcu_sched self-detected stall on CPU on kernel 4.4.5/6 in PV DomU
Just wanted to give a bit of a poke about this. Currently running kernel 4.4.6 in a PV DomU and still occasionally getting hangs. Also stumbled across this that may be related: https://lkml.org/lkml/2016/2/4/724 My latest hang shows: [339844.594001] INFO: rcu_sched self-detected stall on CPU [339844.594001] 1-...: (287557828 ticks this GP) idle=4cb/141/0 softirq=1340383/1340384 fqs=95372371 [339844.594001] (t=287566692 jiffies g=999283 c=999282 q=1725381) [339844.594001] Task dump for CPU 1: [339844.594001] find R running task 0 2840 2834 0x0088 [339844.594001] 818d0c00 88007fd03c58 810a625f 0001 [339844.594001] 818d0c00 88007fd03c70 810a8699 0002 [339844.594001] 88007fd03ca0 810d0e5a 88007fd170c0 818d0c00 [339844.594001] Call Trace: [339844.594001] [] sched_show_task+0xaf/0x110 [339844.594001] [] dump_cpu_task+0x39/0x40 [339844.594001] [] rcu_dump_cpu_stacks+0x8a/0xc0 [339844.594001] [] rcu_check_callbacks+0x424/0x7a0 [339844.594001] [] ? account_system_time+0x81/0x110 [339844.594001] [] ? account_process_tick+0x61/0x160 [339844.594001] [] ? tick_sched_do_timer+0x30/0x30 [339844.594001] [] update_process_times+0x39/0x60 [339844.594001] [] tick_sched_handle.isra.15+0x36/0x50 [339844.594001] [] tick_sched_timer+0x3d/0x70 [339844.594001] [] __hrtimer_run_queues+0xf2/0x250 [339844.594001] [] hrtimer_interrupt+0xa8/0x190 [339844.594001] [] xen_timer_interrupt+0x2e/0x140 [339844.594001] [] handle_irq_event_percpu+0x55/0x1e0 [339844.594001] [] handle_percpu_irq+0x3a/0x50 [339844.594001] [] generic_handle_irq+0x22/0x30 [339844.594001] [] __evtchn_fifo_handle_events+0x15f/0x180 [339844.594001] [] evtchn_fifo_handle_events+0x10/0x20 [339844.594001] [] __xen_evtchn_do_upcall+0x43/0x80 [339844.594001] [] xen_evtchn_do_upcall+0x30/0x50 [339844.594001] [] xen_hvm_callback_vector+0x82/0x90 [339844.594001] [] ? _raw_spin_lock+0x10/0x3 On 2016-03-19 08:46, Steven Haigh wrote: On 19/03/2016 8:40 AM, Steven Haigh wrote: Hi all, So I'd just like to give this a prod. 
I'm still getting DomU's randomly go to 100% CPU usage using kernel 4.4.6 now. It seems running 4.4.6 as the DomU does not induce these problems. Sorry - slight correction. Running 4.4.6 as the Dom0 kernel doesn't show these errors. Only in the DomU. Latest crash message from today: INFO: rcu_sched self-detected stall on CPU 0-...: (20869552 ticks this GP) idle=9c9/141/0 softirq=1440865/1440865 fqs=15068 (t=20874993 jiffies g=1354899 c=1354898 q=798) rcu_sched kthread starved for 20829030 jiffies! g1354899 c1354898 f0x0 s3 ->state=0x0 Task dump for CPU 0: kworker/u4:1R running task0 5853 2 0x0088 Workqueue: writeback wb_workfn (flush-202:0) 818d0c00 88007fc03c58 810a625f 818d0c00 88007fc03c70 810a8699 0001 88007fc03ca0 810d0e5a 88007fc170c0 818d0c00 Call Trace: [] sched_show_task+0xaf/0x110 [] dump_cpu_task+0x39/0x40 [] rcu_dump_cpu_stacks+0x8a/0xc0 [] rcu_check_callbacks+0x424/0x7a0 [] ? account_system_time+0x81/0x110 [] ? account_process_tick+0x61/0x160 [] ? tick_sched_do_timer+0x30/0x30 [] update_process_times+0x39/0x60 [] tick_sched_handle.isra.15+0x36/0x50 [] tick_sched_timer+0x3d/0x70 [] __hrtimer_run_queues+0xf2/0x250 [] hrtimer_interrupt+0xa8/0x190 [] xen_timer_interrupt+0x2e/0x140 [] handle_irq_event_percpu+0x55/0x1e0 [] handle_percpu_irq+0x3a/0x50 [] generic_handle_irq+0x22/0x30 [] __evtchn_fifo_handle_events+0x15f/0x180 [] evtchn_fifo_handle_events+0x10/0x20 [] __xen_evtchn_do_upcall+0x43/0x80 [] xen_evtchn_do_upcall+0x30/0x50 [] xen_hvm_callback_vector+0x82/0x90 [] ? queued_spin_lock_slowpath+0x22/0x170 [] _raw_spin_lock+0x20/0x30 [] writeback_sb_inodes+0x124/0x560 [] ? _raw_spin_unlock_irqrestore+0x16/0x20 [] __writeback_inodes_wb+0x86/0xc0 [] wb_writeback+0x1d6/0x2d0 [] wb_workfn+0x284/0x3e0 [] process_one_work+0x151/0x400 [] worker_thread+0x11a/0x460 [] ? __schedule+0x2bf/0x880 [] ? rescuer_thread+0x2f0/0x2f0 [] kthread+0xc9/0xe0 [] ? kthread_park+0x60/0x60 [] ret_from_fork+0x3f/0x70 [] ? 
kthread_park+0x60/0x60 This repeats over and over causing 100% CPU usage - eventually on all vcpus assigned to the DomU and the only recovery is 'xl destroy'. I'm currently running Xen 4.6.1 on this system - with kernel 4.4.6 in both the DomU and Dom0. On 17/03/2016 8:39 AM, Steven Haigh wrote: Hi all, I've noticed the following problem that ends up with a non-repsonsive PV DomU using kernel 4.4.5 under heavy disk IO: INFO: rcu_sched self-detected stall on CPU 0-...: (6759098 ticks this GP) idle=cb3/141/0 softirq=3244615/3244615 fqs=4 (t=6762321 jiffies g=2275626 c=2275625 q=54) rcu_sched kthread starved for 6762309 jiffies! g2275
[Xen-devel] rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU
Hi all,

I've noticed the following problem that ends up with a non-responsive PV DomU using kernel 4.4.5 under heavy disk IO:

INFO: rcu_sched self-detected stall on CPU
0-...: (6759098 ticks this GP) idle=cb3/141/0 softirq=3244615/3244615 fqs=4
(t=6762321 jiffies g=2275626 c=2275625 q=54)
rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0 s3 ->state=0x0
Task dump for CPU 0:
updatedb R running task 0 6027 6021 0x0088
818d0c00 88007fc03c58 810a625f
818d0c00 88007fc03c70 810a8699 0001
88007fc03ca0 810d0e5a 88007fc170c0 818d0c00
Call Trace:
[] sched_show_task+0xaf/0x110
[] dump_cpu_task+0x39/0x40
[] rcu_dump_cpu_stacks+0x8a/0xc0
[] rcu_check_callbacks+0x424/0x7a0
[] ? account_system_time+0x81/0x110
[] ? account_process_tick+0x61/0x160
[] ? tick_sched_do_timer+0x30/0x30
[] update_process_times+0x39/0x60
[] tick_sched_handle.isra.15+0x36/0x50
[] tick_sched_timer+0x3d/0x70
[] __hrtimer_run_queues+0xf2/0x250
[] hrtimer_interrupt+0xa8/0x190
[] xen_timer_interrupt+0x2e/0x140
[] handle_irq_event_percpu+0x55/0x1e0
[] handle_percpu_irq+0x3a/0x50
[] generic_handle_irq+0x22/0x30
[] __evtchn_fifo_handle_events+0x15f/0x180
[] evtchn_fifo_handle_events+0x10/0x20
[] __xen_evtchn_do_upcall+0x43/0x80
[] xen_evtchn_do_upcall+0x30/0x50
[] xen_hvm_callback_vector+0x82/0x90
[] ? queued_spin_lock_slowpath+0x10/0x170
[] _raw_spin_lock+0x20/0x30
[] find_inode_fast+0x61/0xa0
[] iget_locked+0x6e/0x170
[] ext4_iget+0x33/0xae0
[] ? out_of_line_wait_on_bit+0x72/0x80
[] ext4_iget_normal+0x30/0x40
[] ext4_lookup+0xd5/0x140
[] lookup_real+0x1d/0x50
[] __lookup_hash+0x33/0x40
[] walk_component+0x177/0x280
[] path_lookupat+0x60/0x110
[] filename_lookup+0x9c/0x150
[] ? kfree+0x10d/0x290
[] ? call_filldir+0x9c/0x130
[] ? getname_flags+0x4f/0x1f0
[] user_path_at_empty+0x36/0x40
[] vfs_fstatat+0x53/0xa0
[] ? __fput+0x169/0x1d0
[] SYSC_newlstat+0x22/0x40
[] ? __audit_syscall_exit+0x1f0/0x270
[] ? syscall_slow_exit_work+0x3f/0xc0
[] ? __audit_syscall_entry+0xaf/0x100
[] SyS_newlstat+0xe/0x10
[] entry_SYSCALL_64_fastpath+0x12/0x71

This ends up with the system not responding at 100% CPU usage.

Has anyone else seen this using kernel 4.4.5 in a DomU?

-- 
Steven Haigh
Email: net...@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897
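[Editor's note, not from the thread] While chasing stalls like the one above, the RCU stall detector itself is tunable from the guest kernel command line; raising its timeout can cut down the warning spam long enough to gather other state. The values below are illustrative:

```
# Guest kernel command line additions (illustrative):
#   rcupdate.rcu_cpu_stall_timeout=60   raise the stall-warning timeout to 60 seconds
#   rcupdate.rcu_cpu_stall_suppress=1   suppress the warnings entirely (hides data - avoid while debugging)
rcupdate.rcu_cpu_stall_timeout=60
```

Note this only changes when the warning fires; it does nothing about the underlying lockup.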
Re: [Xen-devel] rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU
Hi all, So I'd just like to give this a prod. I'm still getting DomU's randomly go to 100% CPU usage using kernel 4.4.6 now. It seems running 4.4.6 as the DomU does not induce these problems. Latest crash message from today: INFO: rcu_sched self-detected stall on CPU 0-...: (20869552 ticks this GP) idle=9c9/141/0 softirq=1440865/1440865 fqs=15068 (t=20874993 jiffies g=1354899 c=1354898 q=798) rcu_sched kthread starved for 20829030 jiffies! g1354899 c1354898 f0x0 s3 ->state=0x0 Task dump for CPU 0: kworker/u4:1R running task0 5853 2 0x0088 Workqueue: writeback wb_workfn (flush-202:0) 818d0c00 88007fc03c58 810a625f 818d0c00 88007fc03c70 810a8699 0001 88007fc03ca0 810d0e5a 88007fc170c0 818d0c00 Call Trace: [] sched_show_task+0xaf/0x110 [] dump_cpu_task+0x39/0x40 [] rcu_dump_cpu_stacks+0x8a/0xc0 [] rcu_check_callbacks+0x424/0x7a0 [] ? account_system_time+0x81/0x110 [] ? account_process_tick+0x61/0x160 [] ? tick_sched_do_timer+0x30/0x30 [] update_process_times+0x39/0x60 [] tick_sched_handle.isra.15+0x36/0x50 [] tick_sched_timer+0x3d/0x70 [] __hrtimer_run_queues+0xf2/0x250 [] hrtimer_interrupt+0xa8/0x190 [] xen_timer_interrupt+0x2e/0x140 [] handle_irq_event_percpu+0x55/0x1e0 [] handle_percpu_irq+0x3a/0x50 [] generic_handle_irq+0x22/0x30 [] __evtchn_fifo_handle_events+0x15f/0x180 [] evtchn_fifo_handle_events+0x10/0x20 [] __xen_evtchn_do_upcall+0x43/0x80 [] xen_evtchn_do_upcall+0x30/0x50 [] xen_hvm_callback_vector+0x82/0x90 [] ? queued_spin_lock_slowpath+0x22/0x170 [] _raw_spin_lock+0x20/0x30 [] writeback_sb_inodes+0x124/0x560 [] ? _raw_spin_unlock_irqrestore+0x16/0x20 [] __writeback_inodes_wb+0x86/0xc0 [] wb_writeback+0x1d6/0x2d0 [] wb_workfn+0x284/0x3e0 [] process_one_work+0x151/0x400 [] worker_thread+0x11a/0x460 [] ? __schedule+0x2bf/0x880 [] ? rescuer_thread+0x2f0/0x2f0 [] kthread+0xc9/0xe0 [] ? kthread_park+0x60/0x60 [] ret_from_fork+0x3f/0x70 [] ? 
kthread_park+0x60/0x60 This repeats over and over causing 100% CPU usage - eventually on all vcpus assigned to the DomU and the only recovery is 'xl destroy'. I'm currently running Xen 4.6.1 on this system - with kernel 4.4.6 in both the DomU and Dom0. On 17/03/2016 8:39 AM, Steven Haigh wrote: > Hi all, > > I've noticed the following problem that ends up with a non-repsonsive PV > DomU using kernel 4.4.5 under heavy disk IO: > > INFO: rcu_sched self-detected stall on CPU > 0-...: (6759098 ticks this GP) idle=cb3/141/0 > softirq=3244615/3244615 fqs=4 > (t=6762321 jiffies g=2275626 c=2275625 q=54) > rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0 s3 > ->state=0x0 > Task dump for CPU 0: > updatedbR running task0 6027 6021 0x0088 > 818d0c00 88007fc03c58 810a625f > 818d0c00 88007fc03c70 810a8699 0001 > 88007fc03ca0 810d0e5a 88007fc170c0 818d0c00 > Call Trace: >[] sched_show_task+0xaf/0x110 > [] dump_cpu_task+0x39/0x40 > [] rcu_dump_cpu_stacks+0x8a/0xc0 > [] rcu_check_callbacks+0x424/0x7a0 > [] ? account_system_time+0x81/0x110 > [] ? account_process_tick+0x61/0x160 > [] ? tick_sched_do_timer+0x30/0x30 > [] update_process_times+0x39/0x60 > [] tick_sched_handle.isra.15+0x36/0x50 > [] tick_sched_timer+0x3d/0x70 > [] __hrtimer_run_queues+0xf2/0x250 > [] hrtimer_interrupt+0xa8/0x190 > [] xen_timer_interrupt+0x2e/0x140 > [] handle_irq_event_percpu+0x55/0x1e0 > [] handle_percpu_irq+0x3a/0x50 > [] generic_handle_irq+0x22/0x30 > [] __evtchn_fifo_handle_events+0x15f/0x180 > [] evtchn_fifo_handle_events+0x10/0x20 > [] __xen_evtchn_do_upcall+0x43/0x80 > [] xen_evtchn_do_upcall+0x30/0x50 > [] xen_hvm_callback_vector+0x82/0x90 >[] ? queued_spin_lock_slowpath+0x10/0x170 > [] _raw_spin_lock+0x20/0x30 > [] find_inode_fast+0x61/0xa0 > [] iget_locked+0x6e/0x170 > [] ext4_iget+0x33/0xae0 > [] ? 
out_of_line_wait_on_bit+0x72/0x80 > [] ext4_iget_normal+0x30/0x40 > [] ext4_lookup+0xd5/0x140 > [] lookup_real+0x1d/0x50 > [] __lookup_hash+0x33/0x40 > [] walk_component+0x177/0x280 > [] path_lookupat+0x60/0x110 > [] filename_lookup+0x9c/0x150 > [] ? kfree+0x10d/0x290 > [] ? call_filldir+0x9c/0x130 > [] ? getname_flags+0x4f/0x1f0 > [] user_path_at_empty+0x36/0x40 > [] vfs_fstatat+0x53/0xa0 > [] ? __fput+0x169/0x1d0 > [] SYSC_newlstat+0x22/0x40 > [] ? __audit_syscall_exit+0x1f0/0x270 > [] ? syscall_slow_exit_work+0x3f/0xc0 > [] ? __audit_syscall_entry+0xaf/0x100 > [] SyS_newlstat+0xe/0x10 > [] entry_SYSCALL_64_fastpath+0x12/0x71 > > This ends up with the system not responding at 100% CPU usa
Re: [Xen-devel] rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU
On 19/03/2016 8:40 AM, Steven Haigh wrote:
> Hi all,
>
> So I'd just like to give this a prod. I'm still getting DomUs randomly
> go to 100% CPU usage using kernel 4.4.6 now. It seems running 4.4.6 as
> the DomU does not induce these problems.

Sorry - slight correction. Running 4.4.6 as the Dom0 kernel doesn't show these errors. Only in the DomU.

> Latest crash message from today:
> INFO: rcu_sched self-detected stall on CPU
> 0-...: (20869552 ticks this GP) idle=9c9/141/0
> softirq=1440865/1440865 fqs=15068
> (t=20874993 jiffies g=1354899 c=1354898 q=798)
> rcu_sched kthread starved for 20829030 jiffies! g1354899 c1354898 f0x0
> s3 ->state=0x0
> Task dump for CPU 0:
> kworker/u4:1  R  running task  0  5853  2  0x0088
> Workqueue: writeback wb_workfn (flush-202:0)
> 818d0c00 88007fc03c58 810a625f
> 818d0c00 88007fc03c70 810a8699 0001
> 88007fc03ca0 810d0e5a 88007fc170c0 818d0c00
> Call Trace:
>  [] sched_show_task+0xaf/0x110
>  [] dump_cpu_task+0x39/0x40
>  [] rcu_dump_cpu_stacks+0x8a/0xc0
>  [] rcu_check_callbacks+0x424/0x7a0
>  [] ? account_system_time+0x81/0x110
>  [] ? account_process_tick+0x61/0x160
>  [] ? tick_sched_do_timer+0x30/0x30
>  [] update_process_times+0x39/0x60
>  [] tick_sched_handle.isra.15+0x36/0x50
>  [] tick_sched_timer+0x3d/0x70
>  [] __hrtimer_run_queues+0xf2/0x250
>  [] hrtimer_interrupt+0xa8/0x190
>  [] xen_timer_interrupt+0x2e/0x140
>  [] handle_irq_event_percpu+0x55/0x1e0
>  [] handle_percpu_irq+0x3a/0x50
>  [] generic_handle_irq+0x22/0x30
>  [] __evtchn_fifo_handle_events+0x15f/0x180
>  [] evtchn_fifo_handle_events+0x10/0x20
>  [] __xen_evtchn_do_upcall+0x43/0x80
>  [] xen_evtchn_do_upcall+0x30/0x50
>  [] xen_hvm_callback_vector+0x82/0x90
>  [] ? queued_spin_lock_slowpath+0x22/0x170
>  [] _raw_spin_lock+0x20/0x30
>  [] writeback_sb_inodes+0x124/0x560
>  [] ? _raw_spin_unlock_irqrestore+0x16/0x20
>  [] __writeback_inodes_wb+0x86/0xc0
>  [] wb_writeback+0x1d6/0x2d0
>  [] wb_workfn+0x284/0x3e0
>  [] process_one_work+0x151/0x400
>  [] worker_thread+0x11a/0x460
>  [] ? __schedule+0x2bf/0x880
>  [] ? rescuer_thread+0x2f0/0x2f0
>  [] kthread+0xc9/0xe0
>  [] ? kthread_park+0x60/0x60
>  [] ret_from_fork+0x3f/0x70
>  [] ? kthread_park+0x60/0x60
>
> This repeats over and over causing 100% CPU usage - eventually on all
> vcpus assigned to the DomU and the only recovery is 'xl destroy'.
>
> I'm currently running Xen 4.6.1 on this system - with kernel 4.4.6 in
> both the DomU and Dom0.
>
> On 17/03/2016 8:39 AM, Steven Haigh wrote:
>> Hi all,
>>
>> I've noticed the following problem that ends up with a non-responsive PV
>> DomU using kernel 4.4.5 under heavy disk IO:
>>
>> INFO: rcu_sched self-detected stall on CPU
>> 0-...: (6759098 ticks this GP) idle=cb3/141/0
>> softirq=3244615/3244615 fqs=4
>> (t=6762321 jiffies g=2275626 c=2275625 q=54)
>> rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0 s3
>> ->state=0x0
>> Task dump for CPU 0:
>> updatedb  R  running task  0  6027  6021  0x0088
>> 818d0c00 88007fc03c58 810a625f
>> 818d0c00 88007fc03c70 810a8699 0001
>> 88007fc03ca0 810d0e5a 88007fc170c0 818d0c00
>> Call Trace:
>>  [] sched_show_task+0xaf/0x110
>>  [] dump_cpu_task+0x39/0x40
>>  [] rcu_dump_cpu_stacks+0x8a/0xc0
>>  [] rcu_check_callbacks+0x424/0x7a0
>>  [] ? account_system_time+0x81/0x110
>>  [] ? account_process_tick+0x61/0x160
>>  [] ? tick_sched_do_timer+0x30/0x30
>>  [] update_process_times+0x39/0x60
>>  [] tick_sched_handle.isra.15+0x36/0x50
>>  [] tick_sched_timer+0x3d/0x70
>>  [] __hrtimer_run_queues+0xf2/0x250
>>  [] hrtimer_interrupt+0xa8/0x190
>>  [] xen_timer_interrupt+0x2e/0x140
>>  [] handle_irq_event_percpu+0x55/0x1e0
>>  [] handle_percpu_irq+0x3a/0x50
>>  [] generic_handle_irq+0x22/0x30
>>  [] __evtchn_fifo_handle_events+0x15f/0x180
>>  [] evtchn_fifo_handle_events+0x10/0x20
>>  [] __xen_evtchn_do_upcall+0x43/0x80
>>  [] xen_evtchn_do_upcall+0x30/0x50
>>  [] xen_hvm_callback_vector+0x82/0x90
>>  [] ? queued_spin_lock_slowpath+0x10/0x170
>>  [] _raw_spin_lock+0x20/0x30
>>  [] find_inode_fast+0x61/0xa0
>>  [] iget_locked+0x6e/0x170
>>  [] ext4_iget+0x33/0xae0
>>  [] ? out_of_line_wait_on_bit+0x72/0x80
>>  [] ext4_iget_normal+0x30/0x40
>>  [] ext4_lookup+0xd5/0x140
>>  [] lookup_rea
[Xen-devel] pygrub detects false keypress on console
Hi all,

Testing on Xen 4.6.1 - I've noticed that if I start an EL7 instance and auto-attach to the console, pygrub seems to see a key press and waits. Sometimes I have to press ^] to make it actually boot.

The guest is EL7 with a grub2 config file. Started with:

# xl create /etc/xen/configfile.vm -c

I believe this behaviour was seen in the past - but had been fixed in maybe 4.5.x? Has anyone else noticed this?

--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

signature.asc Description: OpenPGP digital signature

___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Fixation on polarssl 1.1.4 - EOL was 2013-10-01
Hi all,

Just been looking at the polarssl parts in Xen 4.6 and others - it seems we're hard-coded to version 1.1.4, which was released on 31st May 2012. The 1.1.x branch has been EOL for a number of years, and 1.2.x has been EOL since January. The project is now called mbedtls, and the current version is 2.2.1, released in January this year.

I'm not exactly clear on what polarssl is used for (and why not openssl?) - but is it time this was shown some loving?

--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] [BUG?] qemuu only built with i386-softmmu
On 05/02/16 20:51, Ian Campbell wrote:
> On Fri, 2016-02-05 at 08:09 +1100, Steven Haigh wrote:
>> In building my Xen 4.6.0 packages, I disable qemu-traditional and ONLY
>> build qemu-upstream - however as the value for i386-softmmu is not based
>> on variables, I'm not sure this makes a difference.
>
> QEMU in a Xen system only provides device model (DM) emulation and not any
> CPU instruction emulation, so the nominal arch doesn't actually matter and
> Xen builds i386 everywhere as a basically arbitrary choice.
>
> It happens that the Xen DM part of QEMU is quite closely tied to the x86
> scaffolding for various historical reasons, so we end up using
> qemu-system-i386 even e.g. on ARM!
>
> This comes up a lot, so I've also pasted the two paras above into a new
> section in http://wiki.xenproject.org/wiki/QEMU_Upstream . If anyone thinks
> the above is inaccurate then please edit the wiki (and post here too if you
> like).

I think this is a great addition that explains the situation well. Documenting these things is always a good thing.

> One thing I wasn't sure on (so didn't write) is whether the second
> paragraph could have an extra sentence:
>
>     If you are using a distro supplied QEMU then qemu-system-x86_64
>     could also be used, but it makes no practical difference to the
>     functionality of the system.
>
> I wasn't sure if that was true (I suspect it is), and in any case I think
> various bits of libxl etc will look for qemu-system-i386 in various paths,
> so a user would need to try reasonably hard to do so by giving an explicit
> path, and there is no real reason to do so - maybe better not to muddy the
> waters?

Maybe go along the lines of: "There is no practical difference between qemu-system-i386 and qemu-system-x86_64, therefore both can be interchanged freely."

--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] [QUESTION] x86_64 -> i386/i686 CPU translation between xl and qemu binary?
On 2016-02-05 09:22, Andrew Cooper wrote:
> On 04/02/2016 22:06, Alex Braunegg wrote:
>> root 30511 46.4 0.1 398728 1860 ? RLsl 08:47 0:27
>>   /usr/lib/xen/bin/qemu-system-i386 -xen-domid 6
>>   -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-6,server,nowait
>>   -no-shutdown -mon chardev=libxl-cmd,mode=control
>>   -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-6,server,nowait
>>   -mon chardev=libxenstat-cmd,mode=control -nodefaults -name test2
>>   -vnc 0.0.0.0:0,websocket,x509=/etc/pki/xen,password,to=99 -display none
>>   -serial pty -device VGA,vgamem_mb=16 -boot order=cd -usb -usbdevice tablet
>>   -soundhw ac97 -device rtl8139,id=nic0,netdev=net0,mac=00:16:3e:f1:48:8c
>>   -netdev type=tap,id=net0,ifname=vif6.0-emu,script=no,downscript=no
>>   -machine xenfv -m 496
>>   -drive file=/dev/zvol/storage0/xen/test2/disk_sda,if=ide,index=0,media=disk,format=raw,cache=writeback
>>   -drive file=/storage0/data-shares/iso/CentOS-6.5-x86_64-minimal.iso,if=ide,index=2,readonly=on,media=cdrom,format=raw,cache=writeback,id=ide-5632
>>
>> So - to me it appears that xl is performing some sort of x86_64 ->
>> i386/i686 instruction translation to make things work. Would this not be
>> introducing a performance impediment by having some sort of extra
>> translation processing going on between xl and the qemu binary?
>
> Qemu is only used for device emulation when used with Xen, not CPU
> emulation. The "-machine xenfv" tells this to Qemu, and "-xen-domid 6"
> tells it which Xen domain to connect to. All HVM domains run with
> hardware virtualisation extensions, which are managed by Xen itself.

Hi Andrew,

Thanks for this response. To ensure I have this correct: there is no need for qemu-upstream to build qemu-system-x86_64, as the CPU is handled directly by Xen and not by qemu - thereby passing the capabilities of the CPU directly through to the guest. As such, as long as qemu starts on a 64-bit machine, it will be able to run a 64-bit OS/kernel etc.

I ask this as I see a number of qemu packages that do include qemu-system-x86_64 as well as qemu-system-i386 - which makes me seek clarification. I would assume that these are just not built to use Xen as the hypervisor for hardware acceleration?

--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
[Xen-devel] [BUG?] qemuu only built with i386-softmmu
Hi all,

Looking specifically at 4.6.0. It seems that the Makefile for qemuu uses the following:

$$source/configure --enable-xen --target-list=i386-softmmu \
	$(QEMU_XEN_ENABLE_DEBUG) \
	--prefix=$(LIBEXEC) \
	--libdir=$(LIBEXEC_LIB) \
	--includedir=$(LIBEXEC_INC) \
	--source-path=$$source \
	--extra-cflags="-I$(XEN_ROOT)/tools/include \
	-I$(XEN_ROOT)/tools/libxc/include \
	-I$(XEN_ROOT)/tools/xenstore/include \
	-I$(XEN_ROOT)/tools/xenstore/compat/include \
	$(EXTRA_CFLAGS_QEMU_XEN)" \
	--extra-ldflags="-L$(XEN_ROOT)/tools/libxc \
	-L$(XEN_ROOT)/tools/xenstore \
	$(QEMU_UPSTREAM_RPATH)" \
	--bindir=$(LIBEXEC_BIN) \
	--datadir=$(SHAREDIR)/qemu-xen \
	--localstatedir=$(localstatedir) \
	--disable-kvm \
	--disable-docs \
	--disable-guest-agent \
	--python=$(PYTHON) \
	$(CONFIG_QEMUU_EXTRA_ARGS) \
	--cpu=$(IOEMU_CPU_ARCH) \
	$(IOEMU_CONFIGURE_CROSS); \
$(MAKE) all

As such, this only builds the 32-bit version of qemuu. It seems that starting a HVM guest with more than 4Gb of RAM fails:

libxl: debug: libxl_event.c:691:libxl__ev_xswatch_deregister: watch w=0xa4d188: deregister unregistered
xc: detail: elf_parse_binary: phdr: paddr=0x10 memsz=0x5b3a4
xc: detail: elf_parse_binary: memory: 0x10 -> 0x15b3a4
xc: detail: VIRTUAL MEMORY ARRANGEMENT:
xc: detail:   Loader: 0010->0015b3a4
xc: detail:   Modules: ->
xc: detail:   TOTAL: ->00017f00
xc: detail:   ENTRY: 00100600
xc: detail: PHYSICAL MEMORY ALLOCATION:
xc: detail:   4KB PAGES: 0x0200
xc: detail:   2MB PAGES: 0x05f7
xc: detail:   1GB PAGES: 0x0003
xc: detail: elf_load_binary: phdr 0 at 0x7ff67320f000 -> 0x7ff673260910
xc: error: Could not clear special pages (22 = Invalid argument): Internal error
libxl: error: libxl_dom.c:1003:libxl__build_hvm: hvm building failed

In building my Xen 4.6.0 packages, I disable qemu-traditional and ONLY build qemu-upstream - however, as the value for i386-softmmu is not based on variables, I'm not sure this makes a difference.

My question is: should this also build x86_64-softmmu as well as i386-softmmu at a bare minimum?

--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
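For anyone wanting to experiment, the change being asked about amounts to widening the --target-list value in the qemu-upstream configure call. This is a sketch only: QEMU_TARGETS is an invented variable (the stock Makefile hard-codes i386-softmmu), and the echoed command is abbreviated to the flags relevant here.

```shell
# Hypothetical sketch: the stock tools/Makefile hard-codes
# --target-list=i386-softmmu. QEMU_TARGETS is an invented variable to
# show where a second softmmu target could be added; the real change
# would be edited into the Makefile's configure invocation.
QEMU_TARGETS="i386-softmmu,x86_64-softmmu"
echo "./configure --enable-xen --target-list=${QEMU_TARGETS}"
```

Whether libxl would then actually pick up qemu-system-x86_64 is a separate question - as noted elsewhere in this thread, the toolstack looks for qemu-system-i386 by default.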
Re: [Xen-devel] bridge call iptables being forced
On 2015-11-19 12:46, Juan Rossi wrote:
> Hi
>
> I am sending this due to the change of behaviour in some parts, and
> perhaps it needs some code amendments. Unsure if the devel list is the
> best place - feel free to point me to the right place for this. Let me
> know if I should load a bug instead.

I'm tracking this at: http://xen.crc.id.au/bugs/view.php?id=62

> diff --git a/tools/hotplug/Linux/vif-bridge b/tools/hotplug/Linux/vif-bridge
> index 3d72ca4..7fc6650 100644
> --- a/tools/hotplug/Linux/vif-bridge
> +++ b/tools/hotplug/Linux/vif-bridge
> @@ -93,7 +93,16 @@ case "$command" in
>          ;;
>  esac
>
> -handle_iptable
> +brcalliptables=$(sysctl -n net.bridge.bridge-nf-call-iptables 2>/dev/null)
> +brcalliptables=${brcalliptables:-0}
> +
> +brcallip6tables=$(sysctl -n net.bridge.bridge-nf-call-ip6tables 2>/dev/null)
> +brcallip6tables=${brcallip6tables:-0}
> +
> +if [ "$brcalliptables" -eq "1" -a "$brcallip6tables" -eq "1" ];
> +then
> +    handle_iptable
> +fi
>
>  call_hooks vif post

I'm not a fan of this, as it will also enable the call to handle_iptable() if people create their own firewall rules - i.e. these sysctls will be true, hence the rule will get loaded anyway.

My comment on the bug report is included below to hopefully get further input from people:

    Thinking about this further - as it is a change in behaviour for a
    point release, I believe we should do the following:

    1) Create a new option in /etc/xen/xl.conf - and default it to False.
    2) Name the option "autocreate_firewall_rules".
    3) Evaluate autocreate_firewall_rules in the vif-common.sh function
       handle_iptable().

    I suggest something like the following pseudo code:

    if [ $autocreate_firewall_rules == 0 ]; then
        return
    fi

Happy to start debate on the correct way of handling this :) Hopefully this can lead to some further debate.

--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
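A minimal runnable sketch of the gating suggested in the thread, assuming a hypothetical xl.conf option named "autocreate_firewall_rules" that the hotplug machinery would export into the script's environment (no such option exists today - this only illustrates the proposed default-off behaviour):

```shell
# Hypothetical sketch of the proposed gating for vif-common.sh.
# "autocreate_firewall_rules" is an invented xl.conf option name,
# defaulting to 0 (off) as suggested in the thread.
autocreate_firewall_rules=${autocreate_firewall_rules:-0}

handle_iptable() {
    # Bail out unless the admin explicitly opted in.
    if [ "$autocreate_firewall_rules" -eq 0 ]; then
        echo "handle_iptable: skipped (autocreate_firewall_rules=0)"
        return 0
    fi
    # Real script would install the per-vif iptables rules here.
    echo "handle_iptable: installing iptables rules"
}

handle_iptable
```

With the default in place, admins who manage their own firewall rules are unaffected, and the bridge-nf sysctls no longer matter.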
Re: [Xen-devel] PV random device
On 2015-10-06 15:29, Andy Smith wrote:
> - Your typical EntropyKey or OneRNG can generate quite a bit of
>   entropy. Maybe 32 kilobytes per second for ~$50 each. If you can get
>   one... :)
> - You can access them over the network so no USB passthrough needed.

Care to give details on this? I've got a HWRNG on a system that I'd like to 'share' the entropy source out from - but haven't found anything to do this.

--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
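One common approach - a sketch only, with the host name, port, and device paths all being assumptions - is to export the HWRNG device over TCP with socat and feed the stream into rngd on each consumer. The block below stores and prints the commands as a dry run, since actually running them needs root, a real /dev/hwrng, and a second machine:

```shell
# Dry-run sketch: the commands are stored and printed rather than run.
# "hwrng-host.example.com" and port 4711 are invented for illustration.
server_cmd='socat TCP-LISTEN:4711,fork,reuseaddr GOPEN:/dev/hwrng'
client_cmds='mkfifo /run/remote-hwrng
socat TCP:hwrng-host.example.com:4711 GOPEN:/run/remote-hwrng &
rngd -r /run/remote-hwrng'

echo "# on the machine with the HWRNG:"
echo "$server_cmd"
echo "# on each consumer:"
echo "$client_cmds"
```

Note that entropy shipped over plain TCP can be observed or tampered with in transit, so tunnelling the connection (e.g. over ssh or stunnel) would be prudent on anything but a trusted LAN.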
Re: [Xen-devel] RFC: change to 6 months release cycle
On 5/10/2015 10:23 PM, Wei Liu wrote:
> On Mon, Oct 05, 2015 at 05:04:19AM -0600, Jan Beulich wrote:
>>>>> On 02.10.15 at 19:43, <wei.l...@citrix.com> wrote:
>>> The main objection from previous discussion seems to be that "shorter
>>> release cycle creates burdens for downstream projects". I couldn't
>>> quite get the idea, but I think we can figure out a way to sort that
>>> out once we know what exactly the burdens are.
>>
>> I don't recall it that way. My main objection remains the resulting
>> higher burden of maintaining stable trees. Right now, most of the
>> time we have two trees to maintain. A 6-month release cycle means
>> three of them (shortening the time we maintain those trees doesn't
>> seem a viable option to me).
>>
>> Similar considerations apply to security maintenance of older trees.
>
> Just to throw around some ideas: we can have more stable tree
> maintainers, we can pick a stable tree every X releases etc etc.

So everyone else in the industry is increasing their support periods for stable releases, and we want to go the opposite way?

Sorry - but this is nuts. Have a stable branch that is actually supported properly with backports of security fixes etc - then have a 'bleeding edge' branch that rolls with the punches.

Remember that folks are still running Xen 3.4 on EL5 - and will be at least until 2017. I still run the occasional patch for 4.2, and most people are on either 4.4 or testing with 4.5 when running EL6.

EL6 is supported until November 30, 2020; EL7 until 2024. People are not exactly thrilled with EL7 in the virt area - but will eventually move to it (or directly to EL8 or EL9).

The 6 month release cycle is exactly why people don't run Fedora in their production environments. Why are we suddenly wanting the same release schedule for Xen?

Sorry - but I'm VERY much against this proposal. Focus on stable and complete, not Ooo Shiny!

--
Steven Haigh
Email: net...@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] RFC: change to 6 months release cycle
On 5/10/2015 10:44 PM, Ian Campbell wrote:
> On Mon, 2015-10-05 at 12:23 +0100, Wei Liu wrote:
>> we can pick a stable tree every X releases etc etc.
>
> I think switching to an LTS style model, i.e. only supporting 1/N for
> longer than it takes to release the next major version might be
> interesting to consider. I'm thinking e.g. of N=4 with a 6 month cycle.

^^ This.

> I think some of our downstreams (i.e. distros) would like this, since it
> gives them releases which are supported for a length of time more like
> their own release cycles.

^^ This as well :)

--
Steven Haigh
Email: net...@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] RFC: change to 6 months release cycle
On 6/10/2015 12:05 AM, George Dunlap wrote:
> On Mon, Oct 5, 2015 at 12:44 PM, Steven Haigh <net...@crc.id.au> wrote:
>> On 5/10/2015 10:23 PM, Wei Liu wrote:
>>> On Mon, Oct 05, 2015 at 05:04:19AM -0600, Jan Beulich wrote:
>>>>>>> On 02.10.15 at 19:43, <wei.l...@citrix.com> wrote:
>>>>> The main objection from previous discussion seems to be that "shorter
>>>>> release cycle creates burdens for downstream projects". I couldn't
>>>>> quite get the idea, but I think we can figure out a way to sort that
>>>>> out once we know what exactly the burdens are.
>>>>
>>>> I don't recall it that way. My main objection remains the resulting
>>>> higher burden of maintaining stable trees. Right now, most of the
>>>> time we have two trees to maintain. A 6-month release cycle means
>>>> three of them (shortening the time we maintain those trees doesn't
>>>> seem a viable option to me).
>>>>
>>>> Similar considerations apply to security maintenance of older trees.
>>>
>>> Just to throw around some ideas: we can have more stable tree
>>> maintainers, we can pick a stable tree every X releases etc etc.
>>
>> So everyone else in the industry is increasing their support periods for
>> stable things, and we're wanting to go the opposite way?
>>
>> Sorry - but this is nuts. Have a stable branch that is actually
>> supported properly with backports of security fixes etc - then have a
>> 'bleeding edge' branch that rolls with the punches.
>>
>> Remember that folks are still running Xen 3.4 on EL5 - and will be at
>> least until 2017. I still run the occasional patch for 4.2, and most
>> people are on either 4.4 or testing with 4.5 when running with EL6.
>>
>> EL6 is supported until November 30, 2020. EL7 until 2024. People are not
>> exactly thrilled with EL7 in the virt area - but will eventually move to
>> it (or directly to EL8 or EL9).
>>
>> The 6 month release cycle is exactly why people don't run Fedora on
>> their production environments. Why are we suddenly wanting the same
>> release schedule for Xen?
>>
>> Sorry - but I'm VERY much against this proposal. Focus on stable and
>> complete, not Ooo Shiny!
>
> I think you're talking about something completely different.
>
> Wei is talking about releasing *more often*; you're talking about
> having *longer support windows*.

I think we are both along the same lines - however we each have different points. The problem is: the more releases you have in a support window, the more you have to maintain.

I did like Ian's idea of a new stable / LTS / whatever you want to call it release every 4 normal releases at 6-month timing. This would mean an LTS release would be supported for 2 years.

I would really like to see:

  LTS = 4 years full support + 1 year security fixes only
  Rolling release = 6-12 months between releases

Is this possible? Not really sure - but the bigger end users don't want to have to retest everything every year. Honestly, even an LTS of *longer* than 4 years would be good - but I'm not sure that is even in the realm of consideration.

> Nobody is suggesting that we shouldn't have releases that are
> supported for long periods of time. What Wei is proposing is that
> instead of releasing every 0.75 years and supporting every release for
> N years, we release every 0.5 years, but every 1.0 (or 1.5) years make
> a release that we support for N years. Many projects do this,
> including the Linux kernel.

True, but the kernel has several orders of magnitude more resources contributed. I still do my best to keep a security-patched package of 4.2 for EL6 users - some of whom don't want to move to xl due to reworking all their management tools.

--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] Xen 4.5.1 released
On 23/06/2015 8:45 PM, Ian Jackson wrote:

    Ian Jackson writes (Re: [Xen-devel] Xen 4.5.1 released):

        M A Young writes (Re: [Xen-devel] Xen 4.5.1 released):

            I don't believe this release has the qemu-xen-traditional
            half of XSA-135. If this wasn't deliberate it might be
            worth noting it somewhere.

        You're right. It appears that the patch for XSA-135 was never
        applied to qemu-traditional, due to an oversight.

    The XSA-135 fix was missing everywhere. I have now applied it (to
    all trees 4.1 and onward).

Out of interest, is the plan now to re-release a fixed 4.5.1 archive, or to document the lack of the XSA-135 patches and allow people to patch manually?

--
Steven Haigh
Email: net...@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] Inplace upgrading 4.4.x - 4.5.0
On 18/02/2015 11:38 PM, Ian Campbell wrote:
> On Mon, 2015-02-09 at 20:36 +1100, Steven Haigh wrote:
>>> This sounds like a packaging issue -- Debian's packages for example
>>> jump through some hoops to make sure multiple tools packages can be
>>> installed in parallel and the correct ones selected for the currently
>>> running hypervisor.
>>
>> Hmmm - that sounds very hacky :\
>
> I've been slowly unpicking the Debian patches and upstreaming bits of
> them. I'm not sure if I'll manage to get this stuff upstream though,
> since it is a bit more invasive than the other stuff.

Anything that helps out here is a good thing.

>> Hmmm Andrew is correct, the errors are all:
>>
>> = xl info ==
>> libxl: error: libxl.c:5044:libxl_get_physinfo: getting physinfo:
>> Permission denied
>
> EPERM is essentially a tools/hypervisor version mismatch in most
> contexts.
>
> [...]
>
>> So, this leads me to wonder - as I'm sure MANY people get bitten by
>> this - how to control (at least to shutdown) DomUs after an in-place
>> upgrade?
>
> You should evacuate the host before upgrading it, which is what I
> suppose most people do as the first step in their maintenance window.
> Evacuation might involve migrating VMs to another host (perhaps as part
> of a pool rolling upgrade type manoeuvre) or just shutting them down.
>
>> Even if no other functions are implemented other than shutdown, I
>> would call that an acceptable functionality. At least this way, you're
>> not hard killing running VMs on reboot.
>
> I'd expect that it might be possible to arrange to connect to the VM
> console and shut it down from within, or possibly to use the xenstore
> CLI tools to initiate the shutdown externally. After that you would
> still end up with some zombie domains, since after they have shut down
> actually reaping them would require toolstack actions to talk to the
> hypervisor and you'd hit the version mismatch.

In large scale organisations (10+ systems), then yes, I'd say you're probably right. This problem hits those who are smaller than that, however.

I could list countless people who have been bitten by this in the past - the majority of which are small businesses / hobbyists that don't have that kind of equipment. The expected upgrade path for those users is:

  yum -y update
  reboot

As we know, this falls over in a heap - however I don't think it is beyond the realms of expectation for this to work.

--
Steven Haigh
Email: net...@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] Inplace upgrading 4.4.x - 4.5.0
On 9/02/2015 7:59 PM, Sander Eikelenboom wrote:
> Monday, February 9, 2015, 9:35:33 AM, you wrote:
>> Hello Steven,
>> upgrades from Xen 4.4 to 4.5 are supposed to work out of the box.
>> Please post more details and we'll try to help you figure out what's
>> wrong.
>>
>> Cheers,
>> Stefano
>>
>> On Sun, 8 Feb 2015, Steven Haigh wrote:
>>> Hi all,
>>> I was under the impression that you should be able to do in-place
>>> upgrades from Xen 4.4 to 4.5 on a system without losing the ability
>>> to manage DomUs... This would support upgrades from running systems
>>> from Xen 4.4.x to 4.5.0 - only requiring a reboot to boot into the
>>> 4.5.0 hypervisor.
>>> When I try this in practice, I get a whole heap of permission denied
>>> errors and lose control of any running DomUs. Is there some secret
>>> sauce that will allow this to work?
>
> You are probably running into a mismatch between the running hypervisor
> (4.4) and the now installed toolstack (4.5) - for instance when trying
> to shutdown the VMs to do the reboot. (Since the newly installed
> hypervisor parts are only loaded and run on the next boot.)

Correct - it is the 4.4 hypervisor with the 4.5 toolstack. After a reboot, all is good. However, this causes the problem - once you update the packages from 4.4 to 4.5, you lose the ability to manage any running DomUs. This is problematic - if only for the fact that you can't shut down running DomUs for the Dom0 reboot.

I understand that large jumps in versions aren't supported - but I believe that point version upgrades should be supported using the same toolset, i.e. 4.2 -> 4.3, 4.4 -> 4.5 etc.

I'm just about to gather some data for it - and I'll make a new thread with what I can gather.

--
Steven Haigh
Email: net...@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [Xen-devel] Inplace upgrading 4.4.x - 4.5.0
On 9/02/2015 8:16 PM, Ian Campbell wrote:
> On Mon, 2015-02-09 at 20:09 +1100, Steven Haigh wrote:
>> On 9/02/2015 7:59 PM, Sander Eikelenboom wrote:
>>> Monday, February 9, 2015, 9:35:33 AM, you wrote:
>>>> Hello Steven,
>>>> upgrades from Xen 4.4 to 4.5 are supposed to work out of the box.
>>>> Please post more details and we'll try to help you figure out
>>>> what's wrong.
>>>>
>>>> Cheers,
>>>> Stefano
>>>>
>>>> On Sun, 8 Feb 2015, Steven Haigh wrote:
>>>>> Hi all,
>>>>> I was under the impression that you should be able to do in-place
>>>>> upgrades from Xen 4.4 to 4.5 on a system without losing the
>>>>> ability to manage DomUs... This would support upgrades from
>>>>> running systems from Xen 4.4.x to 4.5.0 - only requiring a reboot
>>>>> to boot into the 4.5.0 hypervisor.
>>>>> When I try this in practice, I get a whole heap of permission
>>>>> denied errors and lose control of any running DomUs. Is there
>>>>> some secret sauce that will allow this to work?
>>>
>>> You are probably running into a mismatch between the running
>>> hypervisor (4.4) and the now installed toolstack (4.5) - for
>>> instance when trying to shutdown the VMs to do the reboot. (Since
>>> the newly installed hypervisor parts are only loaded and run on the
>>> next boot.)
>>
>> Correct - it is the 4.4 hypervisor with the 4.5 toolstack. After a
>> reboot, all is good. However this causes the problem - once you
>> update the packages from 4.4 to 4.5, you lose the ability to manage
>> any running DomUs.
>
> This sounds like a packaging issue -- Debian's packages for example
> jump through some hoops to make sure multiple tools packages can be
> installed in parallel and the correct ones selected for the currently
> running hypervisor.

Hmmm - that sounds very hacky :\

> Otherwise I think the upgrade path is:
>  * shutdown all VMs (or migrate them away)
>  * install new Xen + tools
>  * reboot
>  * restart domains with new tools.
>
> I'm afraid that using old tools on a new Xen is not something which is
> supported, even in the midst of an upgrade, and AFAIK never has been.
> The N-N+1-N+2 statement is normally with reference to live migration
> (i.e. you can live migrate from a 4.4 system to a 4.5 one).

Hmmm... Andrew is correct, the errors are all:

= xl info ==
libxl: error: libxl.c:5044:libxl_get_physinfo: getting physinfo: Permission denied
libxl_physinfo failed.
libxl: error: libxl.c:5534:libxl_get_scheduler: getting domain info list: Permission denied
host : xenhost
release: 3.14.32-1.el6xen.x86_64
version: #1 SMP Sun Feb 8 15:41:07 AEDT 2015
machine: x86_64
xen_major : 4
xen_minor : 4
xen_extra : .1
xen_version: 4.4.1
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : (null)
xen_pagesize : 4096
platform_params: virt_start=0x8000
xen_changeset :
xen_commandline: dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1 dom0_vcpus_pin console=tty0 console=com1 com1=115200,8n1
cc_compiler: gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)
cc_compile_by : mockbuild
cc_compile_domain : crc.id.au
cc_compile_date: Thu Jan 1 18:19:30 AEDT 2015
xend_config_format : 4

= xl list ==
libxl: error: libxl.c:669:libxl_list_domain: getting domain info list: Permission denied
libxl_list_domain failed.

= xl dmesg ==
libxl: error: libxl.c:6061:libxl_xen_console_read_line: reading console ring buffer: Permission denied

So, this leads me to wonder - as I'm sure MANY people get bitten by this - how to control (at least to shut down) DomUs after an in-place upgrade? Even if no other functions are implemented other than shutdown, I would call that acceptable functionality. At least this way, you're not hard killing running VMs on reboot.

--
Steven Haigh
Email: net...@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
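The "evacuate, upgrade, reboot" sequence Ian describes can be sketched as a dry-run script. The guest names and the use of yum are illustrative assumptions; in real use the domain list would come from xl itself:

```shell
# Dry-run sketch of the supported upgrade path: commands are echoed,
# not executed. In real use the domain list would come from:
#   xl list | awk 'NR>2 {print $1}'    (skip the header and Domain-0)
DOMAINS="guest1 guest2"                # illustrative guest names

for dom in $DOMAINS; do
    echo "xl shutdown -w $dom"         # -w: wait until the domain is down
done
echo "yum -y update xen"               # install the new hypervisor + tools
echo "reboot"                          # boot into the new hypervisor
# after reboot: xl create each guest's config file again
```

The key point is that every `xl` operation happens while the running hypervisor and toolstack still match - the package update and reboot come only after the host is empty.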
Re: [Xen-devel] Inplace upgrading 4.4.x - 4.5.0
Thanks Stefano. I wanted to make sure I was correct in this belief before I posted all the info. I'll gather as much info as I can in the next day or so and see what we can do.

On 9/02/2015 7:35 PM, Stefano Stabellini wrote:
> Hello Steven,
> upgrades from Xen 4.4 to 4.5 are supposed to work out of the box.
> Please post more details and we'll try to help you figure out what's
> wrong.
>
> Cheers,
> Stefano
>
> On Sun, 8 Feb 2015, Steven Haigh wrote:
>> Hi all,
>> I was under the impression that you should be able to do in-place
>> upgrades from Xen 4.4 to 4.5 on a system without losing the ability
>> to manage DomUs... This would support upgrades from running systems
>> from Xen 4.4.x to 4.5.0 - only requiring a reboot to boot into the
>> 4.5.0 hypervisor.
>> When I try this in practice, I get a whole heap of permission denied
>> errors and lose control of any running DomUs. Is there some secret
>> sauce that will allow this to work?

--
Steven Haigh
Email: net...@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
[Xen-devel] Inplace upgrading 4.4.x - 4.5.0
Hi all,

I was under the impression that you should be able to do in-place upgrades from Xen 4.4 to 4.5 on a system without losing the ability to manage DomUs... This would support upgrades from running systems from Xen 4.4.x to 4.5.0 - only requiring a reboot to boot into the 4.5.0 hypervisor.

When I try this in practice, I get a whole heap of permission denied errors and lose control of any running DomUs. Is there some secret sauce that will allow this to work?

--
Steven Haigh
Email: net...@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897