Re: [ovirt-users] New user intro some questions
- Original Message - From: Sandro Bonazzola sbona...@redhat.com To: Sven Kieske s.kie...@mittwald.de, users@ovirt.org, Federico Simoncelli fsimo...@redhat.com Sent: Tuesday, February 10, 2015 4:56:05 PM Subject: Re: [ovirt-users] New user intro some questions Il 10/02/2015 16:49, Sven Kieske ha scritto: On 10/02/15 03:02, Jason Brooks wrote: The meaning of support is important here -- support from whom? It's true that there's no gluster+virt SKU of the RHEV downstream project. All configurations of ovirt proper are self-supported, or community- supported, and what we choose to support is up to us individuals. However, gluster + virt on the same nodes does work -- even w/ management through the engine. I do use gluster on my virt nodes, but I don't manage them w/ the engine, because, afaik, there isn't a way to have gluster and virt on separate networks this way, so I just manage gluster from the gluster cli. It's true, oVirt is happiest w/ separate machines for everything, and a rock-solid san of some sort, etc., but that's not the only way, and as you point out, hardware isn't free. Well you might be interested in these upcoming features: http://www.ovirt.org/Features/Self_Hosted_Engine_Hyper_Converged_Gluster_Support sadly the slides from this talk are not online: https://fosdem.org/2015/schedule/event/hyperconvergence/ maybe brian(cc'ed) can put them somewhere? Or Federico (CCed) :-) The slides of my sessions are available at: http://www.ovirt.org/images/6/6c/2015-ovirt-glusterfs-hyperconvergence.pdf http://www.ovirt.org/images/9/97/2015-docker-ovirt-iaas.pdf -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Reclaim/Trim/issue_discards : how to inform my LUNs about disk space waste?
Hi Nicolas, you can find more information on this at: https://bugzilla.redhat.com/show_bug.cgi?id=981626 First of all an important note (that was already mentioned): vdsm is not using lvm.conf, so whatever change you make there it won't affect vdsm behavior. Anyway long story short, enabling issue_discards in lvm would lead to lvm commands starvation when the lv that you're removing is large and the granularity is small. The correct solution is to use blkdiscard on the lv and I happened to submit a patch series for that yesterday: http://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:block-discard,n,z (very much experimental) The approach is to begin with issuing blkdiscard when wipe after delete is selected on the disk. That is because blkdiscard in the majority of the cases will wipe the lv data and I know that someone in the past has been brave enough to try and recover data from a mistakenly removed lv that wasn't post-zeroed. Anyway extending the support to non post-zero is trivial and it's just a matter of agreement and expectations. With regard to the legitimate question of why both post-zero and block discard, the answer is that after discussing it with storage array experts it seems that blkdiscard has no contract in guaranteeing that the data will be blanked out and it could later on show up even on a completely different LUN of the same storage. -- Federico - Original Message - From: Nicolas Ecarnot nico...@ecarnot.net To: Federico Simoncelli fsimo...@redhat.com Cc: users users@ovirt.org Sent: Thursday, November 27, 2014 10:43:06 AM Subject: Re: [ovirt-users] Reclaim/Trim/issue_discards : how to inform my LUNs about disk space waste? Le 22/07/2014 14:23, Federico Simoncelli a écrit : - Original Message - From: Nicolas Ecarnot nico...@ecarnot.net To: users users@ovirt.org Sent: Thursday, July 3, 2014 10:54:57 AM Subject: [ovirt-users] Reclaim/Trim/issue_discards : how to inform my LUNs about disk space waste? Hi, In my hosts, I see that /etc/lvm/lvm.conf shows : issue_discards = 0 Can I enable it to 1 ? Thank you? You can change it but it's not going to affect the lvm behavior in VDSM since we don't use the host lvm config file. Frederico, May you describe a little more how it's done, explain the principle, or point us to a place where we can learn more about how LVM is used in oVirt, amongst the manager and the hosts. Thank you. This will be probably addressed as part of bz1017284 as we're considering to extend discard also to vdsm images (and not direct luns only). https://bugzilla.redhat.com/show_bug.cgi?id=1017284 ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
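For reference, the manual equivalent of what the patch series does is roughly the following (device names are hypothetical placeholders; vdsm drives this internally on its own LVs, so this is only an illustration, not something to run on a live storage domain):

# blkdiscard /dev/<vg-name>/<lv-name>
# blkdiscard --offset 0 --length 1073741824 /dev/<vg-name>/<lv-name>    # discard only the first 1 GiB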
Re: [ovirt-users] Hosted engine: sending ioctl 5401 to a partition!
- Original Message -
From: Chris Adams c...@cmadams.net
To: users@ovirt.org
Sent: Friday, November 21, 2014 10:28:28 PM
Subject: [ovirt-users] Hosted engine: sending ioctl 5401 to a partition!

> I have set up oVirt with hosted engine, on an iSCSI volume. On both nodes,
> the kernel logs the following about every 10 seconds:
>
> Nov 21 15:27:49 node8 kernel: ovirt-ha-broker: sending ioctl 5401 to a partition!
>
> Is this a known bug, something that I need to address, etc.?

Is this on centos or fedora? We may have to do some testing to identify where that's coming from. Feel free to ping me: fsimonce (#ovirt on OFTC) so we can check what's going on.

-- Federico

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
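If someone wants to chase this before pinging on IRC, one way to see which call triggers the warning is to trace the broker's ioctls (a rough sketch; 5401 is most likely the TCGETS terminal ioctl, 0x5401, being issued against a block device):

# strace -f -e trace=ioctl -p $(pgrep -of ovirt-ha-broker)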
Re: [ovirt-users] Status libgfapi support in oVirt
- Original Message - From: noc n...@nieuwland.nl To: users@ovirt.org Sent: Friday, November 21, 2014 10:01:30 AM Subject: Re: [ovirt-users] Status libgfapi support in oVirt On 21-11-2014 9:47, noc wrote: The VM doesn't start but that can be caused by my L2 virt OR that the host name = .. port=0 ... is wrong. Shouldn't there be a port in the 24007 or 49152 range? Sorry, forgot to install vdsm-gluster. Starting the VM on an el6 host now works and still the same line with the port=0 so that doesn't seem to matter. OK, back to setting up a el6 host except when you generate a el7 version too which would be awesome 8-) . I updated the packages (rebasing on a newer master) and I provided an el7 build as well: https://fsimonce.fedorapeople.org/vdsm-libgfapi/ These rpms are less tested than the previous ones but the rebase was straight forward. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
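As a quick sanity check (not from the thread, just a suggestion): whether a running VM really got a libgfapi disk can be seen from the generated libvirt XML on the host; the VM name below is a placeholder:

# virsh -r dumpxml <vm-name> | grep -B2 -A4 "protocol='gluster'"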
Re: [ovirt-users] Status libgfapi support in oVirt
- Original Message - From: noc n...@nieuwland.nl To: users@ovirt.org Sent: Thursday, November 20, 2014 8:46:01 AM Subject: Re: [ovirt-users] Status libgfapi support in oVirt On 19-11-2014 23:44, Darrell Budic wrote: Is there an el7 build of this available too? That would be nice too. Forgot that I updated my test env to el7 to see if that helped. Can test on F20 @home tonight if needed. Gonna try something else first and will let you all know how that went. I've prepared a fedora 20 build as well: https://fsimonce.fedorapeople.org/vdsm-libgfapi/fc20/ -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Status libgfapi support in oVirt
- Original Message - From: Joop jvdw...@xs4all.nl To: users@ovirt.org Sent: Monday, November 17, 2014 9:39:36 AM Subject: [ovirt-users] Status libgfapi support in oVirt I have been trying to use libgfapi glusterfs support in oVirt but can't get it to work. After talks on IRC it seems I should apply a patch (http://gerrit.ovirt.org/33768) to enable libgf BUT I can't get it to work. Systems used: - hosts Centos7 or Fedora20 (so upto date qemu/libvirt/oVirt(3.5)) - glusterfs-3.6.1 - vdsm-4.16.0-524.gitbc618a4.el7.x86_64 (snapshot master 14-nov) - vdsm-4.16.7-1.gitdb83943.el7.x86_64 (official ovirt-3.5 vdsm, seems newer than master snapshot?? ) Just adding the patch to vdsm-4.16.7-1.gitdb83943.el7.x86_64 doesn't work, vdsm doesn't start anymore due to an error in virt/vm.py. Q1: what is de exact status of libgf and oVirt. Q2: how do I test that patch? Rebasing and applying patches could be tricky sometimes and if you got an error in virt/vm.py it is most likely because the patch didn't apply cleanly. I prepared a build (el6) here: https://fsimonce.fedorapeople.org/vdsm-libgfapi/ In case you want to try it on fedora you just need to get the source rpm here: https://fsimonce.fedorapeople.org/vdsm-libgfapi/source/ and rebuild it on fedora. Let me know if you have any problem. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
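A rough sketch of the rebuild-on-Fedora step (the exact src.rpm file name has to be taken from the 'source' directory listing; the name below is a placeholder):

# yum install rpm-build yum-utils
# curl -O https://fsimonce.fedorapeople.org/vdsm-libgfapi/source/vdsm-<version>.src.rpm
# yum-builddep vdsm-<version>.src.rpm
# rpmbuild --rebuild vdsm-<version>.src.rpm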
Re: [ovirt-users] Status libgfapi support in oVirt
- Original Message - From: noc n...@nieuwland.nl To: users@ovirt.org Sent: Wednesday, November 19, 2014 9:36:28 AM Subject: Re: [ovirt-users] Status libgfapi support in oVirt On 18-11-2014 20:57, Christopher Young wrote: I'm replying to 'up' this as well as I'm most interested in this. I actually thought this was implemented and working too. On Mon, Nov 17, 2014 at 10:01 AM, Daniel Helgenberger daniel.helgenber...@m-box.de wrote: Hello Joop, thanks for raising the issue as it is one of the things I assumed are already implemented and working. Sadly I cannot provide any answer ... On 17.11.2014 09:39, Joop wrote: I have been trying to use libgfapi glusterfs support in oVirt but can't get it to work. After talks on IRC it seems I should apply a patch ( http://gerrit.ovirt.org/33768 ) to enable libgf BUT I can't get it to work. Systems used: - hosts Centos7 or Fedora20 (so upto date qemu/libvirt/oVirt(3.5)) - glusterfs-3.6.1 - vdsm-4.16.0-524.gitbc618a4.el7.x86_64 (snapshot master 14-nov) - vdsm-4.16.7-1.gitdb83943.el7.x86_64 (official ovirt-3.5 vdsm, seems newer than master snapshot?? ) Just adding the patch to vdsm-4.16.7-1.gitdb83943.el7.x86_64 doesn't work, vdsm doesn't start anymore due to an error in virt/vm.py. Q1: what is de exact status of libgf and oVirt. Q2: how do I test that patch? I experimented a little more and found that if I create a VM in oVirt on a glusterfs storage domain and start it, it won't use libgfapi, BUT if I use virsh on the host where the VM runs and then add a disk the libgfapi way the VM will see the disk and can use it. So the underlying infra is capable of using libgf but oVirt isn't using it. Thats where the patch comes in I think but I can't get it to work. Correct. oVirt up until now didn't use libgfapi because of missing features (e.g. live snapshot). It seems that now all those gaps have been fixed and we're trying to re-enable libgfapi. I just mentioned that I uploaded an el6 build here: https://fsimonce.fedorapeople.org/vdsm-libgfapi/ and sources here (to rebuild on fedora): https://fsimonce.fedorapeople.org/vdsm-libgfapi/source/ Let me know if the most of you are using fedora and I'll make a build on fedora as well. Please let me know how it goes. Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] [QE] oVirt 3.5.1 status
- Original Message - From: Sven Kieske svenkie...@gmail.com To: users@ovirt.org Sent: Thursday, October 23, 2014 6:42:11 PM Subject: Re: [ovirt-users] [QE] oVirt 3.5.1 status -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, please consider: https://bugzilla.redhat.com/show_bug.cgi?id=1156115 as a backport to 3.5.1 I don't know if you plan to release a new vdsm version though and I also don't know if this patch is already matured enough, if I can test or help make this patch better, please let me know, as I'm very interested in getting it into this release. Hi Sven, first of all we need to merge the patch in master and yes, as you suggested having some help would speed up the process. The patch affects the move/copy of images with (one or more) snapshots. I already briefly tested the patch with qemu-img from rhel 6 so we need to cover other platforms (fedora and centos) and test: - cold/live move of disks from one storage to another (both nfs/iscsi and cold move from nfs to iscsi and backward) - export vms (with snapshots) to export domain and re-import it (both nfs and iscsi) Once the patch is in master and we have a feedback on how stable it is we may consider it for backporting (maybe 3.5.2 or 3.5.3 if ever). Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i)
- Original Message -
From: Ricardo Esteves ricardo.m.este...@gmail.com
To: Federico Simoncelli fsimo...@redhat.com
Cc: users@ovirt.org
Sent: Wednesday, October 8, 2014 1:32:51 AM
Subject: Re: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i)

> Hi, here it goes:
>
> ethtool -i eth3
> driver: bnx2
> version: 2.2.4g
> firmware-version: bc 5.2.3
> bus-info: :06:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no

Thanks, can you also add the output of:

# lspci -nn

I'd have expected the driver to be bnx2i (bnx2 is a regular ethernet driver, no offloading). Can you also check if you have the bnx2i driver loaded?

# lsmod | grep bnx2

and eventually if there's any bnx2 related message in /var/log/messages. Check also the adapter bios (at boot time) if by any chance you have to enable the offloading there first (check the specific manual too).

-- Federico

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
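For the last two checks, something along these lines should be enough (stock locations assumed):

# grep -i bnx2 /var/log/messages | tail -50
# ls /sys/class/iscsi_transport        # bnx2i should show up here once the offload driver is registered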
Re: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i)
- Original Message -
From: Ricardo Esteves ricardo.m.este...@gmail.com
To: Federico Simoncelli fsimo...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, October 7, 2014 8:44:19 PM
Subject: Re: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i)

> cat /var/lib/iscsi/ifaces/eth3
> # BEGIN RECORD 6.2.0-873.10.el6
> iface.iscsi_ifacename = eth3
> iface.transport_name = tcp
> iface.vlan_id = 0
> iface.vlan_priority = 0
> iface.iface_num = 0
> iface.mtu = 0
> iface.port = 0
> # END RECORD
>
> Is there anyway to tell ovirt to use bnx2i instead of tcp?

Hi Ricardo, can you paste the output of:

# ethtool -i eth3

Thanks,
-- Federico

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
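For reference only: outside of oVirt, open-iscsi binds a session to the offload engine through a bnx2i iface roughly as below (MAC address taken from later in this thread; oVirt 3.2 manages the ifaces itself and will not pick this up, which is what the rest of the thread is about):

# iscsiadm -m iface -I bnx2i.d8:d3:85:67:e3:bb --op new
# iscsiadm -m iface -I bnx2i.d8:d3:85:67:e3:bb --op update -n iface.transport_name -v bnx2i
# iscsiadm -m iface -I bnx2i.d8:d3:85:67:e3:bb --op update -n iface.hwaddress -v d8:d3:85:67:e3:bb
# iscsiadm -m discovery -t st -p 192.168.12.2:3260 -I bnx2i.d8:d3:85:67:e3:bb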
Re: [ovirt-users] [ovirt-devel] Building vdsm within Fedora
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Sandro Bonazzola sbona...@redhat.com Cc: crobi...@redhat.com, users users@ovirt.org, de...@ovirt.org Sent: Thursday, September 25, 2014 3:06:01 PM Subject: Re: [ovirt-devel] [ovirt-users] Building vdsm within Fedora On Wed, Sep 24, 2014 at 10:57:21AM +0200, Sandro Bonazzola wrote: Il 24/09/2014 09:44, Sven Kieske ha scritto: On 24/09/14 09:13, Federico Simoncelli wrote: You probably missed the first part we were using qemu-kvm/qemu-img in the spec file. In that case you won't fail in any requirement. Basically the question is: was there any problem on centos6 before committing http://gerrit.ovirt.org/31214 ? Federico: as we checked a few minutes ago, it seems there's no problem in requiring qemu-kvm/qemu-img in the spec file. Only issue is that if non rhev version is installed a manual yum update is required for moving to the rhevm version. Right. Without the patch, RPM does not enforce qemu-kvm-rhev. So our code has to check for qemu-kvm-rhev functionality, instead of knowing that it is there. Furthermore, we had several reports of users finding themselves without qemu-kvm-rhev on their node, and not understanding why they do not have live merge. Live merge? The biggest problem with live merge is libvirt not qemu. Anyway the qemu-kvm/qemu-kvm-rhev problem is relevant only for centos and centos has a specific way to address these special needs: http://www.centos.org/variants/ A CentOS variant is a special edition of CentOS Linux that starts with the core distribution, then replaces or supplements a specific subset of packages. This may include replacing everything down to the kernel, networking, and other subsystems. I think the plan was to have our own centos variant (shipping qemu-kvm-rhev). I remember Doron participated to the centos meetings but I don't remember the outcome. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Building vdsm within Fedora
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Sandro Bonazzola sbona...@redhat.com, de...@ovirt.org, fsimo...@redhat.com, dougsl...@redhat.com Cc: Sven Kieske s.kie...@mittwald.de, users users@ovirt.org Sent: Tuesday, September 23, 2014 11:21:18 PM Subject: Building vdsm within Fedora Since Vdsm was open-sourced, it was built and deployed via Fedora. Recently [http://gerrit.ovirt.org/31214] vdsm introduced a spec-file dependency onf qemu-kvm-rhev, and considered to backport it to the ovirt-3.4 brach. Requiring qemu-kvm-rhev, which is not part of Fedora's EPEL6 branch, violates Fedora's standards. So basically we have two options: 1. Revert the qemu-kvm-rhev dependency. 2. Drop vdsm from EPEL6 (or completely from Fedora); ship Vdsm only within the oVirt repositories. A third option would be to have one rpm, with qemu-kvm-rhev, shipped in ovirt, and another without it - shipped in Fedora. I find this overly complex and confusing. I think that until now (centos6) we were using qemu-kvm/qemu-img in the spec file and then the ovirt repository was distributing qemu-*-rhev from: http://resources.ovirt.org/pub/ovirt-3.4-snapshot/rpm/el6/x86_64/ It this not possible with centos7? Any problem with that? I find being in fedora a way to keep the spec file and the rpm updated and as clean as possible. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] [ovirt-devel] Building vdsm within Fedora
- Original Message - From: Sven Kieske s.kie...@mittwald.de To: de...@ovirt.org, users users@ovirt.org Sent: Wednesday, September 24, 2014 9:44:17 AM Subject: Re: [ovirt-devel] Building vdsm within Fedora On 24/09/14 09:13, Federico Simoncelli wrote: You probably missed the first part we were using qemu-kvm/qemu-img in the spec file. In that case you won't fail in any requirement. Basically the question is: was there any problem on centos6 before committing http://gerrit.ovirt.org/31214 ? Of course there was a problem, please follow the link in this very commit to the according bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1127763 In short: you can not use live snapshots without this updated spec file. And it's a PITA to install this package by hand, you must track it's versions yourself etc pp. you basically lose all the stuff a proper spec file gives you. As soon as you have the ovirt repository installed there shouldn't be any reason for you to have any of these problems. Sandro, is there any reason why the rpm available here: http://resources.ovirt.org/pub/ovirt-3.4/rpm/el6/x86_64/ are not published here? http://resources.ovirt.org/releases/3.4/rpm/el6/x86_64/ Is there any additional repository (that provides qemu-*-rhev) that we are missing from the ovirt.repo file? -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] [ovirt-devel] Building vdsm within Fedora
- Original Message - From: Sandro Bonazzola sbona...@redhat.com To: Federico Simoncelli fsimo...@redhat.com Cc: de...@ovirt.org, users users@ovirt.org, Sven Kieske s.kie...@mittwald.de Sent: Wednesday, September 24, 2014 11:01:35 AM Subject: Re: [ovirt-devel] Building vdsm within Fedora Il 24/09/2014 10:35, Federico Simoncelli ha scritto: - Original Message - From: Sven Kieske s.kie...@mittwald.de To: de...@ovirt.org, users users@ovirt.org Sent: Wednesday, September 24, 2014 9:44:17 AM Subject: Re: [ovirt-devel] Building vdsm within Fedora On 24/09/14 09:13, Federico Simoncelli wrote: You probably missed the first part we were using qemu-kvm/qemu-img in the spec file. In that case you won't fail in any requirement. Basically the question is: was there any problem on centos6 before committing http://gerrit.ovirt.org/31214 ? Of course there was a problem, please follow the link in this very commit to the according bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1127763 In short: you can not use live snapshots without this updated spec file. And it's a PITA to install this package by hand, you must track it's versions yourself etc pp. you basically lose all the stuff a proper spec file gives you. As soon as you have the ovirt repository installed there shouldn't be any reason for you to have any of these problems. Sandro, is there any reason why the rpm available here: http://resources.ovirt.org/pub/ovirt-3.4/rpm/el6/x86_64/ are not published here? http://resources.ovirt.org/releases/3.4/rpm/el6/x86_64/ this second link points to the previous layout, abandoned since we moved from /releases to /pub. /releases is still around for historical purpose, I think we should consider to drop it at some point avoinding confusion or renaming it to something that make it clear that it shouldn't be used anymore. Sven can you let us know if you still have any problem using: http://resources.ovirt.org/pub/yum-repo/ovirt-release34.rpm (which should contain the correct ovirt.repo) Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
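A quick way to verify the repository setup described above (package URL as given; the qemu-kvm-rhev query simply confirms the oVirt el6 repo is the one being used):

# yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release34.rpm
# yum repolist enabled | grep -i ovirt
# yum info qemu-kvm-rhev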
Re: [ovirt-users] Problem Refreshing/Using ovirt-image-repository / 3.5 RC2
- Original Message - From: Oved Ourfali ov...@redhat.com To: j...@internetx.com, Federico Simoncelli fsimo...@redhat.com Cc: users@ovirt.org, Allon Mureinik amure...@redhat.com Sent: Tuesday, September 23, 2014 9:56:28 AM Subject: Re: [ovirt-users] Problem Refreshing/Using ovirt-image-repository / 3.5 RC2 - Original Message - From: InterNetX - Juergen Gotteswinter j...@internetx.com To: users@ovirt.org Sent: Tuesday, September 23, 2014 10:41:41 AM Subject: Re: [ovirt-users] Problem Refreshing/Using ovirt-image-repository / 3.5 RC2 Am 23.09.2014 um 09:32 schrieb Oved Ourfali: - Original Message - From: InterNetX - Juergen Gotteswinter j...@internetx.com To: users@ovirt.org Sent: Tuesday, September 23, 2014 10:29:07 AM Subject: [ovirt-users] Problem Refreshing/Using ovirt-image-repository / 3.5 RC2 Hi, when trying to refresh the ovirt glance repository i get a 500 Error Message Operation Canceled Error while executing action: A Request to the Server failed with the following Status Code: 500 engine.log says: 2014-09-23 09:23:08,960 INFO [org.ovirt.engine.core.bll.provider.TestProviderConnectivityCommand] (ajp--127.0.0.1-8702-10) [7fffb4bd] Running command: TestProviderConnectivityCommand internal: false. Entities affected : ID: aaa0----123456789aaa Type: SystemAction group CREATE_STORAGE_POOL with role type ADMIN 2014-09-23 09:23:08,975 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-10) [7fffb4bd] Correlation ID: 7fffb4bd, Call Stack: null, Custom Event ID: -1, Message: Unrecognized audit log type has been used. 2014-09-23 09:23:20,173 INFO [org.ovirt.engine.core.bll.aaa.LogoutUserCommand] (ajp--127.0.0.1-8702-11) [712895c3] Running command: LogoutUserCommand internal: false. 2014-09-23 09:23:20,184 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-11) [712895c3] Correlation ID: 712895c3, Call Stack: null, Custom Event ID: -1, Message: User admin logged out. 2014-09-23 09:23:20,262 INFO [org.ovirt.engine.core.bll.aaa.LoginAdminUserCommand] (ajp--127.0.0.1-8702-6) Running command: LoginAdminUserCommand internal: false. All these message are good... no error here. Can you attach the full engine log? imho there is nothing else related to this :/ i attached the log starting from today. except firing up a test vm nothing else happened yet (and several tries refreshing the image repo) I don't see a refresh attempt in the log, but i'm not familiar enough with that. Federico - can you have a look? I don't see any reference to glance or error 500 in the logs. My impression is that the error 500 is between the ui and the engine... have you tried to force-refresh the ovirt webadmin page? You can try and use the rest-api to check if the listing is working there. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
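One way to test the listing through the REST API, as suggested above (the URL path and credentials are assumptions for a 3.5 engine, adjust to the local setup):

# curl -k -u admin@internal:PASSWORD https://ENGINE_FQDN/ovirt-engine/api/openstackimageproviders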
[ovirt-users] Fedora 21 Test Day
FYI, in a couple of days it will be the Fedora 21 virtualization test day: http://fedoramagazine.org/5tftw-2014-09-02/ https://fedoraproject.org/wiki/Test_Day:2014-09-25_Virtualization it's a good opportunity for us to check for regressions with vdsm (qemu, libvirt, virt-tools), get more attention from the fedora community, have quicker feedback on issues and get them fixed before release. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can I use qcow2?
- Original Message -
From: Itamar Heim ih...@redhat.com
To: Demeter Tibor tdeme...@itsmart.hu
Cc: users@ovirt.org, Allon Mureinik amure...@redhat.com, Federico Simoncelli fsimo...@redhat.com
Sent: Wednesday, September 3, 2014 12:50:30 PM
Subject: Re: [ovirt-users] Can I use qcow2?

> On 09/03/2014 11:14 AM, Demeter Tibor wrote:
> > On shared glusterfs.
>
> Allon/Federico - I remember on NFS, qcow2 isn't used by default, since raw
> is sparse by default. (but i don't remember if it won't work, or just not
> enabled by default). can one create a qcow2 disk for a VM with gluster
> storage?

Yes, through the REST API. From the ovirt-shell you could run:

$ add disk \
    --vm-identifier vm_name \
    --provisioned_size size_in_bytes \
    --interface virtio \
    --name disk_name \
    --format cow \
    --sparse true \
    --storage_domains-storage_domain storage_domain.name=gluster_domain_name

-- Federico

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] feature review - ReportGuestDisksLogicalDeviceName
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Liron Aravot lara...@redhat.com Cc: users@ovirt.org, de...@ovirt.org, smizr...@redhat.com, fsimo...@redhat.com, Michal Skrivanek mskri...@redhat.com, Vinzenz Feenstra vfeen...@redhat.com, Allon Mureinik amure...@redhat.com Sent: Monday, September 1, 2014 11:23:45 PM Subject: Re: feature review - ReportGuestDisksLogicalDeviceName On Sun, Aug 31, 2014 at 07:20:04AM -0400, Liron Aravot wrote: Feel free to review the the following feature. http://www.ovirt.org/Features/ReportGuestDisksLogicalDeviceName Thanks for posting this feature page. Two things worry me about this feature. The first is timing. It is not reasonable to suggest an API change, and expect it to get to ovirt-3.5.0. We are two late anyway. The other one is the suggested API. You suggest placing volatile and optional infomation in getVMList. It won't be the first time that we have it (guestIPs, guestFQDN, clientIP, and displayIP are there) but it's foreign to the notion of conf reported by getVMList() - the set of parameters needed to recreate the VM. At first sight this seems something belonging to getVmStats (which is reporting already other guest agent information). -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
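To see the difference between the two verbs being discussed, one can query a host directly with vdsClient; a sketch with a placeholder VM UUID:

# vdsClient -s 0 list table            # the conf-oriented VM listing (getVMList)
# vdsClient -s 0 getVmStats <vm-uuid>  # runtime and guest-agent data (getVmStats)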
Re: [ovirt-users] Reply: Error after changing IP of Node (FQDN is still the same)
What's the version of the vdsm and sanlock packages? Can you please share the logs on the host side? We need vdsm.log and sanlock.log containing the relevant errors (Cannot acquire host id). Thanks, -- Federico - Original Message - From: ml ml mliebher...@googlemail.com To: d...@redhat.com Cc: users@ovirt.org Users@ovirt.org Sent: Sunday, August 3, 2014 8:57:18 PM Subject: Re: [ovirt-users]答复: Error after changing IP of Node (FQDN is still the same) ok, i now removed the nodes and added them again. Same FQDN. I still get Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id, code = 661 I also got some error. Now whats the deal with that host id? Can somone please point me to the way how to debug this instead of pressing some remove and add buttons? Does someone really know how ovirt works under the hood? ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
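To collect what is being asked for, something like this should do (stock log locations assumed):

# rpm -q vdsm sanlock
# grep -i "acquire host id" /var/log/vdsm/vdsm.log | tail -20
# tail -100 /var/log/sanlock.log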
Re: [ovirt-users] SPM in oVirt 3.6
- Original Message - From: Nir Soffer nsof...@redhat.com To: Daniel Helgenberger daniel.helgenber...@m-box.de Cc: users@ovirt.org, Federico Simoncelli fsimo...@redhat.com Sent: Monday, July 28, 2014 6:43:30 PM Subject: Re: [ovirt-users] SPM in oVirt 3.6 - Original Message - From: Daniel Helgenberger daniel.helgenber...@m-box.de To: users@ovirt.org Sent: Friday, July 25, 2014 7:51:33 PM Subject: [ovirt-users] SPM in oVirt 3.6 just out of pure curiosity: In a BZ [1] Allon mentions SPM will go away in ovirt 3.6. This seems like a major change for me. I assume this will replace sanlock as well? What will SPM be replaced with? No, sanlock is not going anywhere. The change is that we will not have an SPM node, but any node that need to make meta data changes, will take a lock using sanlock while it make the changes. Federico: can you describe in more details how it is going to work? Most of the information can be found on the feature page: http://www.ovirt.org/Features/Decommission_Master_Domain_and_SPM -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Reclaim/Trim/issue_discards : how to inform my LUNs about disk space waste?
- Original Message - From: Nicolas Ecarnot nico...@ecarnot.net To: users users@ovirt.org Sent: Thursday, July 3, 2014 10:54:57 AM Subject: [ovirt-users] Reclaim/Trim/issue_discards : how to inform my LUNs about disk space waste? Hi, In my hosts, I see that /etc/lvm/lvm.conf shows : issue_discards = 0 Can I enable it to 1 ? Thank you? You can change it but it's not going to affect the lvm behavior in VDSM since we don't use the host lvm config file. This will be probably addressed as part of bz1017284 as we're considering to extend discard also to vdsm images (and not direct luns only). https://bugzilla.redhat.com/show_bug.cgi?id=1017284 -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can HA Agent control NFS Mount?
- Original Message - From: Bob Doolittle b...@doolittle.us.com To: Doron Fediuck dfedi...@redhat.com, Andrew Lau and...@andrewklau.com Cc: users users@ovirt.org, Federico Simoncelli fsimo...@redhat.com Sent: Saturday, June 14, 2014 1:29:54 AM Subject: Re: [ovirt-users] Can HA Agent control NFS Mount? But there may be more going on. Even if I stop vdsmd, the HA services, and libvirtd, and sleep 60 seconds, I still see a lock held on the Engine VM storage: daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar p -1 helper p -1 listener p -1 status s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0 s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0 This output shows that the lockspaces are still acquired. When you put hosted-engine in maintenance they must be released. One by directly using rem_lockspace (since it's the hosted-engine one) and the other one by stopMonitoringDomain. I quickly looked at the ovirt-hosted-engine* projects and I haven't found anything related to that. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
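For illustration only (this is what the hosted-engine tooling should end up doing, not a recommended manual procedure): the two releases mentioned above would map to something like the following, with the lockspace string and the storage domain UUID taken from the sanlock status output shown earlier in the thread:

# sanlock client status
# sanlock client rem_lockspace -s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0
# vdsClient -s 0 stopMonitoringDomain 003510e8-966a-47e6-a5eb-3b5c8a6070a9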
Re: [ovirt-users] sanlock + gluster recovery -- RFE
- Original Message - From: Ted Miller tmil...@hcjb.org To: users users@ovirt.org Sent: Tuesday, May 20, 2014 11:31:42 PM Subject: [ovirt-users] sanlock + gluster recovery -- RFE As you are aware, there is an ongoing split-brain problem with running sanlock on replicated gluster storage. Personally, I believe that this is the 5th time that I have been bitten by this sanlock+gluster problem. I believe that the following are true (if not, my entire request is probably off base). * ovirt uses sanlock in such a way that when the sanlock storage is on a replicated gluster file system, very small storage disruptions can result in a gluster split-brain on the sanlock space Although this is possible (at the moment) we are working hard to avoid it. The hardest part here is to ensure that the gluster volume is properly configured. The suggested configuration for a volume to be used with ovirt is: Volume Name: (...) Type: Replicate Volume ID: (...) Status: Started Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: (...three bricks...) Options Reconfigured: network.ping-timeout: 10 cluster.quorum-type: auto The two options ping-timeout and quorum-type are really important. You would also need a build where this bug is fixed in order to avoid any chance of a split-brain: https://bugzilla.redhat.com/show_bug.cgi?id=1066996 How did I get into this mess? ... What I would like to see in ovirt to help me (and others like me). Alternates listed in order from most desirable (automatic) to least desirable (set of commands to type, with lots of variables to figure out). The real solution is to avoid the split-brain altogether. At the moment it seems that using the suggested configurations and the bug fix we shouldn't hit a split-brain. 1. automagic recovery 2. recovery subcommand 3. script 4. commands I think that the commands to resolve a split-brain should be documented. I just started a page here: http://www.ovirt.org/Gluster_Storage_Domain_Reference Could you add your documentation there? Thanks! -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
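The two options called out above can be applied to an existing volume with (volume name is a placeholder):

# gluster volume set <VOLNAME> network.ping-timeout 10
# gluster volume set <VOLNAME> cluster.quorum-type auto
# gluster volume info <VOLNAME>        # verify under "Options Reconfigured"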
Re: [ovirt-users] sanlock + gluster recovery -- RFE
- Original Message - From: Giuseppe Ragusa giuseppe.rag...@hotmail.com To: fsimo...@redhat.com Cc: users@ovirt.org Sent: Wednesday, May 21, 2014 5:15:30 PM Subject: sanlock + gluster recovery -- RFE Hi, - Original Message - From: Ted Miller tmiller at hcjb.org To: users users at ovirt.org Sent: Tuesday, May 20, 2014 11:31:42 PM Subject: [ovirt-users] sanlock + gluster recovery -- RFE As you are aware, there is an ongoing split-brain problem with running sanlock on replicated gluster storage. Personally, I believe that this is the 5th time that I have been bitten by this sanlock+gluster problem. I believe that the following are true (if not, my entire request is probably off base). * ovirt uses sanlock in such a way that when the sanlock storage is on a replicated gluster file system, very small storage disruptions can result in a gluster split-brain on the sanlock space Although this is possible (at the moment) we are working hard to avoid it. The hardest part here is to ensure that the gluster volume is properly configured. The suggested configuration for a volume to be used with ovirt is: Volume Name: (...) Type: Replicate Volume ID: (...) Status: Started Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: (...three bricks...) Options Reconfigured: network.ping-timeout: 10 cluster.quorum-type: auto The two options ping-timeout and quorum-type are really important. You would also need a build where this bug is fixed in order to avoid any chance of a split-brain: https://bugzilla.redhat.com/show_bug.cgi?id=1066996 It seems that the aforementioned bug is peculiar to 3-bricks setups. I understand that a 3-bricks setup can allow proper quorum formation without resorting to first-configured-brick-has-more-weight convention used with only 2 bricks and quorum auto (which makes one node special, so not properly any-single-fault tolerant). Correct. But, since we are on ovirt-users, is there a similar suggested configuration for a 2-hosts setup oVirt+GlusterFS with oVirt-side power management properly configured and tested-working? I mean a configuration where any host can go south and oVirt (through the other one) fences it (forcibly powering it off with confirmation from IPMI or similar) then restarts HA-marked vms that were running there, all the while keeping the underlying GlusterFS-based storage domains responsive and readable/writeable (maybe apart from a lapse between detected other-node unresposiveness and confirmed fencing)? We already had a discussion with gluster asking if it was possible to add fencing to the replica 2 quorum/consistency mechanism. The idea is that as soon as you can't replicate a write you have to freeze all IO until either the connection is re-established or you know that the other host has been killed. Adding Vijay. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i)
- Original Message - From: Ricardo Esteves ricardo.m.este...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: users@ovirt.org Sent: Wednesday, May 14, 2014 1:45:53 AM Subject: RE: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i) In attachment follows the defaults and the modified versions of the nodes files. If selecting the relevant host in Hosts you see the bnx2i interface in the Network Interfaces subtab, then you can try to: 1. create a new network in the Network tab, VM network checkbox should be disabled 2. select the relevant host in Hosts and use Setup Host Networks in the Network Interfaces subtab 3. configure the bnx2i interface and assign it to the new network you just created 4. in iSCSI Multipathing subtab of tab Data Center add a new entry where you bind the iscsi connection to the new network you created Ping me on IRC if you need more help. My nick is fsimonce on #ovirt -- Federico -Original Message- From: Federico Simoncelli [mailto:fsimo...@redhat.com] Sent: terça-feira, 13 de Maio de 2014 08:58 To: Ricardo Esteves Cc: users@ovirt.org Subject: Re: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i) - Original Message - From: Ricardo Esteves ricardo.m.este...@gmail.com To: users@ovirt.org Sent: Friday, April 11, 2014 1:07:31 AM Subject: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i) Hi, I've put my host on maintenance, then i configured iscsi offload for my broadcom cards changing target's file's (192.168.12.2,3260 and 192.168.12.4,3260) in my node iqn.1986-03.com.hp:storage.msa2324i.1226151a6 to use interface bnx2i.d8:d3:85:67:e3:bb, but after activating the host, configurations are back to default. Can you share the changes you made? Thanks. ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i)
- Original Message - From: Federico Simoncelli fsimo...@redhat.com To: Ricardo Esteves ricardo.m.este...@gmail.com Cc: users@ovirt.org Sent: Wednesday, May 14, 2014 3:47:58 PM Subject: Re: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i) - Original Message - From: Ricardo Esteves ricardo.m.este...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: users@ovirt.org Sent: Wednesday, May 14, 2014 1:45:53 AM Subject: RE: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i) In attachment follows the defaults and the modified versions of the nodes files. If selecting the relevant host in Hosts you see the bnx2i interface in the Network Interfaces subtab, then you can try to: Sorry I just noticed that you mentioned in the subject that you're using oVirt 3.2. What I suggested is available only since oVirt 3.4. -- Federico 1. create a new network in the Network tab, VM network checkbox should be disabled 2. select the relevant host in Hosts and use Setup Host Networks in the Network Interfaces subtab 3. configure the bnx2i interface and assign it to the new network you just created 4. in iSCSI Multipathing subtab of tab Data Center add a new entry where you bind the iscsi connection to the new network you created Ping me on IRC if you need more help. My nick is fsimonce on #ovirt -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i)
- Original Message - From: Ricardo Esteves ricardo.m.este...@gmail.com To: users@ovirt.org Sent: Friday, April 11, 2014 1:07:31 AM Subject: [ovirt-users] oVirt 3.2 - iSCSI offload (broadcom - bnx2i) Hi, I've put my host on maintenance, then i configured iscsi offload for my broadcom cards changing target's file's (192.168.12.2,3260 and 192.168.12.4,3260) in my node iqn.1986-03.com.hp:storage.msa2324i.1226151a6 to use interface bnx2i.d8:d3:85:67:e3:bb, but after activating the host, configurations are back to default. Can you share the changes you made? Thanks. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Snapshot removal
- Original Message - From: Dafna Ron d...@redhat.com To: users@ovirt.org Sent: Thursday, April 17, 2014 12:52:04 PM Subject: Re: [ovirt-users] Snapshot removal NFS is wipe_after_delete=true always. so delete of snapshot will merge the data to upper level image + zero in on the data which is why this is taking a long time. I double checked this and I was very much surprised by this finding! We need a bz right away, Dafna can file it? Thanks! -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Sanlock log entry
- Original Message -
From: Maurice James mja...@media-node.com
To: users@ovirt.org
Sent: Saturday, April 12, 2014 3:06:37 PM
Subject: [ovirt-users] Sanlock log entry

> I'm seeing this about every 20 seconds in my sanlock.log. What does it mean?
>
> s1:r1694 resource a033c2ac-0d01-490c-9552-99ca53d6a64a:SDM:/rhev/data-center/mnt/ashtivh02.suprtekstic.com:_var_lib_exports_storage/a033c2ac-0d01-490c-9552-99ca53d6a64a/dom_md/leases:1048576 for 3,14,3960

Can you provide more info? Anything weird in the VDSM logs? (errors/warnings) Anything in the engine logs? That message may be related to acquiring the SPM role but it shouldn't happen so often.

-- Federico

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Error creating Disks
- Original Message - From: Maurice James mja...@media-node.com To: d...@redhat.com Cc: users@ovirt.org Sent: Tuesday, April 15, 2014 6:54:11 PM Subject: Re: [ovirt-users] Error creating Disks Logs are attached. Live Migration failed In the engine logs I see: 2014-04-15 12:51:07,420 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (org.ovirt.thread.pool-6-thread-42) START, SnapshotVDSCommand(HostName = vhost3, HostId = bc9c25e6-714e-4eac-8af0-860ac76fd195, vmId=ba49605b-fb7e-4a70-a380-6286d3903e50), log id: e953f9a ... 2014-04-15 12:51:07,496 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (org.ovirt.thread.pool-6-thread-42) Command SnapshotVDSCommand(HostName = vhost3, HostId = bc9c25e6-714e-4eac-8af0-860ac76fd195, vmId=ba49605b-fb7e-4a70-a380-6286d3903e50) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, code = 48 but I can't find anything related to that in the vdsm log. Are you sure you attached the correct vdsm log? Are the hosts running centos or fedora? -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] oVirt 3.5 planning
- Original Message - From: Itamar Heim ih...@redhat.com To: Federico Alberto Sayd fs...@uncu.edu.ar, users@ovirt.org, Federico Simoncelli fsimo...@redhat.com Sent: Thursday, March 20, 2014 6:47:14 PM Subject: Re: [Users] oVirt 3.5 planning On 03/20/2014 07:30 PM, Federico Alberto Sayd wrote: 3 - Another question, when you convert a VM to template I see that it is created with preallocated disk even if the original VM had thinly provisioned disk. Is there no way to make the template with the same type of disk (thinly provisioned)?? on NFS - it doesn't matter. on block storage - i don't remember why. maybe federico remembers. It was for performance reasons since on block it would result in a qcow2 image. We wanted the access to the template to be as fast as possible. The same question was brought up few months ago when we discussed importing images from glance as templates. Anyway in that case we decided (for simplicity) to allow the import as template also for qcow2 images. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] GSoC 14 Idea Discussion - virt-sparsify integration
- Original Message - From: Utkarsh Singh utkarshs...@gmail.com To: users@ovirt.org Cc: fsimo...@redhat.com Sent: Friday, March 7, 2014 6:16:46 PM Subject: GSoC 14 Idea Discussion - virt-sparsify integration Hello, I am Utkarsh, a 4th year undergrad from IIT Delhi and a GSoC-14 aspirant. I have been lately involved in an ongoing project Baadal Cloud Computing Platform in my institute, which has got me interested in oVirt for a potential GSoC project. I was going through the virt-sparsify integration project idea. I have gone through the architecture documentation on the oVirt website. As far as I understand, the virt-sparsify integration needs to be done on the VDSM daemon, and it's control is either going to be completely independent of ovirt-engine (for example running it once every 24 hours), or it's something that is centrally controlled by the ovirt-engine through XML/RPC calls. The details are not specified in the project ideas page. I would like to ask - The request to sparsifying the image is controlled by ovirt-engine. The user will pick one (or eventually more) disk(s) that are not in use (vm down) and he'll request to sparsify it/them. 1. What would be the proposed ideal implementation? (Central-Control or Independent-Control) Central-Control 2. Is virt-sparsify usage going to be automated or administrator-triggered, or a combination of both? administrator-triggered There are some aspects of the idea, which I would like to discuss before I start working on a proposal. It's not necessary that an automated usage of virt-sparsify is limited to any simple idea. Architecture documentation states that ovirt-engine has features like Monitoring that would allow administrators (and possibly users) to be aware of vm-guest performance as well as vm-host performance. I am not very sure about how this data is collected, Is it done through MoM, or Is this directly done by VDSM, or is someone else doing this (for hosts). It would be great if someone can explain that to me. This information about vm-guest usage and vm-host health can help in determining how virt-sparsify is to be used. The vm/hosts statistics are gathered and provided by VDSM. Anyway I would leave this part out at the moment. The virt-sparsify command is a long running task and in the current architecture it can be only an SPM task. There is some ongoing work to remove the pool and the SPM (making virt-sparsify operable by any host) but I wouldn't block on that. I am also not very clear about the Shared Storage component in the architecture. Does oVirt make any assumptions about the Shared Storage. For example, the performance difference between running virt-sparsify on NFS as compared to running it (if possible) directly on storage hardware. If the Storage solution is necessarily a NAS instance, then virt-sparsify on NFS mount is the only option. The storage connections are already managed by vdsm and the volume chains are maintained transparently in /rhev/data-center/... There are few differences between image files on NFS/Gluster and images stored on LVs but with regard to how to reach them it is transparent (similar path). Right now, I am in the process of setting up oVirt on my system, and getting more familiar with the architecture. Regarding my experience. I am acquainted with both Java and Python. I have little experience with JBoss, but I have worked on some other Web Application Servers like web2py and Play Framework. 
My involvement in Baadal Platform has got me acquainted with libvirt/QEMU, the details of which I have mentioned below (if anyone is interested). Depending on the amount of time that you can dedicate to this project it seems that you could tackle both the vdsm and ovirt-engine parts. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
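For context, the tool the project would wrap is virt-sparsify from libguestfs-tools; it copies the image, so it needs the VM to be down and enough scratch/destination space. Basic usage is:

$ virt-sparsify /path/to/indisk.img /path/to/outdisk.img
$ virt-sparsify --convert qcow2 indisk.raw outdisk.qcow2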
Re: [Users] Volume Group does not exist. Blame device-mapper ?
- Original Message - From: Nicolas Ecarnot nico...@ecarnot.net To: Federico Simoncelli fsimo...@redhat.com Cc: users users@ovirt.org Sent: Monday, February 17, 2014 10:14:56 AM Subject: Re: [Users] Volume Group does not exist. Blame device-mapper ? Le 14/02/2014 15:39, Federico Simoncelli a écrit : Hi Nicolas, are you still able to reproduce this issue? Are you using fedora or centos? If providing the logs is problematic for you could you try to ping me on irc (fsimonce on #ovirt OFTC) so that we can work on the issue together? Thanks, Hi Frederico, Since I haven't changed anything related to the SAN or the network, I'm pretty sure I'll be able to reproduce the bug. We are using CentOS. I can provide the logs, no issue. This week, our oVirt setup will be strongly used, so this is not the better time to play with it. I'm very thankful you took the time to answer, but may I delay my answer about this bug to next week? Ok, no problem. Feel free to contact me on IRC when you start testing. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Volume Group does not exist. Blame device-mapper ?
Hi Nicolas, are you still able to reproduce this issue? Are you using fedora or centos? If providing the logs is problematic for you could you try to ping me on irc (fsimonce on #ovirt OFTC) so that we can work on the issue together? Thanks, -- Federico - Original Message - From: Nicolas Ecarnot nico...@ecarnot.net To: users users@ovirt.org Sent: Monday, January 20, 2014 11:06:21 AM Subject: [Users] Volume Group does not exist. Blame device-mapper ? Hi, oVirt 3.3, no big issue since the recent snapshot joke, but all in all running fine. All my VM are stored in a iSCSI SAN. The VM usually are using only one or two disks (1: system, 2: data) and it is OK. Friday, I created a new LUN. Inside a VM, I linked to it via iscsiadm and successfully login to the Lun (session, automatic attach on boot, read, write) : nice. Then after detaching it and shuting down the MV, and for the first time, I tried to make use of the feature direct attach to attach the disk directly from oVirt, login the session via oVirt. I connected nice and I saw the disk appear in my VM as /dev/sda or whatever. I was able to mount it, read and write. Then disaster stoke all this : many nodes suddenly began to become unresponsive, quickly migrating their VM to the remaining nodes. Hopefully, the migrations ran fine and I lost no VM nor downtime, but I had to reboot every concerned node (other actions failed). In the failing nodes, /var/log/messages showed the log you can read in the end of this message. I first get device-mapper warnings, then the host unable to collaborate with the logical volumes. The 3 volumes are the three main storage domains, perfectly up and running where I store my oVirt VMs. My reflexions : - I'm not sure device-mapper is to blame. I frequently see device mapper complaining and nothing is getting worse (not oVirt specifically) - I have not change my network settings for months (bonding, linking...) The only new factor is the usage of direct attach LUN. - This morning I was able to reproduce the bug, just by trying again this attachement, and booting the VM. No mounting of the LUN, just VM booting, waiting, and this is enough to crash oVirt. - when the disaster happens, usually, amongst the nodes, only three nodes gets stroke, the only one that run VMs. Obviously, after migration, different nodes are hosting the VMs, and those new nodes are the one that then get stroke. This is quite reproductible. And frightening. ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[Users] Google Summer of Code 2014
Hi everyone, I started a wiki page to list ideas for Google Summer of Code 2014: http://www.ovirt.org/Summer_of_Code The deadline for the submission is really soon (14th of Feb) but please feel free to try and add any idea that you may have. For more information about Google Summer of Code please refer to: https://developers.google.com/open-source/soc/ If you can't edit the wiki page please follow up to this thread with your proposals. Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Data Center stuck between Non Responsive and Contending
- Original Message - From: Itamar Heim ih...@redhat.com To: Ted Miller tmil...@hcjb.org, users@ovirt.org, Federico Simoncelli fsimo...@redhat.com Cc: Allon Mureinik amure...@redhat.com Sent: Sunday, January 26, 2014 11:17:04 PM Subject: Re: [Users] Data Center stuck between Non Responsive and Contending On 01/27/2014 12:00 AM, Ted Miller wrote: On 1/26/2014 4:00 PM, Itamar Heim wrote: On 01/26/2014 10:51 PM, Ted Miller wrote: On 1/26/2014 3:10 PM, Itamar Heim wrote: On 01/26/2014 10:08 PM, Ted Miller wrote: is this gluster storage (guessing sunce you mentioned a 'volume') yes (mentioned under setup above) does it have a quorum? Volume Name: VM2 Type: Replicate Volume ID: 7bea8d3b-ec2a-4939-8da8-a82e6bda841e Status: Started Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: 10.41.65.2:/bricks/01/VM2 Brick2: 10.41.65.4:/bricks/01/VM2 Brick3: 10.41.65.4:/bricks/101/VM2 Options Reconfigured: cluster.server-quorum-type: server storage.owner-gid: 36 storage.owner-uid: 36 auth.allow: * user.cifs: off nfs.disa (there were reports of split brain on the domain metadata before when no quorum exist for gluster) after full heal: [root@office4a ~]$ gluster volume heal VM2 info Gathering Heal info on volume VM2 has been successful Brick 10.41.65.2:/bricks/01/VM2 Number of entries: 0 Brick 10.41.65.4:/bricks/01/VM2 Number of entries: 0 Brick 10.41.65.4:/bricks/101/VM2 Number of entries: 0 [root@office4a ~]$ gluster volume heal VM2 info split-brain Gathering Heal info on volume VM2 has been successful Brick 10.41.65.2:/bricks/01/VM2 Number of entries: 0 Brick 10.41.65.4:/bricks/01/VM2 Number of entries: 0 Brick 10.41.65.4:/bricks/101/VM2 Number of entries: 0 noticed this in host /var/log/messages (while looking for something else). Loop seems to repeat over and over. 
Jan 26 15:35:52 office4a sanlock[3763]: 2014-01-26 15:35:52-0500 14678 [30419]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids Jan 26 15:35:53 office4a sanlock[3763]: 2014-01-26 15:35:53-0500 14679 [3771]: s1997 add_lockspace fail result -90 Jan 26 15:35:58 office4a vdsm TaskManager.Task ERROR Task=`89885661-88eb-4ea3-8793-00438735e4ab`::Unexpected error#012Traceback (most recent call last):#012 File /usr/share/vdsm/storage/task.py, line 857, in _run#012 return fn(*args, **kargs)#012 File /usr/share/vdsm/logUtils.py, line 45, in wrapper#012res = f(*args, **kwargs)#012 File /usr/share/vdsm/storage/hsm.py, line 2111, in getAllTasksStatuses#012allTasksStatus = sp.getAllTasksStatuses()#012 File /usr/share/vdsm/storage/securable.py, line 66, in wrapper#012 raise SecureError()#012SecureError Jan 26 15:35:59 office4a sanlock[3763]: 2014-01-26 15:35:59-0500 14686 [30495]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids Jan 26 15:36:00 office4a sanlock[3763]: 2014-01-26 15:36:00-0500 14687 [3772]: s1998 add_lockspace fail result -90 Jan 26 15:36:00 office4a vdsm TaskManager.Task ERROR Task=`8db9ff1a-2894-407a-915a-279f6a7eb205`::Unexpected error#012Traceback (most recent call last):#012 File /usr/share/vdsm/storage/task.py, line 857, in _run#012 return fn(*args, **kargs)#012 File /usr/share/vdsm/storage/task.py, line 318, in run#012return self.cmd(*self.argslist, **self.argsdict)#012 File /usr/share/vdsm/storage/sp.py, line 273, in startSpm#012 self.masterDomain.acquireHostId(self.id)#012 File /usr/share/vdsm/storage/sd.py, line 458, in acquireHostId#012 self._clusterLock.acquireHostId(hostId, async)#012 File /usr/share/vdsm/storage/clusterlock.py, line 189, in acquireHostId#012raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long')) fede - thoughts on above? (vojtech reported something similar, but it sorted out for him after some retries) Something truncated the ids file, as also reported by: [root@office4a ~]$ ls /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ -l total 1029 -rw-rw 1 vdsm kvm 0 Jan 22 00:44 ids -rw-rw 1 vdsm kvm 0 Jan 16 18:50 inbox -rw-rw 1 vdsm kvm 2097152 Jan 21 18:20 leases -rw-r--r-- 1 vdsm kvm 491 Jan 21 18:20 metadata -rw-rw 1 vdsm kvm 0 Jan 16 18:50 outbox In the past I saw that happening because of a glusterfs bug: https://bugzilla.redhat.com/show_bug.cgi?id=862975 Anyway in general it seems that glusterfs is not always able to reconcile the ids file (as it's written by all the hosts at the same time). Maybe someone from gluster can identify easily what happened. Meanwhile if you
[Users] oVirt 3.4 test day - Template Versions
Feature tested: http://www.ovirt.org/Features/Template_Versions
- create a new vm vm1 and make a template template1 from it
- create a new vm vm2 based on template1 and make some changes
- upgrade to 3.4
- create a new template template1.1 from vm2
- create a new vm vm3 from template1 (clone) - content ok
- create a new vm vm4 from template1.1 (thin) - content ok
- create a new vm vm5 from template1 last (thin) - content ok (same as 1.1)
- try to remove template1 (failed as template1.1 is still present)
- try to remove template1.1 (failed as vm5 is still present)
- create a new vm vm6 and make a template blank1.1 as new version of the blank template (succeeded)
- create a vm pool vmpool1 with the latest template from template1
- create a vm pool vmpool2 with the template1.1 (last) template from template1
- start vmpool1 and vmpool2 and verify that the content is the same
- create a new template template1.2
- start vmpool1 and verify that the content is the same as latest (template1.2)
- start vmpool2 and verify that the content is the same as template1.1
Suggestions:
- the template blank is special, I am not sure if allowing versioning may be confusing (for example, it is not even editable)
- as far as I can see the Sub Version Name is not editable anymore (after picking it)
-- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Data Center stuck between Non Responsive and Contending
- Original Message - From: Ted Miller tmil...@hcjb.org To: Federico Simoncelli fsimo...@redhat.com, Itamar Heim ih...@redhat.com Cc: users@ovirt.org Sent: Monday, January 27, 2014 7:16:14 PM Subject: Re: [Users] Data Center stuck between Non Responsive and Contending On 1/27/2014 3:47 AM, Federico Simoncelli wrote: Maybe someone from gluster can identify easily what happened. Meanwhile if you just want to repair your data-center you could try with: $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ $ touch ids $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576 Federico, I won't be able to do anything to the ovirt setup for another 5 hours or so (it is a trial system I am working on at home, I am at work), but I will try your repair script and report back. In bugzilla 862975 they suggested turning off write-behind caching and eager locking on the gluster volume to avoid/reduce the problems that come from many different computers all writing to the same file(s) on a very frequent basis. If I interpret the comment in the bug correctly, it did seem to help in that situation. My situation is a little different. My gluster setup is replicate only, replica 3 (though there are only two hosts). I was not stress-testing it, I was just using it, trying to figure out how I can import some old VMWare VMs without an ESXi server to run them on. Have you done anything similar to what is described here in comment 21? https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21 When did you realize that you weren't able to use the data-center anymore? Can you describe exactly what you did and what happened, for example: 1. I created the data center (up and running) 2. I tried to import some VMs from VMWare 3. During the import (or after it) the data-center went in the contending state ... Did something special happened? I don't know, power loss, split-brain? For example also an excessive load on one of the servers could have triggered a timeout somewhere (forcing the data-center to go back in the contending state). Could you check if any host was fenced? (Forcibly rebooted) I am guessing that what makes cluster storage have the (Master) designation is that this is the one that actually contains the sanlocks? If so, would it make sense to set up a gluster volume to be (Master), but not use it for VM storage, just for storing the sanlock info? Separate gluster volume(s) could then have the VMs on it(them), and would not need the optimizations turned off. Any domain must be able to become the master at any time. Without a master the data center is unusable (at the present time), that's why we migrate (or reconstruct) it on another domain when necessary. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
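A minimal follow-up sketch for verifying the repaired lockspace, reusing the paths quoted above; the ownership fix-up assumes the usual vdsm uid/gid of 36 (as in the gluster volume options earlier in this thread) and is only needed if the repair commands were run as root:

$ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
$ ls -l ids                  # should be back to a 1 MiB file after the init (it was 0 bytes before)
$ chown 36:36 ids            # restore vdsm:kvm ownership if touch/init were run as root
$ sanlock direct dump ids    # hosts reappear here once they re-acquire the lockspace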
Re: [Users] oVirt 3.4 test day - Template Versions
- Original Message - From: Omer Frenkel ofren...@redhat.com To: Federico Simoncelli fsimo...@redhat.com Cc: oVirt Users List users@ovirt.org, Itamar Heim ih...@redhat.com Sent: Monday, January 27, 2014 4:31:56 PM Subject: Re: oVirt 3.4 test day - Template Versions Thanks for the feedback! much appreciated. - Original Message - From: Federico Simoncelli fsimo...@redhat.com To: oVirt Users List users@ovirt.org Cc: Omer Frenkel ofren...@redhat.com, Itamar Heim ih...@redhat.com Sent: Monday, January 27, 2014 5:12:38 PM Subject: oVirt 3.4 test day - Template Versions Feature tested: http://www.ovirt.org/Features/Template_Versions - create a new vm vm1 and make a template template1 from it - create a new vm vm2 based on template1 and make some changes - upgrade to 3.4 - create a new template template1.1 from vm2 - create a new vm vm3 from template1 (clone) - content ok - create a new vm vm4 from template1.1 (thin) - content ok - create a new vm vm5 from template1 last (thin) - content ok (same as 1.1) - try to remove template1 (failed as template1.1 is still present) - try to remove template1.1 (failed as vm5 is still present) - create a new vm vm6 and make a template blank1.1 as new version of the blank template (succeeded) - create a vm pool vmpool1 with the latest template from template1 - create a vm pool vmpool2 with the template1.1 (last) template from template1 - start vmpool1 and vmpool2 and verify that the content is the same - create a new template template1.2 - start vmpool1 and verify that the content is the same as latest (template1.2) - start vmpool2 and verify that the content is the same as template1.1 Suggestions: - the template blank is special, I am not sure if allowing versioning may be confusing (for example is not even editable) right, i also thought about this, and my thought was not to block the user from doing this, but if it was confusing we better block it. - as far as I can see the Sub Version Name is not editable anymore (after picking it) thanks, i see its missing in the UI, do you care to open a bug on that? https://bugzilla.redhat.com/show_bug.cgi?id=1058501 -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] wiki site out of date?
- Original Message - From: Sven Kieske s.kie...@mittwald.de To: Users@ovirt.org List Users@ovirt.org Sent: Monday, January 13, 2014 12:07:20 PM Subject: [Users] wiki site out of date? Hi, is this feature page up to date? http://www.ovirt.org/Features/Online_Virtual_Drive_Resize specifically the point: QEMU-GA support for notifying the guest and updating the size of the visible disk: To be integrated That actually meant triggering the partition/lvm/filesystem resize automatically (in the guest), so that you don't have to do it manually. because we used online drive resize and it does show up in ubuntu based vm, without qemu-ga installed? -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
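For reference, these are the manual in-guest steps that the qemu-ga integration was meant to automate. This is only a sketch, assuming a guest whose root filesystem sits on LVM on /dev/vda2 with a vg0/root logical volume; adjust device names to the real layout, and use xfs_growfs instead of resize2fs for XFS:

# inside the guest, after the virtual disk has been extended in oVirt
growpart /dev/vda 2                      # grow the partition (from cloud-utils-growpart)
pvresize /dev/vda2                       # make LVM see the new space
lvextend -l +100%FREE /dev/vg0/root      # extend the logical volume into the free space
resize2fs /dev/vg0/root                  # grow an ext3/ext4 filesystem online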
Re: [Users] Broken Snapshots
- Original Message - From: Maurice James midnightst...@msn.com To: d...@redhat.com, Leonid Natapov lnata...@redhat.com Cc: eduardo Warszawski ewars...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Liron Aravot lara...@redhat.com, users@ovirt.org Sent: Thursday, January 2, 2014 2:04:06 PM Subject: RE: [Users] Broken Snapshots When I get home from work. I will attempt to delete the snapshot again then send the vdsm.log to you guys. Thanks Date: Thu, 2 Jan 2014 12:58:47 + From: d...@redhat.com To: lnata...@redhat.com CC: midnightst...@msn.com; ewars...@redhat.com; fsimo...@redhat.com; lara...@redhat.com; users@ovirt.org Subject: Re: [Users] Broken Snapshots Leo, there are two issues here: 1. I want to know what happened in Maurice's environment in the first place (why is the snapshot broken). 2. help with a workaround so that Maurice can delete the broken snapshots and continue working. At least for me this is hard to track, can we open a bug and attach the logs there? Are you using fedora or centos? On centos there's a known issue: https://bugzilla.redhat.com/show_bug.cgi?id=1009100 -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Broken Snapshots
- Original Message - From: Leonid Natapov lnata...@redhat.com To: Federico Simoncelli fsimo...@redhat.com Cc: Maurice James midnightst...@msn.com, d...@redhat.com, eduardo Warszawski ewars...@redhat.com, Liron Aravot lara...@redhat.com, users@ovirt.org Sent: Thursday, January 2, 2014 2:09:13 PM Subject: Re: [Users] Broken Snapshots Fede,there is an open BZ about that. See if it's the same https://bugzilla.redhat.com/show_bug.cgi?id=996945 That's the second part (deleting the broken snapshot), I'm trying to understand why the snapshot failed. -- Federico - Original Message - From: Federico Simoncelli fsimo...@redhat.com To: Maurice James midnightst...@msn.com Cc: d...@redhat.com, Leonid Natapov lnata...@redhat.com, eduardo Warszawski ewars...@redhat.com, Liron Aravot lara...@redhat.com, users@ovirt.org Sent: Thursday, January 2, 2014 3:07:53 PM Subject: Re: [Users] Broken Snapshots - Original Message - From: Maurice James midnightst...@msn.com To: d...@redhat.com, Leonid Natapov lnata...@redhat.com Cc: eduardo Warszawski ewars...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Liron Aravot lara...@redhat.com, users@ovirt.org Sent: Thursday, January 2, 2014 2:04:06 PM Subject: RE: [Users] Broken Snapshots When I get home from work. I will attempt to delete the snapshot again then send the vdsm.log to you guys. Thanks Date: Thu, 2 Jan 2014 12:58:47 + From: d...@redhat.com To: lnata...@redhat.com CC: midnightst...@msn.com; ewars...@redhat.com; fsimo...@redhat.com; lara...@redhat.com; users@ovirt.org Subject: Re: [Users] Broken Snapshots Leo, there are two issues here: 1. I want to know what happened in Maurice's environment in the first place (why is the snapshot broken). 2. help with a workaround so that Maurice can delete the broken snapshots and continue working. At least for me this is hard to track, can we open a bug and attach the logs there? Are you using fedora or centos? On centos there's a known issue: https://bugzilla.redhat.com/show_bug.cgi?id=1009100 -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] AcquireHostId problem
- Original Message - From: Pascal Jakobi pascal.jak...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: users@ovirt.org Sent: Saturday, December 21, 2013 12:37:51 PM Subject: Re: [Users] AcquireHostId problem Nope. Nothing more in logs. My guess is that the timeout problem generates the error. However, in reality if you run mount, you have the target partitions mounted If you still have a problem and something is not working there's an error somewhere and we only have to find it. Look in the engine logs and in the vdsm logs for any error (not only Traceback but also ERROR). Try to describe with more details what you're trying to do, what you expect to happen and what is happening instead. Therefore, I guess the problem is to understand why dev/watchdog0 failed to set timeout The watchdog provided by your laptop is not working properly or it's not able to set the timeout we need. You inserted the softdog module and wdmd is now up and running as you reported with: 2013/12/21 Federico Simoncelli fsimo...@redhat.com - Original Message - From: Pascal Jakobi pascal.jak...@gmail.com To: Federico Simoncelli fsimo...@redhat.com, users@ovirt.org Sent: Friday, December 20, 2013 11:54:21 PM Subject: Re: [Users] AcquireHostId problem Dec 20 23:43:59 lab2 kernel: [183033.639261] softdog: Software Watchdog Timer: 0.08 initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0) Dec 20 23:44:11 lab2 systemd[1]: Starting Watchdog Multiplexing Daemon... Dec 20 23:44:11 lab2 wdmd[25072]: wdmd started S0 H1 G179 Dec 20 23:44:11 lab2 systemd-wdmd[25066]: Starting wdmd: [ OK ] Dec 20 23:44:11 lab2 wdmd[25072]: /dev/watchdog0 failed to set timeout Dec 20 23:44:11 lab2 wdmd[25072]: /dev/watchdog0 disarmed Dec 20 23:44:11 lab2 wdmd[25072]: /dev/watchdog1 armed with fire_timeout 60 Dec 20 23:44:11 lab2 systemd[1]: Started Watchdog Multiplexing Daemon. So as far as I can see this part is working. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] AcquireHostId problem
- Original Message - From: Pascal Jakobi pascal.jak...@gmail.com To: users@ovirt.org Sent: Monday, December 23, 2013 4:33:23 PM Subject: Re: [Users] AcquireHostId problem Ok. What I am doing is just adding a new NFS domain that fails : Failed to add Storage Domain DataLab2. (User: admin@internal) And I thought that the /dev/watchdog0 failed to set timeout msg was signalling an error. No, that's just the attempt to use the laptop watchdog, but then it falls back to the softdog one: Dec 20 23:44:11 lab2 wdmd[25072]: /dev/watchdog1 armed with fire_timeout 60 Dec 20 23:44:11 lab2 systemd[1]: Started Watchdog Multiplexing Daemon. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] AcquireHostId problem
- Original Message - From: Pascal Jakobi pascal.jak...@gmail.com To: users@ovirt.org Sent: Monday, December 23, 2013 5:34:55 PM Subject: Re: [Users] AcquireHostId problem Here is the message I get on the console : Error while executing action Attach Storage Domain: AcquireHostIdFailure The software seems to go pretty far : it reaches the locked state before failing. In engine.log 2013-12-23 16:56:49,497 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (ajp--127.0.0.1-8702-2) Command org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: ('8c626a3f-5846-434e-83d8-6238e1ff9e03', SanlockException(-203, 'Sanlock lockspace add failure', 'Sanlock exception')) (Failed with error AcquireHostIdFailure and code 661) Can this help? What do the vdsm logs show? Is this the same host where wdmd was up and running or another one? If you restarted your laptop and you didn't persist the module loading (following the instruction in one of my previous emails) you'll end up with the same problem every time. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
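The module persistence referred to above is not spelled out in this part of the thread; a minimal sketch of one way to do it on Fedora with systemd follows (the file name softdog.conf is an arbitrary choice):

echo softdog > /etc/modules-load.d/softdog.conf   # load the module automatically at every boot
modprobe softdog                                  # load it now without rebooting
systemctl restart wdmd sanlock                    # let wdmd and sanlock pick up the watchdog device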
Re: [Users] AcquireHostId problem
- Original Message - From: Pascal Jakobi pascal.jak...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: users@ovirt.org, David Teigland teigl...@redhat.com Sent: Friday, December 20, 2013 7:19:52 AM Subject: Re: [Users] AcquireHostId problem Here you go ! I am running F19 on a Lenovo S30. Thxs Thanks, can you open a bug on this issue? (Attach also the files to the bug). I suppose it will be later split into different ones, one for the failing watchdog device and maybe an RFE to wdmd to automatically load the softdog if there are no usable watchdog devices. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] AcquireHostId problem
- Original Message - From: Pascal Jakobi pascal.jak...@gmail.com To: Federico Simoncelli fsimo...@redhat.com, users@ovirt.org Sent: Friday, December 20, 2013 11:54:21 PM Subject: Re: [Users] AcquireHostId problem Dec 20 23:43:59 lab2 kernel: [183033.639261] softdog: Software Watchdog Timer: 0.08 initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0) Dec 20 23:44:11 lab2 systemd[1]: Starting Watchdog Multiplexing Daemon... Dec 20 23:44:11 lab2 wdmd[25072]: wdmd started S0 H1 G179 Dec 20 23:44:11 lab2 systemd-wdmd[25066]: Starting wdmd: [ OK ] Dec 20 23:44:11 lab2 wdmd[25072]: /dev/watchdog0 failed to set timeout Dec 20 23:44:11 lab2 wdmd[25072]: /dev/watchdog0 disarmed Dec 20 23:44:11 lab2 wdmd[25072]: /dev/watchdog1 armed with fire_timeout 60 Dec 20 23:44:11 lab2 systemd[1]: Started Watchdog Multiplexing Daemon. Dec 20 23:45:33 lab2 rpc.mountd[2819]: authenticated mount request from 192.168.1.41:994 for /home/vdsm/data (/home/vdsm/data) Dec 20 23:45:39 lab2 rpc.mountd[2819]: authenticated mount request from 192.168.1.41:954 for /home/vdsm/data (/home/vdsm/data) Seems to work a bit. However I still get unable to attach storage when creating a domain It is probably a different error now. Anything interesting in vdsm.log? -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] AcquireHostId problem
- Original Message - From: Pascal Jakobi pascal.jak...@gmail.com To: users@ovirt.org Sent: Wednesday, December 18, 2013 9:32:00 PM Subject: Re: [Users] AcquireHostId problem sanlock.log : 2013-12-18 21:23:32+0100 1900 [867]: s1 lockspace b2d69b22-a8b8-466c-bf1f-b6e565228238:250:/rhev/data-center/mnt/lab2.home:_home_vdsm_data/b2d69b22-a8b8-466c-bf1f-b6e565228238/dom_md/ids:0 2013-12-18 21:23:52+0100 1920 [4238]: s1 wdmd_connect failed -111 2013-12-18 21:23:52+0100 1920 [4238]: s1 create_watchdog failed -1 2013-12-18 21:23:53+0100 1921 [867]: s1 add_lockspace fail result -203 Hi Pascal, is wdmd up and running? # ps aux | grep wdmd root 1650 0.0 0.2 13552 3320 ?SLs 03:49 0:00 wdmd -G sanlock -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] AcquireHostId problem
- Original Message - From: Pascal Jakobi pascal.jak...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: users@ovirt.org Sent: Thursday, December 19, 2013 4:05:07 PM Subject: Re: [Users] AcquireHostId problem Federico One may suspect wdmd isn't running as wdmd_connect failed (see sanlock.log). I have run a ps command - no wdmd Any idea why wdmd_connect might fail? Are you using fedora? What is the version of sanlock? Do you see any information about wdmd in /var/log/messages? Is the wdmd service started? # service wdmd status # systemctl status wdmd.service Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Cinder Integration
- Original Message - From: Itamar Heim ih...@redhat.com To: Udaya Kiran P ukiran...@yahoo.in, users@ovirt.org, Oved Ourfalli oourf...@redhat.com, Federico Simoncelli fsimo...@redhat.com Sent: Tuesday, November 19, 2013 12:42:05 PM Subject: Re: [Users] Cinder Integration On 11/19/2013 06:00 AM, Udaya Kiran P wrote: Hi All, I want to consume the oVirt Storage Domains in OpenStack Cinder. Is this driver available or are there any resources pointing on how this can be done? Federico - was that your sample driver or Oved's? It was Oved's. I have the feeling that I did some research in that area as well but then I moved to glance. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Urgent: Export NFS Migration issue oVirt 3.0 - 3.2.1
- Original Message - From: Sven Knohsalla s.knohsa...@netbiscuits.com To: users@ovirt.org Sent: Friday, November 8, 2013 10:32:32 AM Subject: Re: [Users] Urgent: Export NFS Migration issue oVirt 3.0 - 3.2.1 Hi, I could eliminate this issue to our oVirt 3.0 instance, as the pool_uuid SHA checksum in metadata on NFS Export wasn't cleared properly from engine 3.0. (/NFSmountpoint/storage-pool-id/dom_md/metadata) Hi Sven, can you send the original metadata and the relevant vdsm logs? If I read your engine logs correctly we need the vdsm logs from the host deovn-a01 (vds id 66b546c2-ae62-11e1-b734-5254005cbe44) around the same time this was issued: 2013-11-08 08:12:49,075 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-5-thread-39) Running command: ConnectStorageToVdsCommand internal: true. Entities affected : ID: aaa0----123456789aaa Type: System 2013-11-08 08:12:49,079 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-5-thread-39) START, ConnectStorageServerVDSCommand(vdsId = 66b546c2-ae62-11e1-b734-5254005cbe44, storagePoolId = ----, storageType = NFS, connectionList = [{ id: 2a84acc3-1700-45c4-bbf7-a3305b338f83, connection: 172.16.101.95:/ovirtmig02 };]), log id: 2482b112 2013-11-08 08:12:52,092 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-5-thread-50) FINISH, ConnectStorageServerVDSCommand, return: {2a84acc3-1700-45c4-bbf7-a3305b338f83=451}, log id: 7dcfb51f 2013-11-08 08:12:52,099 ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (pool-5-thread-50) The connection with details 172.16.101.95:/ovirtmig02 failed because of error code 451 and error message is: error storage server connection Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
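To see whether the exported domain still carries a stale pool reference, a hedged sketch of inspecting the metadata file mentioned above; the POOL_UUID and _SHA_CKSUM field names are assumptions based on the usual storage domain metadata layout and may differ between versions:

cat /NFSmountpoint/storage-pool-id/dom_md/metadata
grep -E '^(POOL_UUID|_SHA_CKSUM)=' /NFSmountpoint/storage-pool-id/dom_md/metadata   # a non-empty POOL_UUID here would mean the domain was never detached cleanly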
Re: [Users] Live storage migration snapshot removal (fails)
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Sander Grendelman san...@grendelman.com Cc: users@ovirt.org, fsimo...@redhat.com, ykap...@redhat.com Sent: Friday, November 8, 2013 4:06:53 PM Subject: Re: [Users] Live storage migration snapshot removal (fails) On Fri, Nov 08, 2013 at 02:20:39PM +0100, Sander Grendelman wrote: snip 9d4e8a43-4851-42ff-a684-f3d802527cf7/c512267d-ebba-4907-a782-fec9b6c95116 52178458-1764-4317-b85b-71843054aae9::WARNING::2013-11-08 14:02:53,772::image::1164::Storage.Image::(merge) Auto shrink after merge failed Traceback (most recent call last): File /usr/share/vdsm/storage/image.py, line 1162, in merge srcVol.shrinkToOptimalSize() File /usr/share/vdsm/storage/blockVolume.py, line 320, in shrinkToOptimalSize qemuImg.FORMAT.QCOW2) File /usr/lib64/python2.6/site-packages/vdsm/qemuImg.py, line 109, in check raise QImgError(rc, out, err, unable to parse qemu-img check output) QImgError: ecode=0, stdout=['No errors were found on the image.'], stderr=[], message=unable to parse qemu-img check output I'm not sure that it's the only problem in this flow, but there's a clear bug in lib/vdsm/qemuImg.py's check() function: it fails to parse the output of qemu-img. Would you open a bug on that? I found no open one. I remember that this was discussed and the agreement was that if the offset is not reported by qemu-img we should have used the old method to calculate the new volume size. We'll probably need to verify it. Sander can you open a bug on this? Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
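The value that check() in lib/vdsm/qemuImg.py fails to find is the image end offset; a hedged sketch of reproducing the situation by hand (the volume path is a placeholder, the exact pattern vdsm matches may differ, and whether the offset line is printed at all depends on the qemu-img version, which is exactly what the parser trips over):

qemu-img check -f qcow2 /dev/<vg_uuid>/<lv_uuid>
# on some qemu builds the output ends with a line such as:
#   Image end offset: 1638400
qemu-img check -f qcow2 /dev/<vg_uuid>/<lv_uuid> | grep 'Image end offset'   # empty output reproduces the parse failure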
Re: [Users] oVirt Weekly Meeting Minutes -- 2013-10-09
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Mike Burns mbu...@redhat.com Cc: bo...@ovirt.org, users users@ovirt.org Sent: Wednesday, October 9, 2013 5:45:23 PM Subject: Re: [Users] oVirt Weekly Meeting Minutes -- 2013-10-09 On Wed, Oct 09, 2013 at 11:15:41AM -0400, Mike Burns wrote: Minutes: http://ovirt.org/meetings/ovirt/2013/ovirt.2013-10-09-14.06.html Minutes (text): http://ovirt.org/meetings/ovirt/2013/ovirt.2013-10-09-14.06.txt Log: http://ovirt.org/meetings/ovirt/2013/ovirt.2013-10-09-14.06.log.html = #ovirt: oVirt Weekly sync = Meeting started by mburns at 14:06:41 UTC. The full logs are available at http://ovirt.org/meetings/ovirt/2013/ovirt.2013-10-09-14.06.log.html . Meeting summary --- * agenda and roll call (mburns, 14:07:00) * 3.3 updates (mburns, 14:07:17) * 3.4 planning (mburns, 14:07:24) * conferences and workshops (mburns, 14:07:31) * infra update (mburns, 14:07:34) * 3.3 updates (mburns, 14:08:42) * 3.3.0.1 vdsm packages are posted to updates-testing (mburns, 14:09:04) * LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1009100 (sbonazzo, 14:10:33) * 2 open bugs blocking 3.3.0.1 (mburns, 14:29:35) * 1 is deferred due to qemu-kvm feature set in el6 (mburns, 14:29:49) * other is allowed versions for vdsm (mburns, 14:30:01) * vdsm version bug will be backported to 3.3.0.1 today (mburns, 14:30:13) * ACTION: sbonazzo to build engine 3.3.0.1 tomorrow (mburns, 14:30:22) * ACTION: mburns to post 3.3.0.1 to ovirt.org tomorrow (mburns, 14:30:32) * expected release: next week (mburns, 14:30:46) * ACTION: danken and sbonazzo to provide release notes for 3.3.0.1 (mburns, 14:37:56) A vdsm bug (BZ#1007980) made it impossible to migrate or re-run a VM with a glusterfs-backed virtual disk if the VM was originally started with an empty cdrom. If you have encountered this bug, you would have to manually find the affected VMs with psql -U engine -d engine -c select distinct vm_name from vm_static, vm_device where vm_guid=vm_id and device='cdrom' and address ilike '%pci%'; and remove their junk cdrom address with psql -U engine -d engine -c update vm_device set address='' where device='cdrom' and address ilike '%pci%'; We are currently building vdsm-4.12.1-4 that is carrying a new critical fix related to drive resize (extension). Release notes: A vdsm bug introduced in a specific case of disk resize (raw on nfs) accidentally wipes the content of the virtual disk. The issue was masked on the master (and ovirt-3.3) branch by an unrelated change that happened to fix the problem leaving only vdsm-4.12 affected. It is of critical importance to update all your machines to the new vdsm release. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] VM wont restart after some NFS snapshot restore.
Hi Usman, can you paste somewhere the content of the meta files? $ cat 039a8482-c267-4051-b1e6-1c1dee49b3d7.meta 8d48505d-846d-49a7-8b50-d972ee051145.meta could you also provide the absolute path to those files? (in the vdsm host) Thanks, -- Federico - Original Message - From: Usman Aslam us...@linkshift.com To: users@ovirt.org Sent: Thursday, October 3, 2013 4:29:43 AM Subject: [Users] VM wont restart after some NFS snapshot restore. I have some VM's that live on NFS share. Basically, I had to revert the VM disk to a backup from a few days ago. So I powered the VM down, copied over the following files 039a8482-c267-4051-b1e6-1c1dee49b3d7 039a8482-c267-4051-b1e6-1c1dee49b3d7.lease 039a8482-c267-4051-b1e6-1c1dee49b3d7.meta 8d48505d-846d-49a7-8b50-d972ee051145 8d48505d-846d-49a7-8b50-d972ee051145.lease 8d48505d-846d-49a7-8b50-d972ee051145.meta and now when I try to power the VM, it complains 2013-Oct-02, 22:02:38 Failed to run VM zabbix-prod-01 (User: admin@internal). 2013-Oct-02, 22:02:38 Failed to run VM zabbix-prod-01 on Host tss-tusk-ovirt-01-ovirtmgmt.tusk.tufts.edu . 2013-Oct-02, 22:02:38 VM zabbix-prod-01 is down. Exit message: 'truesize'. Any ideas on how I could resolve this? Perhaps a better way of approaching the restore on a filesystem level? I see the following the vsdm.log Thread-7843::ERROR::2013-10-02 22:02:37,548::vm::716::vm.Vm::(_startUnderlyingVm) vmId=`8e8764ad-6b4c-48d8-9a19-fa5cf77208ef`::The vm start process failed Traceback (most recent call last): File /usr/share/vdsm/vm.py, line 678, in _startUnderlyingVm self._run() File /usr/share/vdsm/libvirtvm.py, line 1467, in _run devices = self.buildConfDevices() File /usr/share/vdsm/vm.py, line 515, in buildConfDevices self._normalizeVdsmImg(drv) File /usr/share/vdsm/vm.py, line 408, in _normalizeVdsmImg drv['truesize'] = res['truesize'] KeyError: 'truesize' Thread-7843::DEBUG::2013-10-02 22:02:37,553::vm::1065::vm.Vm::(setDownStatus) vmId=`8e8764ad-6b4c-48d8-9a19-fa5cf77208ef`::Changed state to Down: 'truesize' Any help would be really nice, thanks! -- Usman ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] VM wont restart after some NFS snapshot restore.
Usman, the only thing that comes to my mind is something related to: http://gerrit.ovirt.org/13529 which means that in some way the restored volumes are either inaccessible (permissions?) or their metadata is corrupted (but it doesn't seem so). There is probably another Traceback in the logs that should give us more information. Could you post somewhere the entire vdsm log? Thanks. -- Federico - Original Message - From: Usman Aslam us...@linkshift.com To: Federico Simoncelli fsimo...@redhat.com Cc: users@ovirt.org Sent: Friday, October 4, 2013 4:49:08 PM Subject: Re: [Users] VM wont restart after some NFS snapshot restore. Federico, The files reside on this mount on the hypervisor /rhev/data-center/mnt/xyz-02.tufts.edu: _vol_tusk__vm_tusk__vm/fa3279ec-2912-45ac-b7bc-9fe89151ed99/images/79ccd989-3033-4e6a-80da-ba210c94225a and are symlinked as described below [root@xyz-02 430cd986-6488-403b-8d46-29abbc3eba38]# pwd /rhev/data-center/430cd986-6488-403b-8d46-29abbc3eba38 [root@xyz-02 430cd986-6488-403b-8d46-29abbc3eba38]# ll total 12 lrwxrwxrwx 1 vdsm kvm 120 Oct 3 12:35 ee2ae498-6e45-448d-8f91-0efca377dcf6 - /rhev/data-center/mnt/xyz-02.tufts.edu: _vol_tusk__iso_tusk__iso/ee2ae498-6e45-448d-8f91-0efca377dcf6 lrwxrwxrwx 1 vdsm kvm 118 Oct 3 12:35 fa3279ec-2912-45ac-b7bc-9fe89151ed99 - /rhev/data-center/mnt/xyz-02.tufts.edu: _vol_tusk__vm_tusk__vm/fa3279ec-2912-45ac-b7bc-9fe89151ed99 lrwxrwxrwx 1 vdsm kvm 118 Oct 3 12:35 mastersd - /rhev/data-center/mnt/xyz-02.tufts.edu: _vol_tusk__vm_tusk__vm/fa3279ec-2912-45ac-b7bc-9fe89151ed99 I did a diff and the contents of the the *Original *meta file (that works and VM starts but have bad file system) and the *Backup *meta file (the files being restored from nfs snapshot) *are exactly the same*. Contents are listed blow. Also the files sizes for all six related files are exactly the same. [root@xyz-02 images]# cat 79ccd989-3033-4e6a-80da-ba210c94225a/039a8482-c267-4051-b1e6-1c1dee49b3d7.meta DOMAIN=fa3279ec-2912-45ac-b7bc-9fe89151ed99 VOLTYPE=SHARED CTIME=1368457020 FORMAT=RAW IMAGE=59b6a429-bd11-40c6-a218-78df840725c6 DISKTYPE=2 PUUID=---- LEGALITY=LEGAL MTIME=1368457020 POOL_UUID= DESCRIPTION=Active VM TYPE=SPARSE SIZE=104857600 EOF [root@tss-tusk-ovirt-02 images]# cat 79ccd989-3033-4e6a-80da-ba210c94225a/8d48505d-846d-49a7-8b50-d972ee051145.meta DOMAIN=fa3279ec-2912-45ac-b7bc-9fe89151ed99 CTIME=1370303194 FORMAT=COW DISKTYPE=2 LEGALITY=LEGAL SIZE=104857600 VOLTYPE=LEAF DESCRIPTION= IMAGE=79ccd989-3033-4e6a-80da-ba210c94225a PUUID=039a8482-c267-4051-b1e6-1c1dee49b3d7 MTIME=1370303194 POOL_UUID= TYPE=SPARSE EOF Any help would be greatly appreciated! Thanks, Usman On Fri, Oct 4, 2013 at 9:50 AM, Federico Simoncelli fsimo...@redhat.comwrote: Hi Usman, can you paste somewhere the content of the meta files? $ cat 039a8482-c267-4051-b1e6-1c1dee49b3d7.meta 8d48505d-846d-49a7-8b50-d972ee051145.meta could you also provide the absolute path to those files? (in the vdsm host) Thanks, -- Federico - Original Message - From: Usman Aslam us...@linkshift.com To: users@ovirt.org Sent: Thursday, October 3, 2013 4:29:43 AM Subject: [Users] VM wont restart after some NFS snapshot restore. I have some VM's that live on NFS share. Basically, I had to revert the VM disk to a backup from a few days ago. 
So I powered the VM down, copied over the following files 039a8482-c267-4051-b1e6-1c1dee49b3d7 039a8482-c267-4051-b1e6-1c1dee49b3d7.lease 039a8482-c267-4051-b1e6-1c1dee49b3d7.meta 8d48505d-846d-49a7-8b50-d972ee051145 8d48505d-846d-49a7-8b50-d972ee051145.lease 8d48505d-846d-49a7-8b50-d972ee051145.meta and now when I try to power the VM, it complains 2013-Oct-02, 22:02:38 Failed to run VM zabbix-prod-01 (User: admin@internal). 2013-Oct-02, 22:02:38 Failed to run VM zabbix-prod-01 on Host tss-tusk-ovirt-01-ovirtmgmt.tusk.tufts.edu . 2013-Oct-02, 22:02:38 VM zabbix-prod-01 is down. Exit message: 'truesize'. Any ideas on how I could resolve this? Perhaps a better way of approaching the restore on a filesystem level? I see the following the vsdm.log Thread-7843::ERROR::2013-10-02 22:02:37,548::vm::716::vm.Vm::(_startUnderlyingVm) vmId=`8e8764ad-6b4c-48d8-9a19-fa5cf77208ef`::The vm start process failed Traceback (most recent call last): File /usr/share/vdsm/vm.py, line 678, in _startUnderlyingVm self._run() File /usr/share/vdsm/libvirtvm.py, line 1467, in _run devices = self.buildConfDevices() File /usr/share/vdsm/vm.py, line 515, in buildConfDevices self._normalizeVdsmImg(drv) File /usr/share/vdsm/vm.py, line 408, in _normalizeVdsmImg drv['truesize'] = res['truesize'] KeyError: 'truesize' Thread-7843::DEBUG::2013-10-02 22:02:37,553::vm::1065::vm.Vm::(setDownStatus) vmId=`8e8764ad-6b4c
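On the permissions theory above, a small hedged sketch for checking and restoring ownership on the restored files; uid/gid 36 corresponds to vdsm:kvm on oVirt hosts, and the NFS mount part of the path is abbreviated from the thread:

cd /rhev/data-center/mnt/<nfs_mount>/fa3279ec-2912-45ac-b7bc-9fe89151ed99/images/79ccd989-3033-4e6a-80da-ba210c94225a
ls -ln                      # every file should be owned by uid/gid 36:36 (vdsm:kvm)
chown 36:36 ./*             # only needed if the copy was made as root and ownership was lost
su -s /bin/sh vdsm -c 'dd if=039a8482-c267-4051-b1e6-1c1dee49b3d7 of=/dev/null bs=1M count=1'   # quick read check as the vdsm user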
Re: [Users] Resizing disks destroys contents
Hi Martijn, can you post somewhere the relevant vdsm logs (from the spm host)? Thanks, -- Federico - Original Message - From: Martijn Grendelman martijn.grendel...@isaac.nl To: users@ovirt.org Sent: Tuesday, October 1, 2013 5:01:39 PM Subject: [Users] Resizing disks destroys contents Hi, I just tried out another feature of oVirt and again, I am shocked by the results. I did the following: - create new VM based on an earlier created template, with 20 GB disk - Run the VM - boots fine - Shut down the VM - Via Disks - Edit - Extend size by(GB) add 20 GB to the disk - Run the VM Result: no bootable device. Linux installation gone. Just to be sure, I booted the VM with a gparted live iso, and gparted reports the entire 40 GB as unallocated space. Where's my data? What's wrong with my oVirt installation? What am I doing wrong? Regards, Martijn. ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] vdsm live migration errors in latest master
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Federico Simoncelli fsimo...@redhat.com Cc: Dead Horse deadhorseconsult...@gmail.com, users users@ovirt.org, vdsm-de...@fedorahosted.org, aba...@redhat.com Sent: Thursday, September 26, 2013 1:38:15 AM Subject: Re: [Users] vdsm live migration errors in latest master On Tue, Sep 24, 2013 at 12:04:14PM -0400, Federico Simoncelli wrote: - Original Message - From: Dan Kenigsberg dan...@redhat.com To: Dead Horse deadhorseconsult...@gmail.com Cc: users@ovirt.org users@ovirt.org, vdsm-de...@fedorahosted.org, fsimo...@redhat.com, aba...@redhat.com Sent: Tuesday, September 24, 2013 11:44:48 AM Subject: Re: [Users] vdsm live migration errors in latest master On Mon, Sep 23, 2013 at 04:05:34PM -0500, Dead Horse wrote: Seeing failed live migrations and these errors in the vdsm logs with latest VDSM/Engine master. Hosts are EL6.4 Thanks for posting this report. The log is from the source of migration, right? Could you trace the history of the hosts of this VM? Could it be that it was started on an older version of vdsm (say ovirt-3.3.0) and then (due to migration or vdsm upgrade) got into a host with a much newer vdsm? Would you share the vmCreate (or vmMigrationCreate) line for this Vm in your log? I smells like an unintended regression of http://gerrit.ovirt.org/17714 vm: extend shared property to support locking solving it may not be trivial, as we should not call _normalizeDriveSharedAttribute() automatically on migration destination, as it may well still be apart of a 3.3 clusterLevel. Also, migration from vdsm with extended shared property, to an ovirt 3.3 vdsm is going to explode (in a different way), since the destination does not expect the extended values. Federico, do we have a choice but to revert that patch, and use something like shared3 property instead? I filed a bug at: https://bugzilla.redhat.com/show_bug.cgi?id=1011608 A possible fix could be: http://gerrit.ovirt.org/#/c/19509 Beyond this, we must make sure that on Engine side, the extended shared values would be used only for clusterLevel 3.4 and above. Are the extended shared values already used by Engine? Yes. That's the idea. Actually to be fair, the second case you mentioned (migrating from extended shared property to old vdsm) it wouldn't have been possible I suppose (the issue here is that Dead Horse has one or more hosts running on master instead of 3.3). The extended shared property would have appeared only in 3.4 and to allow the migration you would have had to upgrade all the nodes. But anyway since we were also talking about a new 3.3.1 barnch I just went ahead and covered all cases. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] vdsm live migration errors in latest master
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Federico Simoncelli fsimo...@redhat.com Cc: Dead Horse deadhorseconsult...@gmail.com, users users@ovirt.org, vdsm-de...@fedorahosted.org, aba...@redhat.com Sent: Thursday, September 26, 2013 2:09:15 PM Subject: Re: [Users] vdsm live migration errors in latest master On Thu, Sep 26, 2013 at 05:35:46AM -0400, Federico Simoncelli wrote: - Original Message - From: Dan Kenigsberg dan...@redhat.com To: Federico Simoncelli fsimo...@redhat.com Cc: Dead Horse deadhorseconsult...@gmail.com, users users@ovirt.org, vdsm-de...@fedorahosted.org, aba...@redhat.com Sent: Thursday, September 26, 2013 1:38:15 AM Subject: Re: [Users] vdsm live migration errors in latest master On Tue, Sep 24, 2013 at 12:04:14PM -0400, Federico Simoncelli wrote: - Original Message - From: Dan Kenigsberg dan...@redhat.com To: Dead Horse deadhorseconsult...@gmail.com Cc: users@ovirt.org users@ovirt.org, vdsm-de...@fedorahosted.org, fsimo...@redhat.com, aba...@redhat.com Sent: Tuesday, September 24, 2013 11:44:48 AM Subject: Re: [Users] vdsm live migration errors in latest master On Mon, Sep 23, 2013 at 04:05:34PM -0500, Dead Horse wrote: Seeing failed live migrations and these errors in the vdsm logs with latest VDSM/Engine master. Hosts are EL6.4 Thanks for posting this report. The log is from the source of migration, right? Could you trace the history of the hosts of this VM? Could it be that it was started on an older version of vdsm (say ovirt-3.3.0) and then (due to migration or vdsm upgrade) got into a host with a much newer vdsm? Would you share the vmCreate (or vmMigrationCreate) line for this Vm in your log? I smells like an unintended regression of http://gerrit.ovirt.org/17714 vm: extend shared property to support locking solving it may not be trivial, as we should not call _normalizeDriveSharedAttribute() automatically on migration destination, as it may well still be apart of a 3.3 clusterLevel. Also, migration from vdsm with extended shared property, to an ovirt 3.3 vdsm is going to explode (in a different way), since the destination does not expect the extended values. Federico, do we have a choice but to revert that patch, and use something like shared3 property instead? I filed a bug at: https://bugzilla.redhat.com/show_bug.cgi?id=1011608 A possible fix could be: http://gerrit.ovirt.org/#/c/19509 Beyond this, we must make sure that on Engine side, the extended shared values would be used only for clusterLevel 3.4 and above. Are the extended shared values already used by Engine? Yes. That's the idea. Actually to be fair, the second case you mentioned (migrating from extended shared property to old vdsm) it wouldn't have been possible I suppose (the issue here is that Dead Horse has one or more hosts running on master instead of 3.3). The extended shared property would have appeared only in 3.4 and to allow the migration you would have had to upgrade all the nodes. But anyway since we were also talking about a new 3.3.1 barnch I just went ahead and covered all cases. I do not see how the 3.3.1 branch is relevant to the discussion, as its Vdsm is NOT going to support clusterLevel 3.4. That is what I was referring to. If 3.3.1 was 3.3.0 + backported patches then we just wouldn't backport the extended shared attributes patch and that's it. 
But from what I understood 3.3.1 will be rebased on master (where instead we have the extended shared attributes) and that is why we have to cover both migration direction cases (instead of just the simple getattr one). Pardon my slowness, but would you confirm that this feature is to be used only on clusterLevel 3.4 and above? If so, I'm +2ing your patch. Yes, the extended attributes will be used in the hosted engine and cluster level 3.4. But what the engine does is not relevant to +2ing correct vdsm patches. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] vdsm live migration errors in latest master
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Dead Horse deadhorseconsult...@gmail.com Cc: users@ovirt.org users@ovirt.org, vdsm-de...@fedorahosted.org, fsimo...@redhat.com, aba...@redhat.com Sent: Tuesday, September 24, 2013 11:44:48 AM Subject: Re: [Users] vdsm live migration errors in latest master On Mon, Sep 23, 2013 at 04:05:34PM -0500, Dead Horse wrote: Seeing failed live migrations and these errors in the vdsm logs with latest VDSM/Engine master. Hosts are EL6.4 Thanks for posting this report. The log is from the source of migration, right? Could you trace the history of the hosts of this VM? Could it be that it was started on an older version of vdsm (say ovirt-3.3.0) and then (due to migration or vdsm upgrade) got into a host with a much newer vdsm? Would you share the vmCreate (or vmMigrationCreate) line for this Vm in your log? I smells like an unintended regression of http://gerrit.ovirt.org/17714 vm: extend shared property to support locking solving it may not be trivial, as we should not call _normalizeDriveSharedAttribute() automatically on migration destination, as it may well still be apart of a 3.3 clusterLevel. Also, migration from vdsm with extended shared property, to an ovirt 3.3 vdsm is going to explode (in a different way), since the destination does not expect the extended values. Federico, do we have a choice but to revert that patch, and use something like shared3 property instead? I filed a bug at: https://bugzilla.redhat.com/show_bug.cgi?id=1011608 A possible fix could be: http://gerrit.ovirt.org/#/c/19509 -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] oVirt 3.3.0-4/F19 - Extending VM disk gives correct size but appears to wipe the drive contents
Hi Chris, can you post the vdsm logs (spm host) somewhere? Thanks. -- Federico - Original Message - From: Chris SULLIVAN (WGK) chris.sulli...@woodgroupkenny.com To: users@ovirt.org Sent: Monday, September 23, 2013 9:08:26 PM Subject: [Users] oVirt 3.3.0-4/F19 - Extending VM disk gives correct size but appears to wipe the drive contents Hi, I had a number of Windows VMs running in oVirt 3.3 that required their preallocated OS disks to be extended. Each OS disk had a single partition taking up the entire drive. As per http://www.ovirt.org/Features/Online_Virtual_Drive_Resize I shut down all the VMs, extended each OS disk by 10GB (total 25GB) via the web interface, then clicked OK. The tasks appeared to complete successfully and each of the OS disks had the expected real size on the Gluster storage volume. On startup however none of the VMs would recognize their OS disk as being a bootable device. Checking one of the OS disks via TestDisk (both quick and deep scans) revealed no partitions and the error ‘Partition sector doesn’t have the endmark 0xAA55’. It appears that each OS disk was wiped as part of the extension process although I’m really hoping that this isn’t the case! Are there any other approaches I could use to attempt to recover the OS disk data or at least verify whether the original disk partitions are recoverable? ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Unable to attach to storage domain (Ovirt 3.2)
Hi Dan, it looks like one of the domains is missing: 6cf7e7e9-3ae5-4645-a29c-fb17ecb38a50 Is there any target missing? (disconnected or somehow faulty or unreachable) -- Federico - Original Message - From: Dan Ferris dfer...@prometheusresearch.com To: users@ovirt.org Sent: Friday, September 20, 2013 4:01:06 AM Subject: [Users] Unable to attach to storage domain (Ovirt 3.2) Hi, This is my first post to the list. I am happy to say that we have been using Ovirt for 6 months with a few bumps, but it's mostly been ok. Until tonight that is... I had to do a maintenance that required rebooting both of our Hypervisor nodes. Both of them run Fedora Core 18 and have been happy for months. After rebooting them tonight, they will not attach to the storage. If it matters, the storage is a server running LIO with a Fibre Channel target. Vdsm log: Thread-22::DEBUG::2013-09-19 21:57:09,392::misc::84::Storage.Misc.excCmd::(lambda) '/usr/bin/dd iflag=direct if=/dev/b358e46b-635b-4c0e-8e73-0a494602e21d/metadata bs=4096 count=1' (cwd None) Thread-22::DEBUG::2013-09-19 21:57:09,400::misc::84::Storage.Misc.excCmd::(lambda) SUCCESS: err = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000547161 s, 7.5 MB/s\n'; rc = 0 Thread-23::DEBUG::2013-09-19 21:57:16,587::lvm::368::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex Thread-23::DEBUG::2013-09-19 21:57:16,587::misc::84::Storage.Misc.excCmd::(lambda) u'/usr/bin/sudo -n /sbin/lvm vgs --config devices { preferred_names = [\\^/dev/mapper/\\] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\a%360014055193f840cb3743f9befef7aa3%\\, \\r%.*%\\ ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 6cf7e7e9-3ae5-4645-a29c-fb17ecb38a50' (cwd None) Thread-23::DEBUG::2013-09-19 21:57:16,643::misc::84::Storage.Misc.excCmd::(lambda) FAILED: err = ' Volume group 6cf7e7e9-3ae5-4645-a29c-fb17ecb38a50 not found\n'; rc = 5 Thread-23::WARNING::2013-09-19 21:57:16,649::lvm::373::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' Volume group 6cf7e7e9-3ae5-4645-a29c-fb17ecb38a50 not found'] Thread-23::DEBUG::2013-09-19 21:57:16,649::lvm::397::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex Thread-23::ERROR::2013-09-19 21:57:16,650::domainMonitor::208::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 6cf7e7e9-3ae5-4645-a29c-fb17ecb38a50 monitoring information Traceback (most recent call last): File /usr/share/vdsm/storage/domainMonitor.py, line 182, in _monitorDomain self.domain = sdCache.produce(self.sdUUID) File /usr/share/vdsm/storage/sdc.py, line 97, in produce domain.getRealDomain() File /usr/share/vdsm/storage/sdc.py, line 52, in getRealDomain return self._cache._realProduce(self._sdUUID) File /usr/share/vdsm/storage/sdc.py, line 121, in _realProduce domain = self._findDomain(sdUUID) File /usr/share/vdsm/storage/sdc.py, line 152, in _findDomain raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: (u'6cf7e7e9-3ae5-4645-a29c-fb17ecb38a50',) vgs output (Note that I don't know what the device (Wy3Ymi-J7bJ-hVxg-sg3L-F5Gv-MQmz-Utwv7z is) : [root@node01 vdsm]# vgs Couldn't find device with uuid Wy3Ymi-J7bJ-hVxg-sg3L-F5Gv-MQmz-Utwv7z. 
VG                                    #PV #LV #SN Attr   VSize   VFree
b358e46b-635b-4c0e-8e73-0a494602e21d    1  39   0 wz--n-   8.19t  5.88t
build                                   2   2   0 wz-pn- 299.75g 16.00m
fedora                                  1   3   0 wz--n- 557.88g      0
lvs output:
[root@node01 vdsm]# lvs
Couldn't find device with uuid Wy3Ymi-J7bJ-hVxg-sg3L-F5Gv-MQmz-Utwv7z.
LV                                   VG                                   Attr  LSize   Pool Origin Data% Move Log Copy% Convert
0b8cca47-313f-48da-84f2-154810790d5a b358e46b-635b-4c0e-8e73-0a494602e21d -wi-a  40.00g
0f6f7572-8797-4d84-831b-87dbc4e1aa48 b358e46b-635b-4c0e-8e73-0a494602e21d -wi-a 100.00g
19a1473f-c375-411f-9a02-c6054b9a28d2 b358e46b-635b-4c0e-8e73-0a494602e21d -wi-a  50.00g
221144dc-51dc-46ae-9399-c0b8e030f38a b358e46b-635b-4c0e-8e73-0a494602e21d -wi-a  40.00g
2386932f-5f68-46e1-99a4-e96c944ac21b b358e46b-635b-4c0e-8e73-0a494602e21d -wi-a  40.00g
3e027010-931b-43d6-9c9f-eeeabbdcd47a b358e46b-635b-4c0e-8e73-0a494602e21d -wi-a   2.00g
4257ccc2-94d5-4d71-b21a-c188acbf7ca1 b358e46b-635b-4c0e-8e73-0a494602e21d -wi-a 200.00g
4979b2a4-04aa-46a1-be0d-f10be0a1f587 b358e46b-635b-4c0e-8e73-0a494602e21d -wi-a 100.00g
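A hedged sketch of the kind of checks that would show whether the LUN backing the missing volume group dropped off after the reboot; the SCSI host number is a placeholder, the missing PV uuid is the one reported above, and the storage here is FC (LIO target) so multipath is the first place to look:

multipath -ll                                        # any failed or faulty paths on the FC multipath devices?
pvs -o pv_name,pv_uuid,vg_name                       # is a PV with uuid Wy3Ymi-J7bJ-hVxg-sg3L-F5Gv-MQmz-Utwv7z visible at all?
echo "- - -" > /sys/class/scsi_host/host<N>/scan     # rescan the HBA if the LUN is simply not seen yet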
Re: [Users] NFS troubleshooting page
- Original Message - From: Itamar Heim ih...@redhat.com To: Markus Stockhausen stockhau...@collogia.de Cc: Ayal Baron aba...@redhat.com, Federico Simoncelli fsimo...@redhat.com, users@ovirt.org, Allon Mureinik amure...@redhat.com Sent: Monday, September 9, 2013 3:45:56 PM Subject: Re: AW: [Users] NFS troubleshooting page On 09/09/2013 04:42 PM, Markus Stockhausen wrote: ayal/federico - thoughts on how we can make things better? warn? etc? Got my account and already modified the wiki. Thanks for the help. Markus thanks markus. my question to ayal/federico is on how to fix the original issue to at least warn about the issue. Good question. I don't have an answer yet, but I'll look into it. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
[Users] oVirt - Glance Integration Deep Dive Session
The following is a new meeting request:
Subject: oVirt - Glance Integration Deep Dive Session
Organizer: Federico Simoncelli fsimo...@redhat.com
Invitees: users@ovirt.org; engine-de...@ovirt.org
Time: Tuesday, July 30, 2013, 3:00:00 PM - 4:00:00 PM GMT +01:00 Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

Hi everyone, on Tuesday at 3pm (CEST) I will be presenting the recent work done in integrating OpenStack Glance into oVirt 3.3.
The presentation will include both a high level overview (usage in webadmin) and a deep dive about the low level implementation details.
When: Tue 30 Jul 2013 15:00 - 16:00 (CEST)
Where: https://sas.elluminate.com/m.jnlp?sid=819password=M.9E565882E4EA0288E3479F3D2141BD
Bridge: 8425973915#
Phone numbers: http://www.ovirt.org/Intercall
-- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] VM crashes and doesn't recover
- Original Message - From: Yuval M yuva...@gmail.com To: Dan Kenigsberg dan...@redhat.com Cc: users@ovirt.org, Nezer Zaidenberg nzaidenb...@mac.com Sent: Friday, March 29, 2013 2:19:43 PM Subject: Re: [Users] VM crashes and doesn't recover Any ideas on what can cause that storage crash? could it be related to using an SSD? What the logs say is that the IO on the storage domain is failing (both the oop timeouts and the sanlock log) and this triggers the VDSM restart. On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote: I am running vdsm from packages as my interest is in developing for the I noticed that when the storage domain crashes I can't even do df -h (hangs) This is also consistent with the unreachable domain. The dmesg log that you attached doesn't contain timestamps so it's hard to correlate with the rest. If you want you can try to reproduce the issue and resubmit the logs: /var/log/vdsm/vdsm.log /var/log/sanlock.log /var/log/messages (Maybe stating also the exact time when the issue begins to appear) In the logs I noticed that you're using only one NFS domain, and I think that the SSD (on the storage side) shouldn't be a problem. When you experience such a failure are you able to read/write from/to the SSD on the machine that is serving the share? (If it's the same machine, check it using the real path where it's mounted, not the nfs share) -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
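A small hedged sketch of the read/write check suggested above, run on the machine serving the share against the real backing directory; the path and sizes are placeholders, and oflag=direct bypasses the page cache so a failing SSD shows up more readily:

dd if=/dev/zero of=/path/to/export/__iotest bs=1M count=100 oflag=direct conv=fsync   # write test on the backing filesystem
dd if=/path/to/export/__iotest of=/dev/null bs=1M iflag=direct                        # read the same data back
rm /path/to/export/__iotest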
Re: [Users] Failing to attach NFS data storage domain (Ovirt 3.2)
- Original Message - From: Limor Gavish lgav...@gmail.com To: Eli Mesika emes...@redhat.com Cc: Yuval M yuva...@gmail.com, users@ovirt.org, Nezer Zaidenberg nzaidenb...@mac.com Sent: Wednesday, March 20, 2013 11:47:49 AM Subject: Re: [Users] Failing to attach NFS data storage domain (Ovirt 3.2) Thank you very much for your reply. I attached the vdsm.log Hi Limor, can you please inspect the status of the NFS mount? # mkdir /mnt/tmp # mount -t nfs your_nfs_share /mnt/tmp # cd /mnt/tmp/1902354b-4c39-4707-ac6c-3637aaf1943b/dom_md And please report the output of: # ls -l # sanlock direct dump ids Can you also include more vdsm logs? More specifically the ones where the NFS domain has been created? (createStorageDomain with sdUUID='1902354b-4c39-4707-ac6c-3637aaf1943b') Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Failing to attach NFS data storage domain (Ovirt 3.2)
- Original Message - From: Limor Gavish lgav...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: Yuval M yuva...@gmail.com, users@ovirt.org, Nezer Zaidenberg nzaidenb...@mac.com, Eli Mesika emes...@redhat.com, Maor Lipchuk mlipc...@redhat.com Sent: Wednesday, March 20, 2013 9:02:35 PM Subject: Re: [Users] Failing to attach NFS data storage domain (Ovirt 3.2) Thank you very much for your response. Attached VDSM logs as you requested (The VDSM logs where the NFS domain was created were missing so we had to recreate the NFS domain, therefore the sdUUID has changed). Here is the rest of the commands you asked: [root@bufferoverflow wil]# mount -t nfs bufferoverflow:/home/BO_Ovirt_Storage /mnt/tmp [root@bufferoverflow wil]# cd /mnt/tmp/1083422e-a5db-41b6-b667-b9ef1ef244f0/dom_md/ [root@bufferoverflow dom_md]# ls -l total 2052 -rw-rw 1 vdsm kvm 1048576 Mar 20 21:46 ids -rw-rw 1 vdsm kvm 0 Mar 20 21:45 inbox -rw-rw 1 vdsm kvm 2097152 Mar 20 21:45 leases -rw-r--r-- 1 vdsm kvm 311 Mar 20 21:45 metadata -rw-rw 1 vdsm kvm 0 Mar 20 21:45 outbox [root@bufferoverflow dom_md]# sanlock direct dump ids Sorry I should have mentioned that if you use root_squash for your nfs share you have to switch to the vdsm user: (root)# su -s /bin/sh vdsm (vdsm)$ cd /mnt/tmp/sduuid/dom_md/ (vdsm)$ sanlock direct dump ids (and now you should be able to see the output) If the output is still empty then used hexdump -C to inspect it (and eventually post it here compressed). Another important thing that you should check is: # ps fax | grep sanlock If the output doesn't look like the following: 1966 ?SLs0:00 wdmd -G sanlock 2036 ?SLsl 0:00 sanlock daemon -U sanlock -G sanlock 2037 ?S 0:00 \_ sanlock daemon -U sanlock -G sanlock Then I suggest you to update sanlock to the latest build: http://koji.fedoraproject.org/koji/buildinfo?buildID=377815 (sanlock-2.6-7.fc18) And eventually if after rebooting the problem persists, please post also the sanlock log (/var/log/sanlock.log) Please note, the VDSM is running as a system service (it was installed from a package) while ovirt-engine was built from sources and thus is not running as root. Is this an issue? It shouldn't be. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] oVirt 3.2 on CentOS with Gluster 3.3
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Balamurugan Arumugam barum...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Mike Burns mbu...@redhat.com Cc: Rob Zwissler r...@zwissler.org, users@ovirt.org, a...@ovirt.org, Aravinda VK avish...@redhat.com, Ayal Baron aba...@redhat.com Sent: Wednesday, March 13, 2013 9:03:39 PM Subject: Re: [Users] oVirt 3.2 on CentOS with Gluster 3.3 On Mon, Mar 11, 2013 at 12:34:51PM +0200, Dan Kenigsberg wrote: On Mon, Mar 11, 2013 at 06:09:56AM -0400, Balamurugan Arumugam wrote: Rob, It seems that a bug in vdsm code is hiding the real issue. Could you do a sed -i s/ParseError/ElementTree.ParseError /usr/share/vdsm/gluster/cli.py restart vdsmd, and retry? Bala, would you send a patch fixing the ParseError issue (and adding a Ok, both issues have fixes which are in the ovirt-3.2 git branch. I believe this deserves a respin of vdsm, as having an undeclated requirement is impolite. Federico, Mike, would you take care for that? Since we're at it... I have the feeling that this might be important enough to be backported to 3.2 too: http://gerrit.ovirt.org/#/c/12178/ -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] F18 iSCSI/FC and latest systemd/udev
- Original Message - From: Jeff Bailey bai...@cs.kent.edu To: users@ovirt.org Sent: Saturday, February 16, 2013 5:17:49 AM Subject: [Users] F18 iSCSI/FC and latest systemd/udev While not an actual problem with oVirt, the latest systemd/udev packages for F18 (197-1) break permissions on LVM volumes and stop vdsm/qemu/etc from accessing them. I just downgraded them and everything seems OK but I thought I'd let people know (easier to just avoid rather than repair :) ). There's a bugzilla for it from a week or two ago but since 3.2 came out I figured a lot more people might be installing it on new F18 installations with all the updates and running into problems. Please test and give karma to: https://admin.fedoraproject.org/updates/FEDORA-2013-1775/vdsm-4.10.3-7.fc18 which requires the correct systemd package. If we reach 3 points of karma the package will be released. Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] 3.2 beta and f18 host on dell R815 problem
Hi Gianluca, can you post/attach/provide the output of cpuid? # cpuid In case it's not installed it's provided by the rpm: cpuid-20120601-2.fc18.x86_64 Thanks, -- Federico - Original Message - From: Gianluca Cecchi gianluca.cec...@gmail.com To: users users@ovirt.org Sent: Thursday, January 31, 2013 12:24:41 PM Subject: [Users] 3.2 beta and f18 host on dell R815 problem during install of the server I get this Host installation failed. Fix installation issues and try to Re-Install In deploy log 2013-01-31 12:17:30 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._isVirtualizationEnabled:144 virtualization support GenuineIntel (cpu: False, bios: True) 2013-01-31 12:17:30 DEBUG otopi.context context._executeMethod:127 method exception Traceback (most recent call last): File /tmp/ovirt-SfEARpd3h4/pythonlib/otopi/context.py, line 117, in _executeMethod method['method']() File /tmp/ovirt-SfEARpd3h4/otopi-plugins/ovirt-host-deploy/vdsm/hardware.py, line 170, in _validate_virtualization _('Hardware does not support virtualization') RuntimeError: Hardware does not support virtualization 2013-01-31 12:17:30 ERROR otopi.context context._executeMethod:136 Failed to execute stage 'Setup validation': Hardware does not support virtualization note the GenuineIntel above... ?? But actually it is AMD [root@f18ovn03 ~]# lsmod|grep kvm kvm_amd 59623 0 kvm 431794 1 kvm_amd cat /proc/cpuinfo ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
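The odd part of the log is the GenuineIntel vendor string reported on an AMD box. As a rough sketch of how such a probe typically works (this is not the ovirt-host-deploy code; it only looks at /proc/cpuinfo, while the real check also consults the BIOS, hence the "cpu: False, bios: True" in the log):

    # Rough sketch of a CPU virtualization probe, not the actual
    # ovirt-host-deploy code: pick the vendor, then look for vmx (Intel)
    # or svm (AMD) in the cpu flags.
    def cpu_virt_support(cpuinfo="/proc/cpuinfo"):
        vendor, flags = None, set()
        with open(cpuinfo) as f:
            for line in f:
                if line.startswith("vendor_id"):
                    vendor = line.split(":", 1)[1].strip()
                elif line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
        if vendor == "GenuineIntel":
            return "vmx" in flags
        if vendor == "AuthenticAMD":
            return "svm" in flags
        return False

    print("virtualization flag present: %s" % cpu_virt_support())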
Re: [Users] could not add local storage domain
Hi Jorick and Cristian, the error you posted here looks like an issue we recently fixed (vdsm on nfs with kernel 3.6). Anyway it's quite difficult to make a comprehensive list of things to report and tests to execute. For this particular issue (and not as a general rule) I suggest you contact me on IRC (fsimonce on #ovirt OFTC) so that we can sort out the issue together. We will report our findings back to the ML so that it will be helpful for everyone else. -- Federico - Original Message - From: Jorick Astrego jor...@netbulae.eu To: users@ovirt.org Sent: Wednesday, November 14, 2012 1:45:21 PM Subject: Re: [Users] could not add local storage domain - I'm not the original submitter of this issue, but I have exactly the same problem with the latest nightly all-in-one installation. We don't use public key auth for sshd on this machine so that's not the problem. This is what I see in the vdsm.log: [...] Thread-17:: INFO::2012-11-14 12:46:14,129::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=4, spUUID='----', conList=[{'connection': '/data', 'iqn': '', 'portal': '', 'user': '', 'password': '**', 'id': '----', 'port': ''}], options=None) Thread-17::ERROR::2012-11-14 12:46:14,212::hsm::2057::Storage.HSM::(connectStorageServer) Could not connect to storageServer Traceback (most recent call last): File /usr/share/vdsm/storage/hsm.py, line 2054, in connectStorageServer conObj.connect() File /usr/share/vdsm/storage/storageServer.py, line 462, in connect if not self.checkTarget(): File /usr/share/vdsm/storage/storageServer.py, line 449, in checkTarget fileSD.validateDirAccess(self._path)) File /usr/share/vdsm/storage/fileSD.py, line 51, in validateDirAccess getProcPool().fileUtils.validateAccess(dirPath) File /usr/share/vdsm/storage/remoteFileHandler.py, line 274, in callCrabRPCFunction *args, **kwargs) File /usr/share/vdsm/storage/remoteFileHandler.py, line 180, in callCrabRPCFunction rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout) File /usr/share/vdsm/storage/remoteFileHandler.py, line 149, in _recvAll timeLeft): File /usr/lib64/python2.7/contextlib.py, line 84, in helper return GeneratorContextManager(func(*args, **kwds)) File /usr/share/vdsm/storage/remoteFileHandler.py, line 136, in _poll raise Timeout() Timeout ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
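Before taking a problem like this to IRC it can help to confirm, outside of vdsm, that the vdsm user can actually reach the export path. A hedged standalone check (the /data path is taken from the log above; adjust it for your setup):

    import subprocess

    # Hedged check, not part of vdsm: run the access tests as the vdsm user,
    # the same way the earlier message suggests with "su -s /bin/sh vdsm".
    def vdsm_can_access(path):
        for cmd in ("ls -ld %s" % path,
                    "test -r %s -a -w %s -a -x %s" % (path, path, path)):
            rc = subprocess.call(["su", "-s", "/bin/sh", "vdsm", "-c", cmd])
            if rc != 0:
                return False
        return True

    print(vdsm_can_access("/data"))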
Re: [Users] SELinux policy issue with oVirt/sanlock
- Original Message - From: Haim Ateya hat...@redhat.com To: Brian Vetter bjvet...@gmail.com Cc: users@ovirt.org, seli...@lists.fedoraproject.org Sent: Wednesday, October 24, 2012 7:03:39 PM Subject: Re: [Users] SELinux policy issue with oVirt/sanlock - Original Message - From: Brian Vetter bjvet...@gmail.com To: Haim Ateya hat...@redhat.com Cc: users@ovirt.org, seli...@lists.fedoraproject.org Sent: Wednesday, October 24, 2012 6:24:31 PM Subject: Re: [Users] SELinux policy issue with oVirt/sanlock I removed lock_manager=sanlock from the settings file, restarted the daemons, and all works fine right now. I'm guessing that means there is no locking of the VMs (the default?). that's right, I'm glad it works for you, but it's just a workaround since we expect this configuration to work; it would be much appreciated if you could open a bug on that issue so we can track and resolve it when possible. please attach all required logs such as: vdsm.log, libvirtd.log, qemu.log (under /var/log/libvirt/qemu/), audit.log, sanlock.log and /var/log/messages. What's the bug number? To clarify/recap: - the lock_manager=sanlock configuration is correct (and it shouldn't be removed) - you should run setenforce 0 (with lock_manager=sanlock) and try to start a VM; all the avc errors that you find in /var/log/messages and in /var/log/audit/audit.log should be used to open a selinux policy bug -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
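A small hedged helper for gathering the material mentioned in the last bullet: after running setenforce 0 and reproducing the VM start, it pulls the AVC denials out of audit.log so they can be attached to the selinux-policy bug (the log path is the Fedora default; the keyword list is only a guess):

    # Hedged helper, not an official tool: collect AVC denial lines that
    # mention sanlock/libvirt/qemu from the audit log.
    AUDIT_LOG = "/var/log/audit/audit.log"

    def collect_avc_denials(keywords=("sanlock", "virtd", "qemu")):
        denials = []
        with open(AUDIT_LOG) as f:
            for line in f:
                if "avc" in line and "denied" in line \
                        and any(k in line for k in keywords):
                    denials.append(line.rstrip())
        return denials

    if __name__ == "__main__":
        print("\n".join(collect_avc_denials()))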
Re: [Users] Error creating the first storage domain (NFS)
- Original Message - From: Vered Volansky ve...@redhat.com To: Brian Vetter bjvet...@gmail.com Cc: users@ovirt.org Sent: Tuesday, October 23, 2012 11:38:42 AM Subject: Re: [Users] Error creating the first storage domain (NFS) Hi Brian, We'll need your engine host (full) logs at the very least to look into the problem. Can you try it with nfs3 and tell us if it works? Note, more comments in the email body. Regards, Vered Hi Brian, we also need the sanlock logs (/var/log/sanlock.log). Adding David to the thread as he might be able to help us debugging your problem (sanlock-2.4-2.fc17). -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Error creating the first storage domain (NFS)
Hi Brian, I hate progressing by guesses but could you try to disable selinux: # setenforce 0 If that works you could go on, re-enable it and try something more specific: # setenforce 1 # setsebool sanlock_use_nfs on I have the feeling that the vdsm patch setting the sanlock_use_nfs sebool flag didn't make it to fedora 17 yet. -- Federico - Original Message - From: Brian Vetter bjvet...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: Vered Volansky ve...@redhat.com, users@ovirt.org, David Teigland teigl...@redhat.com Sent: Tuesday, October 23, 2012 6:10:36 PM Subject: Re: [Users] Error creating the first storage domain (NFS) Ok. Here are four log files: engine.log from my ovirt engine server. vdsm.log from my host sanlock.log from my host messages from my host The errors occur around the 20:17:57 time frame. You might see other errors from either previous attempts or for the time after when I tried to attach the storage domain. It looks like everything starts with an error -13 in sanlock. If the -13 maps to 13/EPERM in errno.h, then it is likely to be some kind of permission or other access error. I saw things that were related to the nfs directories not being owned by vdsm:kvm, but that is not the case here. I did see a note online about some issues with sanlock and F17 (which I am running), but those bugs were related to sanlock crashing. Brian ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
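To check the boolean Federico mentions without guessing, getsebool/setsebool can be wrapped in a few lines; a hedged sketch (the -P flag only makes the change persistent across reboots):

    import subprocess

    # Hedged sketch: check whether sanlock_use_nfs is already on and enable
    # it persistently if it is not, instead of leaving selinux permissive.
    def bool_is_on(name):
        # getsebool prints e.g. "sanlock_use_nfs --> off"
        out = subprocess.Popen(["getsebool", name],
                               stdout=subprocess.PIPE).communicate()[0]
        return out.strip().endswith("on")

    if not bool_is_on("sanlock_use_nfs"):
        subprocess.check_call(["setsebool", "-P", "sanlock_use_nfs", "on"])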
Re: [Users] Error creating the first storage domain (NFS)
- Original Message - From: Brian Vetter bjvet...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: Vered Volansky ve...@redhat.com, users@ovirt.org, David Teigland teigl...@redhat.com Sent: Wednesday, October 24, 2012 4:54:11 AM Subject: Re: [Users] Error creating the first storage domain (NFS) That was the problem. I checked the sanlock_use_nfs boolean and it was off. I set it and then created and attached the storage and it all works. Thanks for testing. Do you have a way of verifying a scratch build? http://koji.fedoraproject.org/koji/taskinfo?taskID=4620480 This should fix your problem (on a brand new installation). -- Federico On Oct 23, 2012, at 8:55 PM, Federico Simoncelli wrote: Hi Brian, I hate progressing by guesses but could you try to disable selinux: # setenforce 0 If that works you could go on, re-enable it and try something more specific: # setenforce 1 # setsebool sanlock_use_nfs on I have the feeling that the vdsm patch setting the sanlock_use_nfs sebool flag didn't make it to fedora 17 yet. -- Federico - Original Message - From: Brian Vetter bjvet...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: Vered Volansky ve...@redhat.com, users@ovirt.org, David Teigland teigl...@redhat.com Sent: Tuesday, October 23, 2012 6:10:36 PM Subject: Re: [Users] Error creating the first storage domain (NFS) Ok. Here are four log files: engine.log from my ovirt engine server. vdsm.log from my host sanlock.log from my host messages from my host The errors occur around the 20:17:57 time frame. You might see other errors from either previous attempts or for the time after when I tried to attach the storage domain. It looks like everything starts with an error -13 in sanlock. If the -13 maps to 13/EPERM in errno.h, then it is likely to be some kind of permission or other access error. I saw things that were related to the nfs directories not being owned by vdsm:kvm, but that is not the case here. I did see a note online about some issues with sanlock and F17 (which I am running), but those bugs were related to sanlock crashing. Brian ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Error creating the first storage domain (NFS)
- Original Message - From: Brian Vetter bjvet...@gmail.com To: Federico Simoncelli fsimo...@redhat.com Cc: Vered Volansky ve...@redhat.com, users@ovirt.org, David Teigland teigl...@redhat.com Sent: Wednesday, October 24, 2012 5:48:21 AM Subject: Re: [Users] Error creating the first storage domain (NFS) Ugh. Spoke a little too soon. While I got past my problem creating a storage domain, I ran into a new sanlock issue. When trying to run a VM (the first one so I can create a template), I get an error in the admin UI: VM DCC4.0 is down. Exit message: Failed to acquire lock: Permission denied. On a lark, I turned off selinux enforcement and tried it again. It worked just fine. So what selinux option do I need to enable to get it to work? The only other sanlock specific settings I saw are: sanlock_use_fusefs --> off sanlock_use_nfs --> on sanlock_use_samba --> off Do I turn these all on or is there some other setting I need to enable? No, for nfs you just need sanlock_use_nfs. I'd say that if you could verify the scratch build that I prepared at: http://koji.fedoraproject.org/koji/taskinfo?taskID=4620480 (up until starting a vm), then all the new selinux errors/messages that you see in the audit log (/var/log/audit/audit.log) are issues that should be reported to the selinux-policy package. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Can't start a VM - sanlock permission denied
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Mike Burns mbu...@redhat.com Cc: Federico Simoncelli fsimo...@redhat.com, users@ovirt.org Sent: Monday, October 15, 2012 11:02:45 AM Subject: Re: [Users] Can't start a VM - sanlock permission denied On Sun, Oct 14, 2012 at 09:53:51PM -0400, Mike Burns wrote: On Sun, 2012-10-14 at 19:11 -0400, Federico Simoncelli wrote: - Original Message - From: Alexandre Santos santosa...@gmail.com To: Dan Kenigsberg dan...@redhat.com Cc: Haim Ateya hat...@redhat.com, users@ovirt.org, Federico Simoncelli fsimo...@redhat.com Sent: Sunday, October 14, 2012 7:23:36 PM Subject: Re: [Users] Can't start a VM - sanlock permission denied 2012/10/13 Dan Kenigsberg dan...@redhat.com On Sat, Oct 13, 2012 at 11:25:37AM +0100, Alexandre Santos wrote: Hi, after getting to the oVirt Node console (F2) I figured out that selinux wasn't allowing the sanlock, so I entered the setsebool virt_use_sanlock 1 and the problem is fixed. Which version of vdsm is installed on your node? and which selinux-policy? sanlock should work out-of-the-box. vdsm-4.10.0-10.fc17 on /etc/sysconfig/selinux SELINUX=enforcing SELINUXTYPE=targeted As far as I understand the selinux policies for the ovirt-node are set by recipe/common-post.ks (in the ovirt-node repo): semanage boolean -m -S targeted -F /dev/stdin << \EOF_semanage allow_execstack=0 virt_use_nfs=1 EOF_semanage We should update it with what vdsm is currently setting: virt_use_sanlock=1 sanlock_use_nfs=1 Shouldn't vdsm be setting these if they're needed? It should - I'd like to know which vdsm version it was, and why this was skipped. The version was 4.10.0-10.fc17 and what I thought (but I didn't test it last night) is that the ovirt-node was overriding what we were setting. Anyway this is not the case. I can certainly set the values, but IMO, if vdsm needs it, vdsm should set it. virt_use_nfs=1 made it into the node. Maybe there was a good reason for it that applies to virt_use_sanlock as well. (I really hate to persist the policy files, and dislike the idea of setting virt_use_sanlock every time vdsmd starts - it's slow). We set them when we install vdsm (not when the service starts) so they should be good to go in the iso. It might be a glitch during the vdsm package installation, it could be something like semanage taking the boolean from the host where the iso is built rather than the root where the package is installed. Do we have the iso build logs? -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] Can't start a VM - sanlock permission denied
- Original Message - From: Alexandre Santos santosa...@gmail.com To: Dan Kenigsberg dan...@redhat.com Cc: Haim Ateya hat...@redhat.com, users@ovirt.org, Federico Simoncelli fsimo...@redhat.com Sent: Sunday, October 14, 2012 7:23:36 PM Subject: Re: [Users] Can't start a VM - sanlock permission denied 2012/10/13 Dan Kenigsberg dan...@redhat.com On Sat, Oct 13, 2012 at 11:25:37AM +0100, Alexandre Santos wrote: Hi, after getting to the oVirt Node console (F2) I figured out that selinux wasn't allowing the sanlock, so I entered the setsebool virt_use_sanlock 1 and the problem is fixed. Which version of vdsm is installed on your node? and which selinux-policy? sanlock should work out-of-the-box. vdsm-4.10.0-10.fc17 on /etc/sysconfig/selinux SELINUX=enforcing SELINUXTYPE=targeted As far as I understand the selinux policies for the ovirt-node are set by recipe/common-post.ks (in the ovirt-node repo): semanage boolean -m -S targeted -F /dev/stdin << \EOF_semanage allow_execstack=0 virt_use_nfs=1 EOF_semanage We should update it with what vdsm is currently setting: virt_use_sanlock=1 sanlock_use_nfs=1 -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
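The same boolean update can be expressed compactly if someone wants to verify what ends up persisted in the image; a hedged sketch that mirrors the recipe's single semanage transaction (the boolean list is just the one discussed above):

    import subprocess

    # Hedged sketch of persisting the booleans discussed above in one
    # semanage transaction against the targeted policy, the same way the
    # node recipe feeds them on stdin.
    BOOLEANS = {
        "allow_execstack": "0",
        "virt_use_nfs": "1",
        "virt_use_sanlock": "1",
        "sanlock_use_nfs": "1",
    }

    def persist_booleans(booleans):
        data = "".join("%s=%s\n" % item for item in booleans.items())
        p = subprocess.Popen(["semanage", "boolean", "-m", "-S", "targeted",
                              "-F", "/dev/stdin"], stdin=subprocess.PIPE)
        p.communicate(data)
        if p.returncode != 0:
            raise RuntimeError("semanage failed with rc=%s" % p.returncode)

    persist_booleans(BOOLEANS)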
Re: [Users] Fwd: Re: oVirt live snapshot problem
- Original Message - From: Haim Ateya hat...@redhat.com To: Neil nwilson...@gmail.com, Federico Simoncelli fsimo...@redhat.com, Kiril Nesenko ki...@redhat.com Cc: users@ovirt.org Sent: Wednesday, June 13, 2012 7:00:34 AM Subject: Re: [Users] Fwd: Re: oVirt live snapshot problem Federico\Kiril, is this problem known to you ? I can't say without the logs. Please look at the relevant log parts that I quote in my bug comment and check if they are the same: https://bugzilla.redhat.com/show_bug.cgi?id=829645#c3 -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] image ownership
- Original Message - From: Jacob Wyatt jwy...@ggc.edu To: users@ovirt.org Sent: Wednesday, May 9, 2012 8:08:54 PM Subject: [Users] image ownership Greetings all, I've set up a new oVirt installation and it's behaving strangely with regard to virtual machine image files on the NFS storage. Whenever I shut down a machine it's changing the owner of the image to root:root (0:0) instead of vdsm:kvm (36:36). After that it can't start or do anything with that image again until I manually change the ownership back. Everything works fine again until I shut the machine down. I assume this is some mistake I've made in installation. I did not have this problem in the test environment, but I'm stumped as to what went wrong. -Jacob Hi Jacob, could you check the dynamic_ownership in /etc/libvirt/qemu.conf: # grep dynamic_ownership /etc/libvirt/qemu.conf #dynamic_ownership = 1 dynamic_ownership=0 # by vdsm Thanks, -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
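A quick way to script the check Federico asks for across several hosts (a hedged sketch; it only parses the file the message points at and ignores commented-out lines):

    # Hedged sketch: report whether qemu.conf carries the setting vdsm
    # expects (dynamic_ownership=0), skipping comments.
    def dynamic_ownership_disabled(conf="/etc/libvirt/qemu.conf"):
        with open(conf) as f:
            for line in f:
                line = line.split("#", 1)[0].strip()
                if line.startswith("dynamic_ownership"):
                    return line.split("=", 1)[1].strip() == "0"
        return False   # setting absent: libvirt's default applies

    print(dynamic_ownership_disabled())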
Re: [Users] error in multipath.py
Could you check if this fixes your problem? http://gerrit.ovirt.org/3863 Thanks, -- Federico - Original Message - From: ov...@qip.ru To: users@ovirt.org Sent: Tuesday, April 24, 2012 8:54:58 AM Subject: [Users] error in multipath.py I tried the latest releases of vdsm from jenkins.ovirt.org, but found that they didn't work. Vdsmd cycles and corrupted my multipath.conf. The error is in /usr/share/vdsm/storage/multipath.py; this is the diff with my .py: # diff multipath.py.bad multipath.py 88,89c88,89 < first = mpathconf[0] < second = mpathconf[1] --- > first = mpathconf.split('\n', 1)[0] > second = mpathconf.split('\n', 1)[1] ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
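The two versions in the diff differ in whether the multipath.conf content is handled as one string or as lines; a tiny hedged illustration (the conf content here is invented for the example):

    # Illustration of the bug fixed by the diff above: indexing a whole-file
    # string returns single characters, while split('\n', 1) returns the
    # first line and the remainder.
    mpathconf = "# EXAMPLE HEADER\ndefaults {\n    polling_interval 5\n}\n"

    broken_first = mpathconf[0]                  # '#'  (one character)
    broken_second = mpathconf[1]                 # ' '  (one character)
    fixed_first = mpathconf.split('\n', 1)[0]    # '# EXAMPLE HEADER'
    fixed_second = mpathconf.split('\n', 1)[1]   # everything after it

    print("%r vs %r" % (broken_first, fixed_first))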
Re: [Users] Can't start vm
- Original Message - From: kumar shantanu k.shantanu2...@gmail.com To: users users@ovirt.org Sent: Monday, March 12, 2012 9:16:34 AM Subject: [Users] Can't start vm Hi all, I created a host from the ovirt manager but when trying to run it, it fails with the error: == vdsm.log == Thread-180651::DEBUG::2012-03-12 13:42:12,482::vm::577::vm.Vm::(_startUnderlyingVm) vmId=`c13c4c09-f696-47e1-b8cd-8d499242e151`::_ongoingCreations released Thread-180651::ERROR::2012-03-12 13:42:12,482::vm::601::vm.Vm::(_startUnderlyingVm) vmId=`c13c4c09-f696-47e1-b8cd-8d499242e151`::The vm start process failed Traceback (most recent call last): File /usr/share/vdsm/vm.py, line 567, in _startUnderlyingVm self._run() File /usr/share/vdsm/libvirtvm.py, line 1306, in _run self._connection.createXML(domxml, flags), File /usr/share/vdsm/libvirtconnection.py, line 82, in wrapper ret = f(*args, **kwargs) File /usr/lib64/python2.6/site-packages/libvirt.py, line 2087, in createXML if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self) libvirtError: internal error Process exited while reading console log output: Supported machines are: pc RHEL 6.2.0 PC (alias of rhel6.2.0) Python version running is [root@ovirt ~]# python -V Python 2.7 Can anyone please suggest? Hi Kumar, when the engine starts a VM it also specifies a machine type. The machine types supported by a host depend on the system (RHEL/Fedora) and you can get the list with: # vdsClient 0 getVdsCaps | grep emulatedMachines emulatedMachines = ['pc-1.1', 'pc', 'pc-1.0', 'pc-0.15', ... Once you have discovered the types supported by your hosts you can configure the engine with the correct value: http://www.ovirt.org/wiki/Engine_Node_Integration psql -U postgres engine -c "update vdc_options set option_value='pc-0.14' where option_name='EmulatedMachine' and version='3.0';" I assume that you ran the command above but your VDSM hosts are rhel6, so you would need to use the rhel6.2.0 value instead. I believe that the value pc is an alias that works both for RHEL and Fedora and it might be handy for testing, but in general I really discourage its use because it would allow a mixed cluster of RHEL and Fedora hosts which could be problematic in case of live migrations. -- Federico ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
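If you want to check the supported machine types on each host before deciding which value to put into vdc_options, the vdsClient call above can be wrapped in a few lines; a hedged sketch (parsing vdsClient's human-readable output like this is fragile and only meant for a quick look):

    import subprocess

    # Hedged sketch: print the emulatedMachines line that getVdsCaps reports
    # for the local host ("0" addresses localhost, as in the command above).
    def emulated_machines(host="0"):
        out = subprocess.Popen(["vdsClient", host, "getVdsCaps"],
                               stdout=subprocess.PIPE).communicate()[0]
        for line in out.splitlines():
            if "emulatedMachines" in line:
                return line.split("=", 1)[1].strip()
        return None

    print(emulated_machines())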