Re: [ovirt-users] Fwd: Re: urgent issue
Sorry - it's too late - all hosts have been re-imaged and are set up as local storage.

On Mon, Sep 21, 2015 at 10:38 PM, Ravishankar N <ravishan...@redhat.com> wrote:
> Hi Chris,
>
> Replies inline..
>
> On 09/22/2015 09:31 AM, Sahina Bose wrote:
> > Forwarded Message
> > Subject: Re: [ovirt-users] urgent issue
> > Date: Wed, 9 Sep 2015 08:31:07 -0700
> > From: Chris Liebman <chri...@taboola.com>
> > To: users <users@ovirt.org>
> >
> > Ok - I think I'm going to switch to local storage - I've had way too many
> > unexplainable issues with glusterfs :-(. Is there any reason I can't add
> > local storage to the existing shared-storage cluster? I see that the menu
> > item is greyed out.
>
> What version of gluster and ovirt are you using?
>
> > On Tue, Sep 8, 2015 at 4:19 PM, Chris Liebman <chri...@taboola.com> wrote:
> >> It's possible that this is specific to just one gluster volume... I've
> >> moved a few VM disks off of that volume and am able to start them fine.
> >> My recollection is that any VM started on the "bad" volume causes it to be
> >> disconnected and forces the ovirt node to be marked down until
> >> Maint->Activate.
> >>
> >> On Tue, Sep 8, 2015 at 3:52 PM, Chris Liebman <chri...@taboola.com> wrote:
> >>> In attempting to put an ovirt cluster into production I'm running into
> >>> some odd errors with gluster, it looks like. It's 12 hosts, each with one
> >>> brick in distributed-replicate. (Actually 2 bricks, but they are separate
> >>> volumes.)
>
> These 12 nodes in dist-rep config, are they in replica 2 or replica 3? The
> latter is what is recommended for VM use-cases. Could you give the output
> of `gluster volume info`?
> >>> [root@ovirt-node268 glusterfs]# rpm -qa | grep vdsm
> >>> vdsm-jsonrpc-4.16.20-0.el6.noarch
> >>> vdsm-gluster-4.16.20-0.el6.noarch
> >>> vdsm-xmlrpc-4.16.20-0.el6.noarch
> >>> vdsm-yajsonrpc-4.16.20-0.el6.noarch
> >>> vdsm-4.16.20-0.el6.x86_64
> >>> vdsm-python-zombiereaper-4.16.20-0.el6.noarch
> >>> vdsm-python-4.16.20-0.el6.noarch
> >>> vdsm-cli-4.16.20-0.el6.noarch
> >>>
> >>> Everything was fine last week; however, today various clients in
> >>> the gluster cluster seem to get "client quorum not met" periodically - when
> >>> they get this they take one of the bricks offline - this causes oVirt to
> >>> attempt to move VMs - sometimes 20 at a time. That takes a long time :-(.
> >>> I've tried disabling automatic migration, and the VMs get paused when this
> >>> happens - resuming gets nothing at that point, as the volume mount on the
> >>> server hosting the VM is not connected:
> >>>
> >>> from rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:
> >>>
> >>> [2015-09-08 21:18:42.920771] W [MSGID: 108001]
> >>> [afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum
> >>> is not met
>
> When client-quorum is not met (due to network disconnects, or gluster
> brick processes going down, etc.), gluster makes the volume read-only. This
> is expected behavior and prevents split-brains. It's probably a bit late,
> but do you have the gluster fuse mount logs to confirm this was indeed the
> issue?
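The "Client-quorum is not met" warnings Ravi refers to can be counted straight out of the FUSE mount log. A minimal sketch (the helper name is mine; the match pattern is taken verbatim from the log line quoted above, and the log path under /var/log/glusterfs/ will vary per mount):

```shell
#!/bin/sh
# Count how many times a gluster FUSE mount log records a client-quorum
# loss. Pass the mount log path, e.g. a file under /var/log/glusterfs/.
check_quorum_loss() {
  grep -c 'Client-quorum is not met' "$1"
}
```

Run against the log named in the thread, a non-zero count would confirm that quorum loss (rather than something else) made the volume read-only.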
> >>> [2015-09-08 21:18:42.931751] I [fuse-bridge.c:4900:fuse_thread_proc]
> >>> 0-fuse: unmounting
> >>> /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02
> >>>
> >>> [2015-09-08 21:18:42.931836] W [glusterfsd.c:1219:cleanup_and_exit]
> >>> (-->/lib64/libpthread.so.0(+0x7a51) [0x7f1bebc84a51]
> >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
> >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received
> >>> signum (15), shutting down
> >>>
> >>> [2015-09-08 21:18:42.931858] I [fuse-bridge.c:5595:fini] 0-fuse:
> >>> Unmounting
> >>> '/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02'.
>
> The VM pause you saw could be because of the unmount. I understand that a
> fix (https://gerrit.ovirt.org/#/c/40240/) went in for oVirt 3.6
> (vdsm-4.17) to prevent vdsm from unmounting the gluster volume when vdsm
> exits/restarts.
> Is it possible to run a test setup on 3.6 and see i
Re: [ovirt-users] urgent issue
Ok - I think I'm going to switch to local storage - I've had way too many unexplainable issues with glusterfs :-(. Is there any reason I can't add local storage to the existing shared-storage cluster? I see that the menu item is greyed out.

On Tue, Sep 8, 2015 at 4:19 PM, Chris Liebman <chri...@taboola.com> wrote:
> It's possible that this is specific to just one gluster volume... I've
> moved a few VM disks off of that volume and am able to start them fine. My
> recollection is that any VM started on the "bad" volume causes it to be
> disconnected and forces the ovirt node to be marked down until
> Maint->Activate.
>
> On Tue, Sep 8, 2015 at 3:52 PM, Chris Liebman <chri...@taboola.com> wrote:
>> In attempting to put an ovirt cluster into production I'm running into
>> some odd errors with gluster, it looks like. It's 12 hosts, each with one
>> brick in distributed-replicate. (Actually 2 bricks, but they are separate
>> volumes.)
>>
>> [root@ovirt-node268 glusterfs]# rpm -qa | grep vdsm
>> vdsm-jsonrpc-4.16.20-0.el6.noarch
>> vdsm-gluster-4.16.20-0.el6.noarch
>> vdsm-xmlrpc-4.16.20-0.el6.noarch
>> vdsm-yajsonrpc-4.16.20-0.el6.noarch
>> vdsm-4.16.20-0.el6.x86_64
>> vdsm-python-zombiereaper-4.16.20-0.el6.noarch
>> vdsm-python-4.16.20-0.el6.noarch
>> vdsm-cli-4.16.20-0.el6.noarch
>>
>> Everything was fine last week; however, today various clients in the
>> gluster cluster seem to get "client quorum not met" periodically - when they
>> get this they take one of the bricks offline - this causes oVirt to attempt
>> to move VMs - sometimes 20 at a time. That takes a long time :-(.
>> I've tried disabling automatic migration, and the VMs get paused when this
>> happens - resuming gets nothing at that point, as the volume mount on the
>> server hosting the VM is not connected:
>>
>> from rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:
>>
>> [2015-09-08 21:18:42.920771] W [MSGID: 108001]
>> [afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum is
>> not met
>>
>> [2015-09-08 21:18:42.931751] I [fuse-bridge.c:4900:fuse_thread_proc]
>> 0-fuse: unmounting
>> /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02
>>
>> [2015-09-08 21:18:42.931836] W [glusterfsd.c:1219:cleanup_and_exit]
>> (-->/lib64/libpthread.so.0(+0x7a51) [0x7f1bebc84a51]
>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received
>> signum (15), shutting down
>>
>> [2015-09-08 21:18:42.931858] I [fuse-bridge.c:5595:fini] 0-fuse:
>> Unmounting
>> '/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02'.
>> And the mount is broken at that point:
>>
>> [root@ovirt-node267 ~]# df
>> df: `/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':
>> Transport endpoint is not connected
>>
>> Filesystem                                             1K-blocks      Used  Available Use% Mounted on
>> /dev/sda3                                               51475068   1968452   46885176   5% /
>> tmpfs                                                  132210244         0  132210244   0% /dev/shm
>> /dev/sda2                                                 487652     32409     429643   8% /boot
>> /dev/sda1                                                 204580       260     204320   1% /boot/efi
>> /dev/sda5                                             1849960960 156714056 1599267616   9% /data1
>> /dev/sdb1                                             1902274676  18714468 1786923588   2% /data2
>> ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01 9249804800 727008640 8052899712   9% /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V01
>> ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03 1849960960     73728 1755907968   1% /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:_LADC-TBX-V03
>>
>> The fix for that is to put the server in maintenance mode then activate
>> it again. But all VMs need to be migrated or stopped for that to work.
>>
>> I'm not seeing any obvious network or disk errors.
>>
>> Are there configuration options I'm missing?

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
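The "Transport endpoint is not connected" state shown in the df output can be probed without parsing df at all: once the FUSE connection dies, any syscall against the mount point fails. A rough health-check sketch (the function names are mine, not part of oVirt or gluster):

```shell
#!/bin/sh
# Probe mounts for a dead FUSE connection: stat() on a disconnected
# gluster mount fails with "Transport endpoint is not connected", so a
# failing stat is a cheap liveness check.
mount_ok() {
  stat "$1" >/dev/null 2>&1
}

# Print one OK/BROKEN line per path given.
report_mounts() {
  for m in "$@"; do
    if mount_ok "$m"; then
      echo "OK: $m"
    else
      echo "BROKEN: $m"
    fi
  done
}
```

Run from cron against the paths under /rhev/data-center/mnt/glusterSD/, this would flag a broken mount before the engine marks the host Non Operational.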
Re: [ovirt-users] urgent issue
It's possible that this is specific to just one gluster volume... I've moved a few VM disks off of that volume and am able to start them fine. My recollection is that any VM started on the "bad" volume causes it to be disconnected and forces the ovirt node to be marked down until Maint->Activate.

On Tue, Sep 8, 2015 at 3:52 PM, Chris Liebman <chri...@taboola.com> wrote:
> In attempting to put an ovirt cluster into production I'm running into
> some odd errors with gluster, it looks like. It's 12 hosts, each with one
> brick in distributed-replicate. (Actually 2 bricks, but they are separate
> volumes.)
>
> [root@ovirt-node268 glusterfs]# rpm -qa | grep vdsm
> vdsm-jsonrpc-4.16.20-0.el6.noarch
> vdsm-gluster-4.16.20-0.el6.noarch
> vdsm-xmlrpc-4.16.20-0.el6.noarch
> vdsm-yajsonrpc-4.16.20-0.el6.noarch
> vdsm-4.16.20-0.el6.x86_64
> vdsm-python-zombiereaper-4.16.20-0.el6.noarch
> vdsm-python-4.16.20-0.el6.noarch
> vdsm-cli-4.16.20-0.el6.noarch
>
> Everything was fine last week; however, today various clients in the
> gluster cluster seem to get "client quorum not met" periodically - when they
> get this they take one of the bricks offline - this causes oVirt to attempt
> to move VMs - sometimes 20 at a time. That takes a long time :-(.
> I've tried disabling automatic migration, and the VMs get paused when this
> happens - resuming gets nothing at that point, as the volume mount on the
> server hosting the VM is not connected:
>
> from rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:
>
> [2015-09-08 21:18:42.920771] W [MSGID: 108001]
> [afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum is
> not met
>
> [2015-09-08 21:18:42.931751] I [fuse-bridge.c:4900:fuse_thread_proc]
> 0-fuse: unmounting
> /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02
>
> [2015-09-08 21:18:42.931836] W [glusterfsd.c:1219:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7a51) [0x7f1bebc84a51]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received
> signum (15), shutting down
>
> [2015-09-08 21:18:42.931858] I [fuse-bridge.c:5595:fini] 0-fuse:
> Unmounting
> '/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02'.
> And the mount is broken at that point:
>
> [root@ovirt-node267 ~]# df
> df: `/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':
> Transport endpoint is not connected
>
> Filesystem                                             1K-blocks      Used  Available Use% Mounted on
> /dev/sda3                                               51475068   1968452   46885176   5% /
> tmpfs                                                  132210244         0  132210244   0% /dev/shm
> /dev/sda2                                                 487652     32409     429643   8% /boot
> /dev/sda1                                                 204580       260     204320   1% /boot/efi
> /dev/sda5                                             1849960960 156714056 1599267616   9% /data1
> /dev/sdb1                                             1902274676  18714468 1786923588   2% /data2
> ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01 9249804800 727008640 8052899712   9% /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V01
> ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03 1849960960     73728 1755907968   1% /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:_LADC-TBX-V03
>
> The fix for that is to put the server in maintenance mode then activate it
> again. But all VMs need to be migrated or stopped for that to work.
>
> I'm not seeing any obvious network or disk errors.
>
> Are there configuration options I'm missing?
[ovirt-users] urgent issue
In attempting to put an ovirt cluster into production I'm running into some odd errors with gluster, it looks like. It's 12 hosts, each with one brick in distributed-replicate. (Actually 2 bricks, but they are separate volumes.)

[root@ovirt-node268 glusterfs]# rpm -qa | grep vdsm
vdsm-jsonrpc-4.16.20-0.el6.noarch
vdsm-gluster-4.16.20-0.el6.noarch
vdsm-xmlrpc-4.16.20-0.el6.noarch
vdsm-yajsonrpc-4.16.20-0.el6.noarch
vdsm-4.16.20-0.el6.x86_64
vdsm-python-zombiereaper-4.16.20-0.el6.noarch
vdsm-python-4.16.20-0.el6.noarch
vdsm-cli-4.16.20-0.el6.noarch

Everything was fine last week; however, today various clients in the gluster cluster seem to get "client quorum not met" periodically - when they get this they take one of the bricks offline - this causes oVirt to attempt to move VMs - sometimes 20 at a time. That takes a long time :-(. I've tried disabling automatic migration, and the VMs get paused when this happens - resuming gets nothing at that point, as the volume mount on the server hosting the VM is not connected:

from rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:

[2015-09-08 21:18:42.920771] W [MSGID: 108001] [afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum is not met

[2015-09-08 21:18:42.931751] I [fuse-bridge.c:4900:fuse_thread_proc] 0-fuse: unmounting /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02

[2015-09-08 21:18:42.931836] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7a51) [0x7f1bebc84a51] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (15), shutting down

[2015-09-08 21:18:42.931858] I [fuse-bridge.c:5595:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02'.
And the mount is broken at that point:

[root@ovirt-node267 ~]# df
df: `/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':
Transport endpoint is not connected

Filesystem                                             1K-blocks      Used  Available Use% Mounted on
/dev/sda3                                               51475068   1968452   46885176   5% /
tmpfs                                                  132210244         0  132210244   0% /dev/shm
/dev/sda2                                                 487652     32409     429643   8% /boot
/dev/sda1                                                 204580       260     204320   1% /boot/efi
/dev/sda5                                             1849960960 156714056 1599267616   9% /data1
/dev/sdb1                                             1902274676  18714468 1786923588   2% /data2
ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01 9249804800 727008640 8052899712   9% /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V01
ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03 1849960960     73728 1755907968   1% /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:_LADC-TBX-V03

The fix for that is to put the server in maintenance mode then activate it again. But all VMs need to be migrated or stopped for that to work.

I'm not seeing any obvious network or disk errors.

Are there configuration options I'm missing?
Re: [ovirt-users] VLAN with one NIC
If you're sharing a physical interface between tagged and untagged traffic, they don't allow VMs on the untagged link. I believe this is because some earlier versions of the bridging code in the kernel would pass tagged traffic through to VMs located on the untagged interface, which is a security issue.

On Fri, Sep 4, 2015 at 9:47 AM, gregor <gregor_fo...@catrix.at> wrote:
> Thanks, now I can use eth0 to connect to the web-interface and the
> logical vlan network to use in a VM, but I can't use the untagged VLAN1
> in the VM. When I add a new logical network without VLAN tagging the
> web-interface returns "Cannot have more than one non-VLAN network on one
> interface."
>
> When I create a logical network with tagged ID 1 I can't connect from a
> VM to my VLAN1.
>
> So I have to find a way to create the following networks:
> - ovirtmgmt: without tagged vlan, for managing -> works
> - vlan1: without tagged vlan, for the VMs to connect to my default
>   network -> currently no solution
> - vlan10: tagged vlan with id 10 -> works
>
> cheers
> gregor
>
> On 2015-09-04 18:05, Chris Liebman wrote:
> > You have to edit the ovirtmgmt network and un-check the "VM Network" box:
> >
> > [image: Inline image 1]
> >
> > On Fri, Sep 4, 2015 at 8:47 AM, gregor <gregor_fo...@catrix.at> wrote:
> >
> > Hi,
> >
> > is it possible to use different VLANs with one NIC in ovirt?
> >
> > I cannot add a logical network configured as VLAN to my ovirtmgmt
> > interface; I get "Cannot have a non-VLAN VM network and VLAN-tagged
> > networks on one interface."
> >
> > The setup for the port is a trunk of different tagged VLANs, and the
> > default VLAN1 is untagged. Normally this works under CentOS, where I give
> > the nic eth0 an IP and create a nic eth0.10 for the VLAN with ID 10 and
> > set an IP for it.
> >
> > cheers
> > gregor
Re: [ovirt-users] VLAN with one NIC
You have to edit the ovirtmgmt network and un-check the "VM Network" box:

[image: Inline image 1]

On Fri, Sep 4, 2015 at 8:47 AM, gregor <gregor_fo...@catrix.at> wrote:
> Hi,
>
> is it possible to use different VLANs with one NIC in ovirt?
>
> I cannot add a logical network configured as VLAN to my ovirtmgmt
> interface; I get "Cannot have a non-VLAN VM network and VLAN-tagged
> networks on one interface."
>
> The setup for the port is a trunk of different tagged VLANs, and the
> default VLAN1 is untagged. Normally this works under CentOS, where I give
> the nic eth0 an IP and create a nic eth0.10 for the VLAN with ID 10 and
> set an IP for it.
>
> cheers
> gregor
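For reference, the plain-CentOS setup gregor describes (an IP on eth0 plus a tagged eth0.10 subinterface) would look roughly like the following ifcfg sketch. The IP addresses are placeholders, and the exact option set may differ on your hosts:

```
# /etc/sysconfig/network-scripts/ifcfg-eth0   (untagged / native VLAN 1)
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.1.10       # placeholder address
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0.10   (tagged VLAN 10)
DEVICE=eth0.10
VLAN=yes
BOOTPROTO=static
IPADDR=192.168.10.10      # placeholder address
NETMASK=255.255.255.0
ONBOOT=yes
```

The point of the thread is that oVirt will not let a non-VLAN VM network share the interface with tagged networks, even though the kernel itself supports this layout.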
Re: [ovirt-users] Move DataCenter or Cluster from one engine to another
3.5 (it's running 3.5.2)

On Tue, Aug 25, 2015 at 5:14 AM, Liron Aravot <lara...@redhat.com> wrote:
> ----- Original Message -----
> From: Yaniv Dary <yd...@redhat.com>
> To: Chris Liebman <chri...@taboola.com>
> Cc: users <users@ovirt.org>
> Sent: Tuesday, August 25, 2015 1:47:53 PM
> Subject: Re: [ovirt-users] Move DataCenter or Cluster from one engine to another
>
> Yes, using import storage domain:
> http://www.ovirt.org/Features/ImportStorageDomain
>
> Yaniv Dary
> Technical Product Manager
> Red Hat Israel Ltd.
> 34 Jerusalem Road Building A, 4th floor
> Ra'anana, Israel 4350109
> Tel : +972 (9) 7692306 8272306
> Email: yd...@redhat.com
> IRC : ydary
>
> On Mon, Aug 24, 2015 at 8:56 PM, Chris Liebman <chri...@taboola.com> wrote:
>> Is it possible to export a data center from one engine and import it to
>> another? Currently I have an engine running in Europe and a set of nodes
>> comprising a data center on the west coast of the US, and am seeing
>> communication issues. There are a number of other data centers that the
>> engine in Europe is managing, and I'd like to deploy an engine in closer
>> proximity to the nodes for one data center. Has anyone done this? Is it
>> possible?
>> -- Chris
>
> Hi Chris,
> what is your (source) Data Center compatibility version?
>
> thanks,
> Liron
[ovirt-users] Move DataCenter or Cluster from one engine to another
Is it possible to export a data center from one engine and import it to another? Currently I have an engine running in Europe and a set of nodes comprising a data center on the west coast of the US, and am seeing communication issues. There are a number of other data centers that the engine in Europe is managing, and I'd like to deploy an engine in closer proximity to the nodes for one data center. Has anyone done this? Is it possible?

-- Chris
[ovirt-users] error messages filling disk
This has happened twice now: these messages start spewing into both /var/log/messages and /var/log/vdsm/vdsm.log, with slightly different formats but the same information. Restarting vdsmd fixes this, but I'd like to find out why it gets into this state. CentOS 6.7, oVirt 3.5. Any ideas?

Aug 18 16:06:36 ovirt-node250 vdsm vds.MultiProtocolAcceptor ERROR Unhandled exception#012Traceback (most recent call last):#012 File /usr/share/vdsm/protocoldetector.py, line 86, in serve_forever#012 self._process_events()#012 File /usr/share/vdsm/protocoldetector.py, line 105, in _process_events#012 self._handle_connection_read(fd)#012 File /usr/share/vdsm/protocoldetector.py, line 225, in _handle_connection_read#012 data = client_socket.recv(self._required_size, socket.MSG_PEEK)#012 File /usr/lib/python2.6/site-packages/vdsm/sslutils.py, line 58, in read#012 self._data = self.connection.read(size)#012 File /usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py, line 229, in read#012 return self._read_nbio(size)#012 File /usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py, line 218, in _read_nbio#012 return m2.ssl_read_nbio(self.ssl, size)#012SSLError: unexpected eof

Aug 18 16:06:36 ovirt-node250 vdsm vds.MultiProtocolAcceptor ERROR Unhandled exception#012Traceback (most recent call last):#012 File /usr/share/vdsm/protocoldetector.py, line 86, in serve_forever#012 self._process_events()#012 File /usr/share/vdsm/protocoldetector.py, line 105, in _process_events#012 self._handle_connection_read(fd)#012 File /usr/share/vdsm/protocoldetector.py, line 225, in _handle_connection_read#012 data = client_socket.recv(self._required_size, socket.MSG_PEEK)#012 File /usr/lib/python2.6/site-packages/vdsm/sslutils.py, line 58, in read#012 self._data = self.connection.read(size)#012 File /usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py, line 229, in read#012 return self._read_nbio(size)#012 File /usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py, line 218, in _read_nbio#012 return m2.ssl_read_nbio(self.ssl, size)#012SSLError: unexpected eof

Aug 18 16:06:36 ovirt-node250 vdsm vds.MultiProtocolAcceptor ERROR Unhandled exception#012Traceback (most recent call last):#012 File /usr/share/vdsm/protocoldetector.py, line 86, in serve_forever#012 self._process_events()#012 File /usr/share/vdsm/protocoldetector.py, line 105, in _process_events#012 self._handle_connection_read(fd)#012 File /usr/share/vdsm/protocoldetector.py, line 225, in _handle_connection_read#012 data = client_socket.recv(self._required_size, socket.MSG_PEEK)#012 File /usr/lib/python2.6/site-packages/vdsm/sslutils.py, line 58, in read#012 self._data = self.connection.read(size)#012 File /usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py, line 229, in read#012 return self._read_nbio(size)#012 File /usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py, line 218, in _read_nbio#012 return m2.ssl_read_nbio(self.ssl, size)#012SSLError: unexpected eof

Aug 18 16:06:36 ovirt-node250 vdsm vds.MultiProtocolAcceptor ERROR Unhandled exception#012Traceback (most recent call last):#012 File /usr/share/vdsm/protocoldetector.py, line 86, in serve_forever#012 self._process_events()#012 File /usr/share/vdsm/protocoldetector.py, line 105, in _process_events#012 self._handle_connection_read(fd)#012 File /usr/share/vdsm/protocoldetector.py, line 225, in _handle_connection_read#012 data = client_socket.recv(self._required_size, socket.MSG_PEEK)#012 File /usr/lib/python2.6/site-packages/vdsm/sslutils.py, line 58, in read#012 self._data = self.connection.read(size)#012 File /usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py, line 229, in read#012 return self._read_nbio(size)#012 File /usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py, line 218, in _read_nbio#012 return m2.ssl_read_nbio(self.ssl, size)#012SSLError: unexpected eof

Aug 18 16:06:36 ovirt-node250 vdsm
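The `#012` sequences in these entries are rsyslog's escaping of embedded control characters (octal 012 is `\n`), which is why each multi-line Python traceback arrives as a single log line. A small sketch to expand them back into readable tracebacks (helper name is mine; GNU sed is assumed for `\n` in the replacement):

```shell
#!/bin/sh
# Expand rsyslog's "#012" newline escapes so the vdsm tracebacks in
# /var/log/messages become readable multi-line Python tracebacks again.
decode_traceback() {
  sed 's/#012/\n/g'
}
```

Usage would be along the lines of `grep MultiProtocolAcceptor /var/log/messages | decode_traceback`.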
Re: [ovirt-users] stuck hosts - how can I delete them?
Yes - thanks!

On Sunday, August 16, 2015, Sahina Bose <sab...@redhat.com> wrote:
> On 08/13/2015 11:48 PM, Chris Liebman wrote:
>> I've just force deleted a DC. I did this because gluster was completely
>> hosed. Multiple nodes with broken disks - don't ask... Anyway - now I see
>> that the Cluster still exists with the hosts. And I can't remove,
>> re-install, etc. the hosts, nor can I delete the cluster. Help!
>
> Are you facing the same issue as
> https://bugzilla.redhat.com/show_bug.cgi?id=1244935
>
>> -- Chris
[ovirt-users] stuck hosts - how can I delete them?
I've just force deleted a DC. I did this because gluster was completely hosed. Multiple nodes with broken disks - don't ask... Anyway - now I see that the Cluster still exists with the hosts. And I can't remove, re-install, etc. the hosts, nor can I delete the cluster. Help!

-- Chris
[ovirt-users] ovirt 3.5.2 issues with nodes becoming Non Operational
Hi,

I'm new to oVirt and recently built a 10 node ovirt 3.5 DC with shared storage using gluster configured as distributed-replicated (replication = 2). Shortly after, 7 of the 10 nodes dropped, one at a time over a few hours, into the Non Operational state. Attempting to activate one of these nodes gives the error: "Failed to connect Host ovirt-node260 to Storage Pool LADC-TBX". Attempting to put the node into Maintenance leaves the node stuck in "Preparing For Maintenance". When I rebooted one of the nodes I saw this in the node's event list: "Host ovirt-node269 reports about one of the Active Storage Domains as Problematic".

I see many of these errors in the vdsm log from the failed nodes:

Thread-1::ERROR::2015-08-12 10:01:17,748::__init__::506::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File /usr/lib/python2.6/site-packages/yajsonrpc/__init__.py, line 501, in _serveRequest
    res = method(**params)
  File /usr/share/vdsm/rpc/Bridge.py, line 267, in _dynamicMethod
    result = fn(*methodArgs)
  File /usr/share/vdsm/API.py, line 1330, in getStats
    stats.update(self._cif.mom.getKsmStats())
  File /usr/share/vdsm/momIF.py, line 60, in getKsmStats
    stats = self._mom.getStatistics()['host']
  File /usr/lib/python2.6/site-packages/mom/MOMFuncs.py, line 75, in getStatistics
    host_stats = self.threads['host_monitor'].interrogate().statistics[-1]
AttributeError: 'NoneType' object has no attribute 'statistics'

Any help here is appreciated.

-- Chris
Re: [ovirt-users] ovirt 3.5.2 issues with nodes becoming Non Operational
I may have figured this out. The systems that failed are running the Oracle unbreakable kernel: 3.8.13-98.el6uek.x86_64. The working systems are running the default CentOS 6 2.6 kernel, and the errors from the vdsm.log only show up on the UEK kernel.

-- Chris

On Wed, Aug 12, 2015 at 9:34 AM, Chris Liebman <chri...@taboola.com> wrote:
> Hi,
>
> I'm new to oVirt and recently built a 10 node ovirt 3.5 DC with shared
> storage using gluster configured as distributed-replicated (replication =
> 2). Shortly after, 7 of the 10 nodes dropped, one at a time over a few
> hours, into the Non Operational state. Attempting to activate one of these
> nodes gives the error: "Failed to connect Host ovirt-node260 to Storage
> Pool LADC-TBX". Attempting to put the node into Maintenance leaves the
> node stuck in "Preparing For Maintenance". When I rebooted one of the nodes
> I saw this in the node's event list: "Host ovirt-node269 reports about one
> of the Active Storage Domains as Problematic".
>
> I see many of these errors in the vdsm log from the failed nodes:
>
> Thread-1::ERROR::2015-08-12 10:01:17,748::__init__::506::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
> Traceback (most recent call last):
>   File /usr/lib/python2.6/site-packages/yajsonrpc/__init__.py, line 501, in _serveRequest
>     res = method(**params)
>   File /usr/share/vdsm/rpc/Bridge.py, line 267, in _dynamicMethod
>     result = fn(*methodArgs)
>   File /usr/share/vdsm/API.py, line 1330, in getStats
>     stats.update(self._cif.mom.getKsmStats())
>   File /usr/share/vdsm/momIF.py, line 60, in getKsmStats
>     stats = self._mom.getStatistics()['host']
>   File /usr/lib/python2.6/site-packages/mom/MOMFuncs.py, line 75, in getStatistics
>     host_stats = self.threads['host_monitor'].interrogate().statistics[-1]
> AttributeError: 'NoneType' object has no attribute 'statistics'
>
> Any help here is appreciated.
>
> -- Chris
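The kernel correlation above can be checked mechanically across a fleet of hosts. A sketch (the helper name is mine; the UEK kernel string is the one quoted in this thread):

```shell
#!/bin/sh
# Classify a kernel release string as Oracle UEK or not, based on the
# "uek" marker in the release name (e.g. 3.8.13-98.el6uek.x86_64).
is_uek() {
  case "$1" in
    *uek*) return 0 ;;   # UEK kernel
    *)     return 1 ;;   # stock kernel
  esac
}

# On a live host, you would run: is_uek "$(uname -r)" && echo "UEK detected"
```

Run over each host's `uname -r`, this would quickly separate the failing UEK nodes from the ones on the stock CentOS kernel.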