Re: [ovirt-users] Fwd: Re: urgent issue

2015-09-22 Thread Chris Liebman
Sorry - it's too late - all hosts have been re-imaged and are set up as local
storage.

On Mon, Sep 21, 2015 at 10:38 PM, Ravishankar N <ravishan...@redhat.com>
wrote:

> Hi Chris,
>
> Replies inline..
>
> On 09/22/2015 09:31 AM, Sahina Bose wrote:
>
>
>
>
> -------- Forwarded Message --------
> Subject: Re: [ovirt-users] urgent issue
> Date: Wed, 9 Sep 2015 08:31:07 -0700
> From: Chris Liebman <chri...@taboola.com>
> To: users <users@ovirt.org>
>
> Ok - I think I'm going to switch to local storage - I've had way too many
> unexplainable issues with glusterfs :-(. Is there any reason I can't add
> local storage to the existing shared-storage cluster? I see that the menu
> item is greyed out.
>
>
>
> What version of gluster and ovirt are you using?
>
>
>
>
> On Tue, Sep 8, 2015 at 4:19 PM, Chris Liebman <chri...@taboola.com> wrote:
>
>> It's possible that this is specific to just one gluster volume...  I've
>> moved a few VM disks off of that volume and am able to start them fine.
>> My recollection is that any VM started on the "bad" volume causes it to be
>> disconnected and forces the oVirt node to be marked down until
>> Maint->Activate.
>>
>> On Tue, Sep 8, 2015 at 3:52 PM, Chris Liebman <chri...@taboola.com> wrote:
>>
>>> In attempting to put an oVirt cluster in production I'm running into
>>> some odd errors, with gluster it looks like.  It's 12 hosts, each with one
>>> brick in distributed-replicate. (Actually 2 bricks, but they are separate
>>> volumes.)
>>>
>>>
> These 12 nodes in dist-rep config, are they in replica 2 or replica 3? The
> latter is what is recommended for VM use-cases. Could you give the output
> of `gluster volume info` ?
>
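> For reference, a rough sketch of what would show that (volume name assumed
> from the logs further down in this thread; adjust to yours):
>
>   gluster volume info LADC-TBX-V02     # "Number of Bricks: N x 2" means replica 2, "N x 3" replica 3
>   gluster volume status LADC-TBX-V02   # shows which brick processes are currently online
>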
> [root@ovirt-node268 glusterfs]# rpm -qa | grep vdsm
>>>
>>> vdsm-jsonrpc-4.16.20-0.el6.noarch
>>>
>>> vdsm-gluster-4.16.20-0.el6.noarch
>>>
>>> vdsm-xmlrpc-4.16.20-0.el6.noarch
>>>
>>> vdsm-yajsonrpc-4.16.20-0.el6.noarch
>>>
>>> vdsm-4.16.20-0.el6.x86_64
>>>
>>> vdsm-python-zombiereaper-4.16.20-0.el6.noarch
>>>
>>> vdsm-python-4.16.20-0.el6.noarch
>>>
>>> vdsm-cli-4.16.20-0.el6.noarch
>>>
>>>
>>> Everything was fine last week; however, today various clients in
>>> the gluster cluster seem to get "client quorum not met" periodically - when
>>> they get this they take one of the bricks offline - this triggers VM
>>> migrations - sometimes 20 at a time.  That takes a long time :-(.
>>> I've tried disabling automatic migration and the VMs get paused when this
>>> happens - resuming does nothing at that point, as the volume's mount on the
>>> server hosting the VM is not connected:
>>>
>>> from
>>> rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:
>>> _LADC-TBX-V02.log:
>>>
>>> [2015-09-08 21:18:42.920771] W [MSGID: 108001]
>>> [afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum
>>> is not met
>>>
>>
> When client-quorum is not met (due to network disconnects, gluster
> brick processes going down, etc.), gluster makes the volume read-only. This
> is expected behavior and prevents split-brains. It's probably a bit late,
> but do you have the gluster fuse mount logs to confirm this was indeed the
> issue?
>
> [2015-09-08 21:18:42.931751] I [fuse-bridge.c:4900:fuse_thread_proc]
>>> 0-fuse: unmounting
>>> /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
>>> _LADC-TBX-V02
>>>
>>> [2015-09-08 21:18:42.931836] W [glusterfsd.c:1219:cleanup_and_exit]
>>> (-->/lib64/libpthread.so.0(+0x7a51) [0x7f1bebc84a51]
>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x
>>>
>>> 65) [0x4059b5] ) 0-: received signum (15), shutting down
>>>
>>> [2015-09-08 21:18:42.931858] I [fuse-bridge.c:5595:fini] 0-fuse:
>>> Unmounting
>>> '/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
>>> _LADC-TBX-V02'.
>>>
>>
> The VM pause you saw could be because of the unmount. I understand that a
> fix (https://gerrit.ovirt.org/#/c/40240/) went in for oVirt 3.6
> (vdsm-4.17) to prevent vdsm from unmounting the gluster volume when vdsm
> exits/restarts.
> Is it possible to run a test setup on 3.6 and see i

Re: [ovirt-users] urgent issue

2015-09-09 Thread Chris Liebman
Ok - I think I'm going to switch to local storage - I've had way too many
unexplainable issues with glusterfs :-(. Is there any reason I can't add
local storage to the existing shared-storage cluster? I see that the menu
item is greyed out.





On Tue, Sep 8, 2015 at 4:19 PM, Chris Liebman <chri...@taboola.com> wrote:

> It's possible that this is specific to just one gluster volume...  I've
> moved a few VM disks off of that volume and am able to start them fine.  My
> recollection is that any VM started on the "bad" volume causes it to be
> disconnected and forces the oVirt node to be marked down until
> Maint->Activate.
>
> On Tue, Sep 8, 2015 at 3:52 PM, Chris Liebman <chri...@taboola.com> wrote:
>
>> In attempting to put an oVirt cluster in production I'm running into some
>> odd errors, with gluster it looks like.  It's 12 hosts, each with one brick in
>> distributed-replicate.  (Actually 2 bricks, but they are separate volumes.)
>>
>> [root@ovirt-node268 glusterfs]# rpm -qa | grep vdsm
>>
>> vdsm-jsonrpc-4.16.20-0.el6.noarch
>>
>> vdsm-gluster-4.16.20-0.el6.noarch
>>
>> vdsm-xmlrpc-4.16.20-0.el6.noarch
>>
>> vdsm-yajsonrpc-4.16.20-0.el6.noarch
>>
>> vdsm-4.16.20-0.el6.x86_64
>>
>> vdsm-python-zombiereaper-4.16.20-0.el6.noarch
>>
>> vdsm-python-4.16.20-0.el6.noarch
>>
>> vdsm-cli-4.16.20-0.el6.noarch
>>
>>
>> Everything was fine last week; however, today various clients in the
>> gluster cluster seem to get "client quorum not met" periodically - when they
>> get this they take one of the bricks offline - this triggers VM migrations
>> - sometimes 20 at a time.  That takes a long time :-(.
>> I've tried disabling automatic migration and the VMs get paused when this
>> happens - resuming does nothing at that point, as the volume's mount on the
>> server hosting the VM is not connected:
>>
>> from
>> rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:
>> _LADC-TBX-V02.log:
>>
>> [2015-09-08 21:18:42.920771] W [MSGID: 108001]
>> [afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum is 
>> not
>> met
>>
>> [2015-09-08 21:18:42.931751] I [fuse-bridge.c:4900:fuse_thread_proc]
>> 0-fuse: unmounting
>> /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
>> _LADC-TBX-V02
>>
>> [2015-09-08 21:18:42.931836] W [glusterfsd.c:1219:cleanup_and_exit]
>> (-->/lib64/libpthread.so.0(+0x7a51) [0x7f1bebc84a51]
>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x
>>
>> 65) [0x4059b5] ) 0-: received signum (15), shutting down
>>
>> [2015-09-08 21:18:42.931858] I [fuse-bridge.c:5595:fini] 0-fuse:
>> Unmounting
>> '/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
>> _LADC-TBX-V02'.
>>
>>
>> And the mount is broken at that point:
>>
>> [root@ovirt-node267 ~]# df
>>
>> *df:
>> `/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':
>> Transport endpoint is not connected*
>>
>> Filesystem1K-blocks  Used  Available Use% Mounted on
>>
>> /dev/sda3  51475068   1968452   46885176   5% /
>>
>> tmpfs 132210244 0  132210244   0% /dev/shm
>>
>> /dev/sda2487652 32409 429643   8% /boot
>>
>> /dev/sda1204580   260 204320   1% /boot/efi
>>
>> /dev/sda51849960960 156714056 1599267616   9% /data1
>>
>> /dev/sdb11902274676  18714468 1786923588   2% /data2
>>
>> ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01
>>
>>  9249804800 727008640 8052899712   9%
>> /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
>> _LADC-TBX-V01
>>
>> ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03
>>
>>  1849960960 73728 1755907968   1%
>> /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:
>> _LADC-TBX-V03
>>
>> The fix for that is to put the server in maintenance mode and then activate
>> it again. But all VMs need to be migrated or stopped for that to work.
>>
>> I'm not seeing any obvious network or disk errors..
>>
>> Are there configuration options I'm missing?
>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] urgent issue

2015-09-08 Thread Chris Liebman
It's possible that this is specific to just one gluster volume...  I've
moved a few VM disks off of that volume and am able to start them fine.  My
recollection is that any VM started on the "bad" volume causes it to be
disconnected and forces the oVirt node to be marked down until
Maint->Activate.

On Tue, Sep 8, 2015 at 3:52 PM, Chris Liebman <chri...@taboola.com> wrote:

> In attempting to put an oVirt cluster in production I'm running into some
> odd errors, with gluster it looks like.  It's 12 hosts, each with one brick in
> distributed-replicate.  (Actually 2 bricks, but they are separate volumes.)
>
> [root@ovirt-node268 glusterfs]# rpm -qa | grep vdsm
>
> vdsm-jsonrpc-4.16.20-0.el6.noarch
>
> vdsm-gluster-4.16.20-0.el6.noarch
>
> vdsm-xmlrpc-4.16.20-0.el6.noarch
>
> vdsm-yajsonrpc-4.16.20-0.el6.noarch
>
> vdsm-4.16.20-0.el6.x86_64
>
> vdsm-python-zombiereaper-4.16.20-0.el6.noarch
>
> vdsm-python-4.16.20-0.el6.noarch
>
> vdsm-cli-4.16.20-0.el6.noarch
>
>
>    Everything was fine last week; however, today various clients in the
> gluster cluster seem to get "client quorum not met" periodically - when they
> get this they take one of the bricks offline - this triggers VM migrations
> - sometimes 20 at a time.  That takes a long time :-(.
> I've tried disabling automatic migration and the VMs get paused when this
> happens - resuming does nothing at that point, as the volume's mount on the
> server hosting the VM is not connected:
>
> from
> rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:
> _LADC-TBX-V02.log:
>
> [2015-09-08 21:18:42.920771] W [MSGID: 108001]
> [afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum is 
> not
> met
>
> [2015-09-08 21:18:42.931751] I [fuse-bridge.c:4900:fuse_thread_proc]
> 0-fuse: unmounting
> /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
> _LADC-TBX-V02
>
> [2015-09-08 21:18:42.931836] W [glusterfsd.c:1219:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7a51) [0x7f1bebc84a51]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x
>
> 65) [0x4059b5] ) 0-: received signum (15), shutting down
>
> [2015-09-08 21:18:42.931858] I [fuse-bridge.c:5595:fini] 0-fuse:
> Unmounting
> '/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
> _LADC-TBX-V02'.
>
>
> And the mount is broken at that point:
>
> [root@ovirt-node267 ~]# df
>
> *df:
> `/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':
> Transport endpoint is not connected*
>
> Filesystem1K-blocks  Used  Available Use% Mounted on
>
> /dev/sda3  51475068   1968452   46885176   5% /
>
> tmpfs 132210244 0  132210244   0% /dev/shm
>
> /dev/sda2487652 32409 429643   8% /boot
>
> /dev/sda1204580   260 204320   1% /boot/efi
>
> /dev/sda51849960960 156714056 1599267616   9% /data1
>
> /dev/sdb11902274676  18714468 1786923588   2% /data2
>
> ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01
>
>  9249804800 727008640 8052899712   9%
> /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
> _LADC-TBX-V01
>
> ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03
>
>  1849960960 73728 1755907968   1%
> /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:
> _LADC-TBX-V03
>
> The fix for that is to put the server in maintenance mode and then activate it
> again. But all VMs need to be migrated or stopped for that to work.
>
> I'm not seeing any obvious network or disk errors..
>
> Are there configuration options I'm missing?
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] urgent issue

2015-09-08 Thread Chris Liebman
In attempting to put an oVirt cluster in production I'm running into some
odd errors, with gluster it looks like.  It's 12 hosts, each with one brick in
distributed-replicate.  (Actually 2 bricks, but they are separate volumes.)

[root@ovirt-node268 glusterfs]# rpm -qa | grep vdsm

vdsm-jsonrpc-4.16.20-0.el6.noarch

vdsm-gluster-4.16.20-0.el6.noarch

vdsm-xmlrpc-4.16.20-0.el6.noarch

vdsm-yajsonrpc-4.16.20-0.el6.noarch

vdsm-4.16.20-0.el6.x86_64

vdsm-python-zombiereaper-4.16.20-0.el6.noarch

vdsm-python-4.16.20-0.el6.noarch

vdsm-cli-4.16.20-0.el6.noarch


   Everything was fine last week; however, today various clients in the
gluster cluster seem to get "client quorum not met" periodically - when they
get this they take one of the bricks offline - this triggers VM migrations -
sometimes 20 at a time.  That takes a long time :-(.
I've tried disabling automatic migration and the VMs get paused when this
happens - resuming does nothing at that point, as the volume's mount on the
server hosting the VM is not connected:

from rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:
_LADC-TBX-V02.log:

[2015-09-08 21:18:42.920771] W [MSGID: 108001]
[afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum is not
met

[2015-09-08 21:18:42.931751] I [fuse-bridge.c:4900:fuse_thread_proc]
0-fuse: unmounting
/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
_LADC-TBX-V02

[2015-09-08 21:18:42.931836] W [glusterfsd.c:1219:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7a51) [0x7f1bebc84a51]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x

65) [0x4059b5] ) 0-: received signum (15), shutting down

[2015-09-08 21:18:42.931858] I [fuse-bridge.c:5595:fini] 0-fuse: Unmounting
'/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
_LADC-TBX-V02'.


And the mount is broken at that point:

[root@ovirt-node267 ~]# df

*df:
`/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':
Transport endpoint is not connected*

Filesystem1K-blocks  Used  Available Use% Mounted on

/dev/sda3  51475068   1968452   46885176   5% /

tmpfs 132210244 0  132210244   0% /dev/shm

/dev/sda2487652 32409 429643   8% /boot

/dev/sda1204580   260 204320   1% /boot/efi

/dev/sda51849960960 156714056 1599267616   9% /data1

/dev/sdb11902274676  18714468 1786923588   2% /data2

ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01

 9249804800 727008640 8052899712   9%
/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:
_LADC-TBX-V01

ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03

 1849960960 73728 1755907968   1%
/rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:
_LADC-TBX-V03

The fix for that is to put the server in maintenance mode and then activate it
again. But all VMs need to be migrated or stopped for that to work.
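
A possible lighter-weight recovery, which I have not verified here: lazily
unmount the dead FUSE mount and remount it by hand instead of cycling the whole
host. The server, volume and path below are taken from the df output above, and
vdsm normally manages these mounts itself, so treat this as a stopgap sketch only:

  umount -l /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02
  mount -t glusterfs ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V02 \
    /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02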

I'm not seeing any obvious network or disk errors..

Are there configuration options I'm missing?
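
For completeness, these are the quorum-related volume options I believe are in
play here -- shown only as a sketch of what to inspect, since the right values
depend on whether the volume is replica 2 or replica 3:

  gluster volume info LADC-TBX-V02 | grep -i quorum   # lists quorum options only if explicitly set
  # client-side quorum ("auto" roughly means more than half of each replica set must be reachable):
  gluster volume set LADC-TBX-V02 cluster.quorum-type auto
  # server-side quorum, enforced by glusterd across the trusted pool:
  gluster volume set LADC-TBX-V02 cluster.server-quorum-type server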
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VLAN with one NIC

2015-09-04 Thread Chris Liebman
If you're sharing a physical interface with both tagged and untagged traffic,
oVirt doesn't allow VMs on the untagged link.  I believe this is because some
earlier versions of the bridging code in the kernel would pass the
tagged traffic to VMs located on the untagged interface, which is a
security issue.
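
Outside of oVirt, the plain-CentOS pattern gregor describes below would look
roughly like this (device name, VLAN ID and addresses are just placeholders):

  # /etc/sysconfig/network-scripts/ifcfg-eth0.10 -- tagged sub-interface for VLAN 10
  DEVICE=eth0.10
  VLAN=yes
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=192.168.10.5
  NETMASK=255.255.255.0

oVirt builds its own bridges on top of the host NICs, so the same
untagged-plus-tagged mix has to be modelled as logical networks instead, with
only one non-VLAN VM network per interface.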

On Fri, Sep 4, 2015 at 9:47 AM, gregor <gregor_fo...@catrix.at> wrote:

> Thanks, now I can use the eth0 to connect to the web-interface and the
> logical vlan network to use in a VM but I can't use the untagged VLAN1
> in the VM. When I add a new logical network without VLAN tagging the
> web-interface returns "Cannot have more than one non-VLAN network on one
> interface."
>
> When I create a logical network with tagged ID 1 I can't connect from a
> VM to my VLAN1.
>
> So I have to find a way to create the following networks:
> - ovirtmgmt: without tagged vlan for managing -> works
> - vlan1: without tagged vlan for the VM's to connect to my default
> network -> currently no solution
> - vlan10: tagged vlan with id 10 -> works
>
> cheers
> gregor
>
> On 2015-09-04 18:05, Chris Liebman wrote:
> > You have to edit the ovirtmgmt network and un-check the "VM Network" box:
> >
> > Inline image 1
> >
> > On Fri, Sep 4, 2015 at 8:47 AM, gregor <gregor_fo...@catrix.at> wrote:
> >
> > Hi,
> >
> > is it possible to use different VLAN with one NIC in ovirt?
> >
> > I can not add a logical network configured as VLAN to my ovirtmgmt
> > interface, I get "Cannot have a non-VLAN VM network and VLAN-tagged
> > networks on one interface."
> >
> > The setup for the port is a trunk of different tagged VLANs and the
> > default VLAN1 is untagged. Normally this works under CentOS, where I give
> > the nic eth0 an IP and create a nic eth0.10 for the VLAN with ID 10 and
> > set an IP for it.
> >
> > cheers
> > gregor
> >
> > ___
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >
> >
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VLAN with one NIC

2015-09-04 Thread Chris Liebman
You have to edit the ovirtmgmt network and un-check the "VM Network" box:

[image: Inline image 1]

On Fri, Sep 4, 2015 at 8:47 AM, gregor  wrote:

> Hi,
>
> is it possible to use different VLAN with one NIC in ovirt?
>
> I can not add a logical network configured as VLAN to my ovirtmgmt
> interface, I get "Cannot have a non-VLAN VM network and VLAN-tagged
> networks on one interface."
>
> The setup for the port is a trunk of different tagged VLANs and the
> default VLAN1 is untagged. Normally this works under CentOS, where I give
> the nic eth0 an IP and create a nic eth0.10 for the VLAN with ID 10 and
> set an IP for it.
>
> cheers
> gregor
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Move DataCenter or Cluster from one engine to another

2015-08-25 Thread Chris Liebman
3.5 (it's running 3.5.2)

On Tue, Aug 25, 2015 at 5:14 AM, Liron Aravot lara...@redhat.com wrote:



 - Original Message -
  From: Yaniv Dary yd...@redhat.com
  To: Chris Liebman chri...@taboola.com
  Cc: users users@ovirt.org
  Sent: Tuesday, August 25, 2015 1:47:53 PM
  Subject: Re: [ovirt-users] Move DataCenter or Cluster from one engine
 to  another
 
  Yes, using import storage domain:
  http://www.ovirt.org/Features/ImportStorageDomain
 
  Yaniv Dary
  Technical Product Manager
  Red Hat Israel Ltd.
  34 Jerusalem Road
  Building A, 4th floor
  Ra'anana, Israel 4350109
 
  Tel : +972 (9) 7692306
  8272306
  Email: yd...@redhat.com IRC : ydary
 
  On Mon, Aug 24, 2015 at 8:56 PM, Chris Liebman  chri...@taboola.com 
 wrote:
 
 
 
  Is it possible to export a data center from one engine and import it to
  another? Currently I have an engine running in Europe and a set of nodes
  comprising a datacenter on the west coast of the US and am seeing
  communication issues. There are a number of other data centers that the
  engine in Europe is managing and I'd like to deploy an engine in closer
  proximity to the nodes for one data center. Has anyone done this? Is it
  possible?
  -- Chris

 Hi Chris,
 what is your (source) Data Center compatibility version?


 thanks,
 Liron
 
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Move DataCenter or Cluster from one engine to another

2015-08-24 Thread Chris Liebman
   Is it possible to export a data center from one engine and import it to
another?  Currently I have an engine running in Europe and a set of nodes
comprising a datacenter on the west coast of the US and am seeing
communication issues.  There are a number of other data centers that the
engine in Europe is managing and I'd like to deploy an engine in closer
proximity to the nodes for one data center.  Has anyone done this?  Is it
possible?
-- Chris
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] error messages filling disk

2015-08-18 Thread Chris Liebman
This has happened twice now: these messages start spewing into both
/var/log/messages and /var/log/vdsm/vdsm.log with slightly different
formats but the same information.  Restarting vdsmd fixes this, but I'd like
to find out why it gets into this state.  CentOS 6.7, oVirt 3.5.  Any ideas?
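
What I've been doing as a stopgap, in case it helps someone reproduce this
(vdsm's listening port is assumed to be the default 54321):

  netstat -tnp | grep ':54321'     # see which peer keeps opening and dropping connections
  service vdsmd restart            # clears the condition for a while
  tail -f /var/log/vdsm/vdsm.log   # watch whether the SSLError spew resumes

That obviously doesn't explain the root cause, though.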

Aug 18 16:06:36 ovirt-node250 vdsm vds.MultiProtocolAcceptor ERROR Unhandled exception
Traceback (most recent call last):
  File "/usr/share/vdsm/protocoldetector.py", line 86, in serve_forever
    self._process_events()
  File "/usr/share/vdsm/protocoldetector.py", line 105, in _process_events
    self._handle_connection_read(fd)
  File "/usr/share/vdsm/protocoldetector.py", line 225, in _handle_connection_read
    data = client_socket.recv(self._required_size, socket.MSG_PEEK)
  File "/usr/lib/python2.6/site-packages/vdsm/sslutils.py", line 58, in read
    self._data = self.connection.read(size)
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 229, in read
    return self._read_nbio(size)
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 218, in _read_nbio
    return m2.ssl_read_nbio(self.ssl, size)
SSLError: unexpected eof

[The same "vds.MultiProtocolAcceptor ERROR Unhandled exception ... SSLError: unexpected eof" block repeats back-to-back with the same timestamp.]

 Aug 18 16:06:36 ovirt-node250 vdsm 

Re: [ovirt-users] stuck hosts - how can I delete them?

2015-08-17 Thread Chris Liebman
Yes - thanks!

On Sunday, August 16, 2015, Sahina Bose sab...@redhat.com wrote:



 On 08/13/2015 11:48 PM, Chris Liebman wrote:

 I've just force-deleted a DC.  I did this because gluster was completely
 hosed. Multiple nodes with broken disks - don't ask...  Anyway - now I see
 that the Cluster still exists with the hosts.  And I can't remove,
 re-install, etc., the hosts, nor can I delete the cluster.  Help!


 Are you facing the same issue as
 https://bugzilla.redhat.com/show_bug.cgi?id=1244935



 -- Chris



 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] stuck hosts - how can I delete them?

2015-08-13 Thread Chris Liebman
I've just force-deleted a DC.  I did this because gluster was completely
hosed. Multiple nodes with broken disks - don't ask...  Anyway - now I see
that the Cluster still exists with the hosts.  And I can't remove,
re-install, etc., the hosts, nor can I delete the cluster.  Help!

-- Chris
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] ovirt 3.5.2 issues with nodes becoming Non Operational

2015-08-12 Thread Chris Liebman
Hi,
   I'm new to oVirt and recently built a 10-node oVirt 3.5 DC with shared
storage using gluster configured as distributed-replicated (replication =
2).  Shortly after, 7 of the 10 nodes dropped, one at a time over a few
hours, into Non Operational state.  Attempting to activate one of these
nodes gives the error: "Failed to connect Host ovirt-node260 to Storage
Pool LADC-TBX". Attempting to put the node into Maintenance leaves the node
stuck in "Preparing For Maintenance".

When I rebooted one of the nodes I saw this in the node's event list:

Host ovirt-node269 reports about one of the Active Storage Domains as
Problematic.

I see many of these errors in the vdsm log from the failed nodes:

Thread-1::ERROR::2015-08-12 10:01:17,748::__init__::506::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 501, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 267, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1330, in getStats
    stats.update(self._cif.mom.getKsmStats())
  File "/usr/share/vdsm/momIF.py", line 60, in getKsmStats
    stats = self._mom.getStatistics()['host']
  File "/usr/lib/python2.6/site-packages/mom/MOMFuncs.py", line 75, in getStatistics
    host_stats = self.threads['host_monitor'].interrogate().statistics[-1]
AttributeError: 'NoneType' object has no attribute 'statistics'

Any help here is appreciated.
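
For what it's worth, these are the quick checks I've been running directly on
an affected host (the gluster mount path is from my setup; adjust as needed):

  mount | grep glusterfs                 # is the storage domain still mounted?
  df /rhev/data-center/mnt/glusterSD/*   # "Transport endpoint is not connected" means a dead FUSE mount
  tail -n 200 /var/log/vdsm/vdsm.log     # domain monitor errors around the time the host went down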

-- Chris
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 3.5.2 issues with nodes becoming Non Operational

2015-08-12 Thread Chris Liebman
I may have figured this out.  The systems that failed are running the
Oracle Unbreakable Enterprise Kernel (UEK):

3.8.13-98.el6uek.x86_64

The working systems are running the default CentOS 6 2.6 kernel.

And the errors from vdsm.log only show up on the UEK kernel.
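
A rough sketch of how I'm checking and pinning the kernel on each host (grub
paths are the CentOS 6 defaults; adjust if yours differ):

  uname -r                            # 3.8.13-98.el6uek.x86_64 on the failing hosts
  grep ^title /boot/grub/grub.conf    # list the installed kernel entries
  # then point "default=" in /boot/grub/grub.conf at the stock 2.6.32 entry and reboot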

-- Chris




On Wed, Aug 12, 2015 at 9:34 AM, Chris Liebman chri...@taboola.com wrote:

 Hi,
    I'm new to oVirt and recently built a 10-node oVirt 3.5 DC with shared
 storage using gluster configured as distributed-replicated (replication =
 2).  Shortly after, 7 of the 10 nodes dropped, one at a time over a few
 hours, into Non Operational state.  Attempting to activate one of these
 nodes gives the error: "Failed to connect Host ovirt-node260 to Storage
 Pool LADC-TBX". Attempting to put the node into Maintenance leaves the node
 stuck in "Preparing For Maintenance".

 When I rebooted one of the nodes I saw this in the node's event list:

 Host ovirt-node269 reports about one of the Active Storage Domains as
 Problematic.

 I see many of these errors in the vdsm log from the failed nodes:

 Thread-1::ERROR::2015-08-12 10:01:17,748::__init__::506::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
 Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 501, in _serveRequest
     res = method(**params)
   File "/usr/share/vdsm/rpc/Bridge.py", line 267, in _dynamicMethod
     result = fn(*methodArgs)
   File "/usr/share/vdsm/API.py", line 1330, in getStats
     stats.update(self._cif.mom.getKsmStats())
   File "/usr/share/vdsm/momIF.py", line 60, in getKsmStats
     stats = self._mom.getStatistics()['host']
   File "/usr/lib/python2.6/site-packages/mom/MOMFuncs.py", line 75, in getStatistics
     host_stats = self.threads['host_monitor'].interrogate().statistics[-1]
 AttributeError: 'NoneType' object has no attribute 'statistics'

 Any help here is appreciated.

 -- Chris


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users