[ovirt-users] Unable to reactivate host after reboot due to failed Gluster probe

2015-01-29 Thread Jan Siml

Hello,

we are seeing strange behavior in an oVirt cluster. The version is 3.5.1; 
the engine runs on an EL6 machine and the hosts run EL7. The cluster uses 
a GlusterFS-backed storage domain, among others. Three of the four hosts 
are peers in the Gluster cluster (3 bricks, replica 3).


When all hosts are restarted (for example after a power outage), the engine 
can't activate them again because the Gluster probe fails. The message 
shown in the UI is:


Gluster command [gluster peer node-03] failed on server node-03.

Checking the Gluster peer and volume status on each host confirms that the 
Gluster peers know each other and that the volume is up.


node-03:~ $ gluster peer status
Number of Peers: 2

Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)

Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)

node-03:~ $ gluster volume status
Status of volume: glusterfs-1
Gluster process                            Port   Online  Pid
----------------------------------------------------------------
Brick node-01:/export/glusterfs/brick      49152  Y       12409
Brick node-02:/export/glusterfs/brick      49153  Y       9978
Brick node-03:/export/glusterfs/brick      49152  Y       10001
Self-heal Daemon on localhost              N/A    Y       10003
Self-heal Daemon on node-01                N/A    Y       11590
Self-heal Daemon on node-02                N/A    Y       9988

Task Status of Volume glusterfs-1
----------------------------------------------------------------
There are no active volume tasks
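
The manual check above (every brick and self-heal daemon showing Y in the Online column) can be scripted. The following is a minimal sketch that assumes the plain-text layout shown in this thread, where the last three whitespace-separated fields of a process line are port, online flag and PID. The function name `offline_processes` is made up for illustration, and the layout may differ in other Gluster versions.

```python
# Process lines copied from the `gluster volume status` output in this thread.
SAMPLE_STATUS = """\
Brick node-01:/export/glusterfs/brick   49152   Y   12409
Brick node-02:/export/glusterfs/brick   49153   Y   9978
Brick node-03:/export/glusterfs/brick   49152   Y   10001
Self-heal Daemon on localhost   N/A Y   10003
Self-heal Daemon on node-01 N/A Y   11590
Self-heal Daemon on node-02 N/A Y   9988
"""


def offline_processes(status_text):
    """Return the names of bricks/daemons whose Online column is not 'Y'."""
    offline = []
    for line in status_text.splitlines():
        # Only brick and self-heal daemon rows carry the Port/Online/Pid columns.
        if not line.startswith(("Brick ", "Self-heal Daemon")):
            continue
        parts = line.rsplit(None, 3)  # name, port, online, pid
        if len(parts) != 4:
            continue
        name, _port, online, _pid = parts
        if online != "Y":
            offline.append(name)
    return offline


if __name__ == "__main__":
    print(offline_processes(SAMPLE_STATUS))
```

For the output posted here the list is empty, which matches the observation that the volume is up on all three hosts.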

The storage domain in the oVirt UI is fine (active and green) and usable, 
but neither the Gluster volume nor any brick is visible in the UI.


If I run the command shown in the UI, it returns:

root@node-03:~ $ gluster peer probe node-03
peer probe: success. Probe on localhost not needed

root@node-03:~ $ gluster --mode=script peer probe node-03 --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>1</opErrno>
  <opErrstr>(null)</opErrstr>
  <output>Probe on localhost not needed</output>
</cliOutput>

Could this just be an engine-side parsing error?
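
To make the question concrete: the XML reports opRet 0 but opErrno 1, so a consumer that requires both values to be zero would see a failure where one that looks only at opRet sees success. The sketch below parses the output shown above with Python's standard library and applies both rules. The function names and the "strict" rule are assumptions made for illustration; this is not the actual parsing code of the engine or VDSM.

```python
import xml.etree.ElementTree as ET

# CLI output copied from the session above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>1</opErrno>
  <opErrstr>(null)</opErrstr>
  <output>Probe on localhost not needed</output>
</cliOutput>"""


def parse_probe(xml_text):
    """Extract the status fields from `gluster ... --xml` output."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "opRet": int(root.findtext("opRet")),
        "opErrno": int(root.findtext("opErrno")),
        "opErrstr": root.findtext("opErrstr"),
        "output": root.findtext("output"),
    }


def ok_by_ret(result):
    """Lenient rule (assumption): only the return code decides."""
    return result["opRet"] == 0


def ok_strict(result):
    """Strict rule (assumption): return code and errno must both be zero."""
    return result["opRet"] == 0 and result["opErrno"] == 0


if __name__ == "__main__":
    result = parse_probe(SAMPLE)
    print(result)
    print("lenient:", ok_by_ret(result), "strict:", ok_strict(result))
```

With the output above, the lenient rule reports success and the strict rule reports failure, which is the kind of mismatch the question is about.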

--
Kind regards

Jan Siml
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Unable to reactivate host after reboot due to failed Gluster probe

2015-01-29 Thread Jan Siml

Hello,

I finally got the nodes online. What helped was probing node-04, which 
has no brick and is not needed as a peer, from one of the other cluster 
nodes. Once that node becomes a Gluster peer, I am able to activate any 
oVirt node which serves bricks.


Therefore I assume that the error message which the UI returns actually 
comes from node-04:


root@node-04:~ $ gluster peer probe node-01
peer probe: failed: Probe returned with unknown errno 107

root@node-03:~ $ gluster peer status
Number of Peers: 2

Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)

Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)

root@node-03:~ $ gluster peer probe node-04
peer probe: success.

root@node-03:~ $ gluster peer status
Number of Peers: 3

Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)

Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)

Hostname: node-04
Uuid: 9cdefc68-d710-4346-93b1-76b5307e258b
State: Peer in Cluster (Connected)

This (oVirt's behavior) seems to be reproducible.
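
To check the precondition found above (every oVirt host, including the brickless node-04, is a connected Gluster peer) before trying to activate hosts, the `gluster peer status` output can be parsed. This is a minimal sketch assuming the text layout shown in this thread; `parse_peer_status` and `missing_peers` are hypothetical helper names, not part of any Gluster or oVirt API.

```python
# Output of `gluster peer status` on node-03 before node-04 was probed,
# copied from the session above.
BEFORE_PROBE = """\
Number of Peers: 2

Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)

Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)
"""


def parse_peer_status(text):
    """Return one dict per peer with 'hostname', 'uuid' and 'state' keys."""
    peers = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            current = {"hostname": value}
            peers.append(current)
        elif current is not None and key in ("Uuid", "State"):
            current[key.lower()] = value
    return peers


def missing_peers(peers, expected_hosts):
    """Hosts from expected_hosts that are absent or not connected."""
    connected = {
        p["hostname"]
        for p in peers
        if p.get("state", "").endswith("(Connected)")
    }
    return sorted(set(expected_hosts) - connected)


if __name__ == "__main__":
    peers = parse_peer_status(BEFORE_PROBE)
    # Seen from node-03, the other three oVirt hosts should all be peers.
    print(missing_peers(peers, ["node-01", "node-02", "node-04"]))
```

Run against the output from before the probe, this reports node-04 as missing, which is the state in which activation failed.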

--
Kind regards

Jan Siml


Re: [ovirt-users] Unable to reactivate host after reboot due to failed Gluster probe

2015-01-29 Thread Jan Siml

Hello,

looking into engine.log, I can see that the Gluster probe returned 
errno 107, but I can't figure out why:


2015-01-29 10:40:03,546 ERROR 
[org.ovirt.engine.core.bll.InitVdsOnUpCommand] 
(DefaultQuartzScheduler_Worker-59) [5977aac5] Could not peer probe the 
gluster server node-03. Error: VdcBLLException: 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: 
VDSGenericException: VDSErrorException: Failed to AddGlusterServerVDS, 
error = Add host failed
error: Probe returned with unknown errno 107
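
If the 107 in this message is a plain Linux errno value (an assumption; the log does not say where the number originates), its symbolic name and text can be looked up with Python's standard library. On Linux, errno 107 is ENOTCONN. `describe_errno` is a made-up helper name.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The mapping is platform specific; run this on one of the Linux hosts.
    print(describe_errno(107))
```

A "not connected" error would fit a probe that fails because the two glusterd instances cannot reach each other, but that reading is only a guess based on the errno name.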

Just for the record: we use the /etc/hosts method because there is no 
option to choose the network interface for Gluster. The three Gluster 
peer hosts have modified /etc/hosts files with addresses bound to a 
different interface than the ovirtmgmt addresses.


Example:

root@node-03:~ $ cat /etc/hosts
192.168.200.195  node-01
192.168.200.196  node-02
192.168.200.198  node-03

The /etc/hosts file on the engine host isn't modified.
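
Because the same host name resolves to different addresses depending on where it is looked up, it can help to compare the views side by side. The sketch below parses /etc/hosts-style text and lists the names whose addresses differ between two views. The node entries are the ones shown above; the engine-side addresses (192.0.2.x) are invented placeholders, since the thread does not give the ovirtmgmt addresses, and the helper names are hypothetical.

```python
# /etc/hosts entries on node-03, copied from the example above.
NODE_HOSTS = """\
192.168.200.195  node-01
192.168.200.196  node-02
192.168.200.198  node-03
"""

# Hypothetical engine-side view: placeholder addresses, NOT from the thread.
ENGINE_VIEW = {
    "node-01": "192.0.2.11",
    "node-02": "192.0.2.12",
    "node-03": "192.0.2.13",
}


def parse_hosts(text):
    """Map each host name in /etc/hosts-style text to its first address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)
    return mapping


def differing_names(view_a, view_b):
    """Names present in both views that resolve to different addresses."""
    return sorted(n for n in view_a.keys() & view_b.keys() if view_a[n] != view_b[n])


if __name__ == "__main__":
    node_view = parse_hosts(NODE_HOSTS)
    print(differing_names(node_view, ENGINE_VIEW))
```

In a setup like the one described, all three names would be expected to differ, because the Gluster hosts resolve them to the storage interface while the engine does not.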


--
Kind regards

Jan Siml


Re: [ovirt-users] Unable to reactivate host after reboot due to failed Gluster probe

2015-01-29 Thread Shubhendu Tripathi

On 01/29/2015 04:26 PM, Jan Siml wrote:

Hello,

I finally got the nodes online. What helped was probing node-04, which 
has no brick and is not needed as a peer, from one of the other cluster 
nodes. Once that node becomes a Gluster peer, I am able to activate any 
oVirt node which serves bricks.


Therefore I assume that the error message which the UI returns actually 
comes from node-04:


Yes, this could be the issue: in all other successful cases, the value of 
opErrno is returned as 0 and opErrstr is blank.

I suspect this scenario is treated as an error on the engine side.


