Re: [Pacemaker] One more globally-unique clone question

2015-01-19 Thread Andrew Beekhof

 On 17 Jan 2015, at 1:25 am, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 Hi all,
 
 Trying to reproduce the problem with the early stop of globally-unique clone 
 instances during a move to another node, I found one more interesting problem.
 
 Due to the different order of resources in the CIB and the extensive use of 
 constraints between other resources (an odd number of resources cluster-wide), 
 two CLUSTERIP instances are always allocated to the same node in the new 
 testing cluster.

Ah, so this is why broker-vips:1 was moving.

 
 What would be the best/preferred way to make them run on different nodes by 
 default?

By default they will.
I'm assuming it's the constraints that are preventing this.

Getting them to auto-rebalance is the harder problem.

 
 I see the following options:
 * Raise the priority of the globally-unique clone so its instances are always 
 allocated first.
 * Use utilization attributes (with high values for nodes and low values for 
 cluster resources); a sketch follows below.
 * Anything else?
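
For illustration, a minimal crmsh sketch of the utilization option above. The
attribute name, capacities, addresses and node names are made up, and
placement-strategy must be set to something other than 'default' for
utilization values to be considered at all:

===
property placement-strategy=balanced
# each node advertises plenty of capacity...
node node-a utilization capacity=100
node node-b utilization capacity=100
# ...and every resource (each CLUSTERIP clone instance included) consumes a
# little, so the allocator tends to spread the instances across the nodes
primitive broker-vips ocf:heartbeat:IPaddr2 \
    params ip=192.168.122.100 clusterip_hash=sourceip-sourceport \
    utilization capacity=1
clone cl-broker-vips broker-vips \
    meta clone-node-max=2 globally-unique=true interleave=true
===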
 
 If I configure the virtual IPs one-by-one (without a clone), I can add a 
 colocation constraint with a negative score between them. I do not see a way 
 to scale that setup well though (5-10 IPs).
 So, what would be the best option to achieve the same with a globally-unique 
 cloned resource?
 Maybe there should be some internal preference/colocation not to place them 
 together (like the default stickiness=1 for clones)?
 Or even allow a special negative colocation constraint with the same resource 
 in both 'what' and 'with'
 (colocation col1 -1: clone clone)?
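
For comparison, the one-by-one variant with a negative colocation would look
roughly like this in crmsh (names, addresses, and the score are illustrative);
it is fine for two IPs, but every additional IP needs additional constraints,
which is why it scales poorly:

===
primitive vip1 ocf:heartbeat:IPaddr2 params ip=192.168.122.101
primitive vip2 ocf:heartbeat:IPaddr2 params ip=192.168.122.102
# negative (non-infinite) score: prefer different nodes, but still allow
# both IPs on one node when only one node is available
colocation vip1-not-with-vip2 -1000: vip1 vip2
===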
 
 Best,
 Vladislav
 
 


Re: [Pacemaker] Unique clone instance is stopped too early on move

2015-01-19 Thread Andrew Beekhof

 On 16 Jan 2015, at 3:59 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 16.01.2015 07:44, Andrew Beekhof wrote:
 
 On 15 Jan 2015, at 3:11 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 13.01.2015 11:32, Andrei Borzenkov wrote:
 On Tue, Jan 13, 2015 at 10:20 AM, Vladislav Bogdanov
 bub...@hoster-ok.com wrote:
 Hi Andrew, David, all.
 
 I found a somewhat strange operation ordering during transition 
 execution.
 
 Could you please look at the following partial configuration (crmsh 
 syntax)?
 
 ===
 ...
 clone cl-broker broker \
 meta interleave=true target-role=Started
 clone cl-broker-vips broker-vips \
 meta clone-node-max=2 globally-unique=true interleave=true \
 resource-stickiness=0 target-role=Started
 clone cl-ctdb ctdb \
 meta interleave=true target-role=Started
 colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
 colocation broker-with-ctdb inf: cl-broker cl-ctdb
 order broker-after-ctdb inf: cl-ctdb cl-broker
 order broker-vips-after-broker 0: cl-broker cl-broker-vips
 ...
 ===
 
 After I put one node to standby and then back to online, I see the 
 following transition (relevant excerpt):
 
 ===
  * Pseudo action:   cl-broker-vips_stop_0
  * Resource action: broker-vips:1   stop on c-pa-0
  * Pseudo action:   cl-broker-vips_stopped_0
  * Pseudo action:   cl-ctdb_start_0
  * Resource action: ctdb start on c-pa-1
  * Pseudo action:   cl-ctdb_running_0
  * Pseudo action:   cl-broker_start_0
  * Resource action: ctdb monitor=1 on c-pa-1
  * Resource action: broker  start on c-pa-1
  * Pseudo action:   cl-broker_running_0
  * Pseudo action:   cl-broker-vips_start_0
  * Resource action: broker  monitor=1 on c-pa-1
  * Resource action: broker-vips:1   start on c-pa-1
  * Pseudo action:   cl-broker-vips_running_0
  * Resource action: broker-vips:1   monitor=3 on c-pa-1
 ===
 
 What could be the reason to stop a unique clone instance so early during a move?
 
 
 Do not take this as a definitive answer, but cl-broker-vips cannot run
 unless both other resources are started. So if you compute the closure of
 all required transitions, it looks rather logical. Having
 cl-broker-vips started while broker is still stopped would violate the
 constraint.
 
 The problem is that broker-vips:1 is stopped on the source node unnecessarily 
 early.
 
 It looks to be moving from c-pa-0 to c-pa-1
 It might be unnecessarily early, but it is what you asked for... we have to 
 unwind the resource stack before we can build it up.
 
 Yes, I understand that it is valid, but could its stop be delayed until the 
 cluster is in a state where all dependencies are satisfied to start it on 
 another node (like a migration)?

No, because we have to unwind the resource stack before we can build it up.
Doing anything else would be one of those things that is trivial for a human to 
identify but rather complex for a computer.

Better to look at why broker-vips:1 needed to be moved.
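
One way to see why broker-vips:1 was chosen for the move is to look at the
allocation scores, for example with crm_simulate (the grep pattern and the
pe-input file name below are just placeholders):

===
# allocation scores and the pending transition, from the live CIB
crm_simulate -L -s | grep -i broker-vips

# or replay a saved policy-engine input to inspect a past transition
crm_simulate -S -s -x /var/lib/pacemaker/pengine/pe-input-123.bz2
===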

 
 Like:
 ===
 * Pseudo action:   cl-ctdb_start_0
 * Resource action: ctdb start on c-pa-1
 * Pseudo action:   cl-ctdb_running_0
 * Pseudo action:   cl-broker_start_0
 * Resource action: ctdb monitor=1 on c-pa-1
 * Resource action: broker  start on c-pa-1
 * Pseudo action:   cl-broker_running_0
 * Pseudo action:   cl-broker-vips_start_0
 * Resource action: broker  monitor=1 on c-pa-1
 * Pseudo action:   cl-broker-vips_stop_0
 * Resource action: broker-vips:1   stop on c-pa-0
 * Pseudo action:   cl-broker-vips_stopped_0
 * Resource action: broker-vips:1   start on c-pa-1
 * Pseudo action:   cl-broker-vips_running_0
 * Resource action: broker-vips:1   monitor=3 on c-pa-1
 ===
 That would be a great optimization toward five nines...
 
 Best,
 Vladislav
 
 


Re: [Pacemaker] [corosync] CoroSync's UDPu transport for public IP addresses?

2015-01-19 Thread Jan Friesse

Dmitry,



Great, it works! Thank you.

It would be extremely helpful if this information were included in the
default corosync.conf as comments:
- regarding the allowed (and even preferred) absence of totem.interface in the
case of UDPu


Yep


- that the quorum section must not be empty, and that the default
quorum.provider could be corosync_votequorum (but not empty).


This is not entirely true. quorum.provider cannot be an empty string; if set, 
it must be a valid provider such as corosync_votequorum. But an 
unspecified quorum.provider works without any problem (as in the example 
configuration file). The truth is that Pacemaker must then be configured 
so that quorum is not required.


Regards,
  Honza



It would help novices install and launch corosync quickly.


On Fri, Jan 16, 2015 at 7:31 PM, Jan Friesse jfrie...@redhat.com wrote:


Dmitry Koterov napsal(a):




  such messages (for now). But, anyway, DNS names in ringX_addr do not seem to
be working, and no relevant messages appear in the default logs. Maybe add some
validation for ringX_addr?

I have resolvable DNS names:

root@node1:/etc/corosync# ping -c1 -W100 node1 | grep from
64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms



This is the problem. Resolving node1 to localhost (127.0.0.1) is simply
wrong. The names you want to use in corosync.conf should resolve to the
interface address. I believe the other nodes have a similar setting (so node2
resolved on node2 is again 127.0.0.1).



Wow! What a shame! How could I miss it... You're absolutely right,
thanks: that was the cause, an entry in /etc/hosts. On some machines I
removed it manually, but on others I didn't. Now I do it automatically
with sed -i -r "/^.*[[:space:]]$host([[:space:]]|\$)/d" /etc/hosts in the
initialization script.

I apologize for the mess.

So now I have only one place in corosync.conf where I need to specify a
plain IP address for UDPu: totem.interface.bindnetaddr. If I specify
0.0.0.0 there, I get the message Service engine 'corosync_quorum'
failed to load for reason 'configuration error: nodelist or
quorum.expected_votes must be configured!' in the logs (BTW it does not
say that I made a mistake in bindnetaddr). Is there a way to completely untie
the configuration from IP addresses?



You can just remove the whole interface section completely. Corosync will find
the correct address from the nodelist.

Regards,
   Honza
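
To illustrate, a corosync.conf skeleton for udpu without any interface section
might look like the following; the addresses and node ids are placeholders,
not a verified configuration:

===
totem {
    version: 2
    transport: udpu
    # no interface {} block: corosync derives the bind address from the
    # matching nodelist entry
}

nodelist {
    node {
        # an IP address, or a DNS name resolving to the node's real address
        ring0_addr: 10.0.0.1
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.2
        nodeid: 2
    }
    node {
        ring0_addr: 10.0.0.3
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}
===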






Please try to fix this problem first and let's see if it solves the
issue you are hitting.

Regards,
Honza

  root@node1:/etc/corosync# ping -c1 -W100 node2 | grep from

64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms

root@node1:/etc/corosync# ping -c1 -W100 node3 | grep from
64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms


With the corosync.conf below, nothing works:
...
nodelist {
node {
  ring0_addr: node1
}
node {
  ring0_addr: node2
}
node {
  ring0_addr: node3
}
}
...
Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync Cluster Engine
('2.3.3'): started and ready to provide service.
Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync built-in
features: dbus testagents rdma watchdog augeas pie relro bindnow
Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing transport
(UDP/IP Unicast).
Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing
transmit/receive security (NSS) crypto: aes256 hash: sha1
Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] The network interface
[a.b.c.d] is now up.
Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
corosync configuration map access [0]
Jan 14 10:47:44 node1 corosync[15062]:  [QB] server name: cmap
Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
corosync configuration service [1]
Jan 14 10:47:44 node1 corosync[15062]:  [QB] server name: cfg
Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01 [2]
Jan 14 10:47:44 node1 corosync[15062]:  [QB] server name: cpg
Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
corosync profile loading service [4]
Jan 14 10:47:44 node1 corosync[15062]:  [WD] No Watchdog, try modprobe
a watchdog
Jan 14 10:47:44 node1 corosync[15062]:  [WD] no resources
configured.
Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
corosync watchdog service [7]
Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Using quorum provider
corosync_votequorum
Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Quorum provider:
corosync_votequorum failed to initialize.
Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine
'corosync_quorum' failed to load for reason 'configuration error: nodelist
or quorum.expected_votes must be configured!'
Jan 14 10:47:44 node1 corosync[15062]:  [MAIN  ] Corosync Cluster Engine
exiting with status 20 at service.c:356.


But with IP addresses specified in ringX_addr, everything works:
...
nodelist {
node {
  

Re: [Pacemaker] One more globally-unique clone question

2015-01-19 Thread Vladislav Bogdanov

20.01.2015 02:47, Andrew Beekhof wrote:



On 17 Jan 2015, at 1:25 am, Vladislav Bogdanov
bub...@hoster-ok.com wrote:

Hi all,

Trying to reproduce the problem with the early stop of globally-unique
clone instances during a move to another node, I found one more
interesting problem.

Due to the different order of resources in the CIB and the extensive
use of constraints between other resources (an odd number of resources
cluster-wide), two CLUSTERIP instances are always allocated to the
same node in the new testing cluster.


Ah, so this is why broker-vips:1 was moving.


Those are two different 2-node clusters with a different order of resources.
In the first one, broker-vips go after an even number of resources, and one 
instance wants to return to its mother node after that node is brought back 
online; thus broker-vips:1 is moving.


In the second one, broker-vips go after an odd number of resources 
(actually three more resources are allocated to one node due to 
constraints), and both broker-vips go to the other node.






What would be the best/preferred way to make them run on different
nodes by default?


By default they will. I'm assuming it's the constraints that are
preventing this.


I only see that they are allocated just like any other resources.



Getting them to auto-rebalance is the harder problem


I see. Should it be possible to solve this without using priority or 
utilization?






I see the following options:
* Raise the priority of the globally-unique clone so its instances are
always allocated first.
* Use utilization attributes (with high values for nodes and low values
for cluster resources).
* Anything else?


If I configure the virtual IPs one-by-one (without a clone), I can add a
colocation constraint with a negative score between them. I do not
see a way to scale that setup well though (5-10 IPs). So, what
would be the best option to achieve the same with a globally-unique
cloned resource? Maybe there should be some internal
preference/colocation not to place them together (like the default
stickiness=1 for clones)? Or even allow a special negative colocation
constraint with the same resource in both 'what' and 'with'
(colocation col1 -1: clone clone)?

Best, Vladislav




Re: [Pacemaker] Corosync fails to start when NIC is absent

2015-01-19 Thread Kostiantyn Ponomarenko
One more thing to clarify.
You said the rebind can be avoided - what does that mean?

Thank you,
Kostya

On Wed, Jan 14, 2015 at 1:31 PM, Kostiantyn Ponomarenko 
konstantin.ponomare...@gmail.com wrote:

 Thank you. Now I am aware of it.

 Thank you,
 Kostya

 On Wed, Jan 14, 2015 at 12:59 PM, Jan Friesse jfrie...@redhat.com wrote:

 Kostiantyn,

  Honza,
 
  Thank you for helping me.
  So, there is no defined behavior in case one of the interfaces is not in
  the system?

 You are right. There is no defined behavior.

 Regards,
   Honza


 
 
  Thank you,
  Kostya
 
  On Tue, Jan 13, 2015 at 12:01 PM, Jan Friesse jfrie...@redhat.com
 wrote:
 
  Kostiantyn,
 
 
 According to https://access.redhat.com/solutions/638843, the
 interface that is defined in corosync.conf must be present in the
 system (see the ROOT CAUSE section at the bottom of the article).
 To confirm that, I made a couple of tests.
 
 Here is a part of the corosync.conf file (in free form; the original
 config file is also attached):
  ===
  rrp_mode: passive
  ring0_addr is defined in corosync.conf
  ring1_addr is defined in corosync.conf
  ===
 
  ---
 
  Two-node cluster
 
  ---
 
  Test #1:
  --
 IP for ring0 is not defined in the system:
  --
  Start Corosync simultaneously on both nodes.
  Corosync fails to start.
  From the logs:
  Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] parse error in
  config: No interfaces defined
  Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] Corosync
 Cluster
  Engine exiting with status 8 at main.c:1343.
  Result: Corosync and Pacemaker are not running.
 
  Test #2:
  --
 IP for ring1 is not defined in the system:
  --
  Start Corosync simultaneously on both nodes.
  Corosync starts.
  Start Pacemaker simultaneously on both nodes.
  Pacemaker fails to start.
 From the logs, the last writes from corosync:
 Jan 8 16:31:29 daemon.err27 corosync[3728]: [TOTEM ] Marking ringid 0
 interface 169.254.1.3 FAULTY
 Jan 8 16:31:30 daemon.notice29 corosync[3728]: [TOTEM ] Automatically
 recovered ring 0
  Result: Corosync and Pacemaker are not running.
 
 
  Test #3:
 
 rrp_mode: active leads to the same result, except that the Corosync and
 Pacemaker init scripts return status running.
 But /var/log/cluster/corosync.log still shows a lot of errors like:
 Jan 08 16:30:47 [4067] A6-402-1 cib: error: pcmk_cpg_dispatch: Connection
 to the CPG API failed: Library error (2)

 Result: Corosync and Pacemaker show their statuses as running, but
 crm_mon cannot connect to the cluster database, and half of
 Pacemaker's services are not running (including the Cluster Information
 Base (CIB)).
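
As a quick check during such tests, the ring status can be queried on each
node with corosync-cfgtool (part of corosync itself):

===
# show the status of all configured rings on the local node
corosync-cfgtool -s

# re-enable rings that were marked FAULTY once the underlying interface is back
corosync-cfgtool -r
===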
 
 
  ---
 
  For a single node mode
 
  ---
 
 IP for ring0 is not defined in the system:
 
  Corosync fails to start.
 
 IP for ring1 is not defined in the system:
 
  Corosync and Pacemaker are started.
 
 It is possible that the configuration will be applied successfully (50%),

 and it is possible that the cluster is not running any resources,

 and it is possible that the node cannot be put into standby mode (shows:
 communication error),

 and it is possible that the cluster is running all resources, but the applied
 configuration is not guaranteed to be fully loaded (some rules can be
 missed).
 
 
  ---
 
  Conclusions:
 
  ---
 
 It is possible that in some rare cases (see comments to the bug) the
 cluster will work, but in that case its working state is unstable and the
 cluster can stop working at any moment.


 So, is this correct? Do my assumptions make any sense? I didn't find any
 other explanation on the net ... .
 
 Corosync needs all interfaces during start and runtime. This doesn't
 mean they must be connected (that would make corosync unusable for
 physical NIC/switch or cable failure), but they must be up and have the
 correct IP.

 When this is not the case, corosync rebinds to localhost and weird
 things happen. Removal of this rebinding is a long-standing TODO, but there
 are still more important bugs (especially because the rebind can be
 avoided).
 
  Regards,
Honza
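
Given that, one pragmatic workaround is to check, before starting corosync,
that every ring address from corosync.conf is actually assigned to a local
interface. A rough shell sketch (the addresses are placeholders and this is
not an official corosync mechanism):

===
#!/bin/sh
# Replace with the ring0/ring1 addresses used in your corosync.conf.
RING_ADDRS="10.0.0.1 169.254.1.3"

for addr in $RING_ADDRS; do
    if ! ip -o addr show | grep -qw "$addr"; then
        echo "ring address $addr is not configured on any interface" >&2
        exit 1
    fi
done

service corosync start
===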
 
 
 
 
  Thank you,
  Kostya
 
  On Fri, Jan 9, 2015 at 11:10 AM, Kostiantyn Ponomarenko 
  konstantin.ponomare...@gmail.com wrote:
 
  Hi guys,
 
 Corosync fails to start if there is no such network interface configured
 in the system.
 Even with rrp_mode: passive, the problem is the same when at least one
 network interface is not configured in the system.
 
  Is this the expected behavior?
  I thought that when you use redundant rings, it is enough to have at
  least
  

Re: [Pacemaker] no nodes on both hosts

2015-01-19 Thread Thomas Manninger

Hi,



I recompiled all packages, and now it works...



But my /var/run/crm is also empty in the new installation; is this ok?



Sent: Monday, 19 January 2015, 11:07
From: Thomas Manninger dbgtmas...@gmx.at
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] no nodes on both hosts




Hi,



Now I see that in /var/run/crm there are no socket files; the directory is empty.



How can I debug the problem?
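
A few standard commands can help narrow down where the node information gets
lost (the pacemakerd --features check is only meant to compare the self-built
packages against the Debian ones, e.g. for a mismatched build configuration):

===
# does corosync itself see both nodes?
corosync-cmapctl | grep members

# does pacemaker's membership layer list them?
crm_node -l

# is anything present in the nodes section of the live CIB?
cibadmin -Q -o nodes

# build/feature string of the self-compiled pacemaker, for comparison
pacemakerd --features
===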



Sent: Monday, 19 January 2015, 10:18
From: Thomas Manninger dbgtmas...@gmx.at
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] no nodes on both hosts




Hi,



I reinstalled Debian in the same VM and used the Debian pacemaker and corosync packages; everything works fine.

Then I recompiled the newest pacemaker and corosync packages, with the same problem.



Jan 19 10:13:31 [24271] pacemaker2 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=pacemaker2/crmd/3, version=0.0.0)

Does this line mean that the node has been added to cib.xml?



But there are no nodes:

root@pacemaker2:/var/lib/pacemaker/cib# cat cib.xml
<cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="0" num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 19 10:13:30 2015">
 <configuration>
 <crm_config/>
 <nodes/>
 <resources/>
 <constraints/>
 </configuration>
</cib>



I also changed the permission of the cib folder to 777...



Can someone help me?
Thanks!



Sent: Friday, 16 January 2015, 16:51
From: Thomas Manninger dbgtmas...@gmx.at
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] no nodes on both hosts



Hi,



I use Debian 7.



At first I used the standard Debian packages, and pacemaker worked perfectly.



Now I have compiled my own packages because I need pacemaker_remote. Since I use my compiled version, pacemaker sees no nodes!



corosync2 lists both hosts:

root@pacemaker1:/var/lib/pacemaker/cib# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.181614346.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614346.ip (str) = r(0) ip(10.211.55.10)
runtime.totem.pg.mrp.srp.members.181614346.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614346.status (str) = joined
runtime.totem.pg.mrp.srp.members.181614347.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614347.ip (str) = r(0) ip(10.211.55.11)
runtime.totem.pg.mrp.srp.members.181614347.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614347.status (str) = joined




root@pacemaker1:/var/lib/pacemaker/cib# crm_mon -1
Last updated: Fri Jan 16 16:49:10 2015
Last change: Fri Jan 16 16:05:15 2015
Current DC: NONE
0 Nodes configured
0 Resources configured



uname -n returns pacemaker1 / pacemaker2.



Logfile is attached.



corosync.conf:


totem {
 version: 2

 token: 5000

 # crypto_cipher and crypto_hash: Used for mutual node authentication.
 # If you choose to enable this, then do remember to create a shared
 # secret with corosync-keygen.
 # enabling crypto_cipher, requires also enabling of crypto_hash.
 crypto_cipher: none
 crypto_hash: none

 # interface: define at least one interface to communicate
 # over. If you define more than one interface stanza, you must
 # also set rrp_mode.
 interface {
 # Rings must be consecutively numbered, starting at 0.
  ringnumber: 0
  # This is normally the *network* address of the
  # interface to bind to. This ensures that you can use
  # identical instances of this configuration file
  # across all your cluster nodes, without having to
  # modify this option.
  bindnetaddr: 10.211.55.10
  # However, if you have multiple physical network
  # interfaces configured for the same subnet, then the
  # network address alone is not sufficient to identify
  # the interface Corosync should bind to. In that case,
  # configure the *host* address of the interface
  # instead:
  # bindnetaddr: 192.168.1.1
  # When selecting a multicast address, consider RFC
  # 2365 (which, among other things, specifies that
  # 239.255.x.x addresses are left to the discretion of
  # the network administrator). Do not reuse multicast
  # addresses across multiple Corosync clusters sharing
  # the same network.
  mcastaddr: 239.255.1.1
  # Corosync uses the port you specify here for UDP
  # messaging, and also the immediately preceding
  # port. Thus if you set this to 5405, Corosync sends
  # messages over UDP ports 5405 and 5404.
  mcastport: 5405
  # Time-to-live for cluster communication packets. The
  # number of hops (routers) that this ring will allow
  # itself to pass. Note that multicast routing must be
  # specifically enabled on most network routers.
  ttl: 1
 }
}

logging {
 # Log the source file and line where messages are being
 # generated. When in doubt, leave off. Potentially useful for
 # debugging.
 fileline: off
 # Log to standard error. When in doubt, set to no. Useful when
 # running in the foreground (when invoking corosync -f)
 to_stderr: no
 # Log to a log file. When set to no, the logfile option
 # must not be set.
 

Re: [Pacemaker] no nodes on both hosts

2015-01-19 Thread Thomas Manninger

Hi,



Now I see that in /var/run/crm there are no socket files; the directory is empty.



How can I debug the problem?



Sent: Monday, 19 January 2015, 10:18
From: Thomas Manninger dbgtmas...@gmx.at
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] no nodes on both hosts




Hi,



I reinstalled Debian in the same VM and used the Debian pacemaker and corosync packages; everything works fine.

Then I recompiled the newest pacemaker and corosync packages, with the same problem.



Jan 19 10:13:31 [24271] pacemaker2 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=pacemaker2/crmd/3, version=0.0.0)

Does this line mean that the node has been added to cib.xml?



But there are no nodes:

root@pacemaker2:/var/lib/pacemaker/cib# cat cib.xml
<cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="0" num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 19 10:13:30 2015">
 <configuration>
 <crm_config/>
 <nodes/>
 <resources/>
 <constraints/>
 </configuration>
</cib>



I also changed the permission of the cib folder to 777...



Can someone help me?
Thanks!



Sent: Friday, 16 January 2015, 16:51
From: Thomas Manninger dbgtmas...@gmx.at
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] no nodes on both hosts



Hi,



I use Debian 7.



At first I used the standard Debian packages, and pacemaker worked perfectly.



Now I have compiled my own packages because I need pacemaker_remote. Since I use my compiled version, pacemaker sees no nodes!



corosync2 lists both hosts:

root@pacemaker1:/var/lib/pacemaker/cib# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.181614346.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614346.ip (str) = r(0) ip(10.211.55.10)
runtime.totem.pg.mrp.srp.members.181614346.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614346.status (str) = joined
runtime.totem.pg.mrp.srp.members.181614347.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614347.ip (str) = r(0) ip(10.211.55.11)
runtime.totem.pg.mrp.srp.members.181614347.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614347.status (str) = joined




root@pacemaker1:/var/lib/pacemaker/cib# crm_mon -1
Last updated: Fri Jan 16 16:49:10 2015
Last change: Fri Jan 16 16:05:15 2015
Current DC: NONE
0 Nodes configured
0 Resources configured



uname -n returns pacemaker1 / pacemaker2.



Logfile is attached.



corosync.conf:


totem {
 version: 2

 token: 5000

 # crypto_cipher and crypto_hash: Used for mutual node authentication.
 # If you choose to enable this, then do remember to create a shared
 # secret with corosync-keygen.
 # enabling crypto_cipher, requires also enabling of crypto_hash.
 crypto_cipher: none
 crypto_hash: none

 # interface: define at least one interface to communicate
 # over. If you define more than one interface stanza, you must
 # also set rrp_mode.
 interface {
 # Rings must be consecutively numbered, starting at 0.
  ringnumber: 0
  # This is normally the *network* address of the
  # interface to bind to. This ensures that you can use
  # identical instances of this configuration file
  # across all your cluster nodes, without having to
  # modify this option.
  bindnetaddr: 10.211.55.10
  # However, if you have multiple physical network
  # interfaces configured for the same subnet, then the
  # network address alone is not sufficient to identify
  # the interface Corosync should bind to. In that case,
  # configure the *host* address of the interface
  # instead:
  # bindnetaddr: 192.168.1.1
  # When selecting a multicast address, consider RFC
  # 2365 (which, among other things, specifies that
  # 239.255.x.x addresses are left to the discretion of
  # the network administrator). Do not reuse multicast
  # addresses across multiple Corosync clusters sharing
  # the same network.
  mcastaddr: 239.255.1.1
  # Corosync uses the port you specify here for UDP
  # messaging, and also the immediately preceding
  # port. Thus if you set this to 5405, Corosync sends
  # messages over UDP ports 5405 and 5404.
  mcastport: 5405
  # Time-to-live for cluster communication packets. The
  # number of hops (routers) that this ring will allow
  # itself to pass. Note that multicast routing must be
  # specifically enabled on most network routers.
  ttl: 1
 }
}

logging {
 # Log the source file and line where messages are being
 # generated. When in doubt, leave off. Potentially useful for
 # debugging.
 fileline: off
 # Log to standard error. When in doubt, set to no. Useful when
 # running in the foreground (when invoking corosync -f)
 to_stderr: no
 # Log to a log file. When set to no, the logfile option
 # must not be set.
 to_logfile: yes
 logfile: /var/log/cluster/corosync.log
 # Log to the system log daemon. When in doubt, set to yes.
 to_syslog: no
 # Log debug messages (very verbose). When in doubt, leave off.
 debug: on
 # Log messages with time stamps. When in doubt, set to on
 # (unless you are only logging to 

Re: [Pacemaker] no nodes on both hosts

2015-01-19 Thread Thomas Manninger

Hi,



I reinstalled Debian in the same VM and used the Debian pacemaker and corosync packages; everything works fine.

Then I recompiled the newest pacemaker and corosync packages, with the same problem.



Jan 19 10:13:31 [24271] pacemaker2 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=pacemaker2/crmd/3, version=0.0.0)

Does this line mean that the node has been added to cib.xml?



But there are no nodes:

root@pacemaker2:/var/lib/pacemaker/cib# cat cib.xml
<cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="0" num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 19 10:13:30 2015">
 <configuration>
 <crm_config/>
 <nodes/>
 <resources/>
 <constraints/>
 </configuration>
</cib>



I also changed the permission of the cib folder to 777...



Can someone help me?
Thanks!



Sent: Friday, 16 January 2015, 16:51
From: Thomas Manninger dbgtmas...@gmx.at
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] no nodes on both hosts



Hi,



I use Debian 7.



At first I used the standard Debian packages, and pacemaker worked perfectly.



Now I have compiled my own packages because I need pacemaker_remote. Since I use my compiled version, pacemaker sees no nodes!



corosync2 lists both hosts:

root@pacemaker1:/var/lib/pacemaker/cib# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.181614346.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614346.ip (str) = r(0) ip(10.211.55.10)
runtime.totem.pg.mrp.srp.members.181614346.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614346.status (str) = joined
runtime.totem.pg.mrp.srp.members.181614347.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614347.ip (str) = r(0) ip(10.211.55.11)
runtime.totem.pg.mrp.srp.members.181614347.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614347.status (str) = joined




root@pacemaker1:/var/lib/pacemaker/cib# crm_mon -1
Last updated: Fri Jan 16 16:49:10 2015
Last change: Fri Jan 16 16:05:15 2015
Current DC: NONE
0 Nodes configured
0 Resources configured



uname -n returns pacemaker1 / pacemaker2.



Logfile is attached.



corosync.conf:


totem {
 version: 2

 token: 5000

 # crypto_cipher and crypto_hash: Used for mutual node authentication.
 # If you choose to enable this, then do remember to create a shared
 # secret with corosync-keygen.
 # enabling crypto_cipher, requires also enabling of crypto_hash.
 crypto_cipher: none
 crypto_hash: none

 # interface: define at least one interface to communicate
 # over. If you define more than one interface stanza, you must
 # also set rrp_mode.
 interface {
 # Rings must be consecutively numbered, starting at 0.
  ringnumber: 0
  # This is normally the *network* address of the
  # interface to bind to. This ensures that you can use
  # identical instances of this configuration file
  # across all your cluster nodes, without having to
  # modify this option.
  bindnetaddr: 10.211.55.10
  # However, if you have multiple physical network
  # interfaces configured for the same subnet, then the
  # network address alone is not sufficient to identify
  # the interface Corosync should bind to. In that case,
  # configure the *host* address of the interface
  # instead:
  # bindnetaddr: 192.168.1.1
  # When selecting a multicast address, consider RFC
  # 2365 (which, among other things, specifies that
  # 239.255.x.x addresses are left to the discretion of
  # the network administrator). Do not reuse multicast
  # addresses across multiple Corosync clusters sharing
  # the same network.
  mcastaddr: 239.255.1.1
  # Corosync uses the port you specify here for UDP
  # messaging, and also the immediately preceding
  # port. Thus if you set this to 5405, Corosync sends
  # messages over UDP ports 5405 and 5404.
  mcastport: 5405
  # Time-to-live for cluster communication packets. The
  # number of hops (routers) that this ring will allow
  # itself to pass. Note that multicast routing must be
  # specifically enabled on most network routers.
  ttl: 1
 }
}

logging {
 # Log the source file and line where messages are being
 # generated. When in doubt, leave off. Potentially useful for
 # debugging.
 fileline: off
 # Log to standard error. When in doubt, set to no. Useful when
 # running in the foreground (when invoking corosync -f)
 to_stderr: no
 # Log to a log file. When set to no, the logfile option
 # must not be set.
 to_logfile: yes
 logfile: /var/log/cluster/corosync.log
 # Log to the system log daemon. When in doubt, set to yes.
 to_syslog: no
 # Log debug messages (very verbose). When in doubt, leave off.
 debug: on
 # Log messages with time stamps. When in doubt, set to on
 # (unless you are only logging to syslog, where double
 # timestamps can be annoying).
 timestamp: on
 logger_subsys {
  subsys: QUORUM
  debug: off
 }
}

quorum {
 # Enable and configure quorum subsystem (default: off)
 # see also corosync.conf.5 and votequorum.5
 #provider: corosync_votequorum
}



Thanks!