Re: [Openais] where is the floating IP in the cluster?

2014-04-06 Thread Fabio M. Di Nitto
On 02/12/2014 07:58 AM, Anthem Cheng wrote:
> Hello guys
> 
> I've added a floating IP in my 2 nodes linux cluster,
> and can check the resource by "crm status"
> 
> the floating IP is live and up, I can ping it or telnet the service port
> on it.
> but ifconfig doesn't show me the IP address on any NIC. Where is it?
> How does the system or the cluster engine handle the network-layer
> communication for this IP?
> 
> Is there any documentation I can refer to in order to understand this?

ip addr

is the command you want. ifconfig doesn't display additional IPv4
addresses on the same interface unless they carry an alias label.
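To make the difference concrete, here is a minimal Python sketch that groups the IPv4 addresses reported by `ip -o -4 addr show` by interface. The sample text is hypothetical (interface names and addresses are assumptions for illustration), and the exact output layout may differ between iproute2 versions.

```python
from collections import defaultdict

# Hypothetical one-line-per-address output of `ip -o -4 addr show`.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 10.87.192.5/24 brd 10.87.192.255 scope global eth0
2: eth0    inet 10.87.192.10/24 brd 10.87.192.255 scope global secondary eth0
"""

def addresses_by_interface(text):
    """Map each interface name to the list of IPv4 addresses it carries."""
    result = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: "<index>:", "<ifname>", "inet", "<addr>/<prefix>", ...
        if len(fields) >= 4 and fields[2] == "inet":
            result[fields[1]].append(fields[3])
    return dict(result)

if __name__ == "__main__":
    for ifname, addrs in addresses_by_interface(SAMPLE).items():
        print(ifname, addrs)
```

On a live node the same function could be fed the output of `subprocess.run(["ip", "-o", "-4", "addr", "show"], capture_output=True, text=True).stdout`; the second address on eth0 is the kind of entry a floating IP would produce.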

Fabio

> 
> thanks
> Anthem
> 
> 
> 

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/openais


Re: [Openais] stickiness='100' doesn't work

2014-04-06 Thread Fabio M. Di Nitto
On 02/11/2014 07:54 AM, Anthem Cheng wrote:
> Hello guys
> 
> I've set up a cluster recently with 2 RHEL 6.3 servers,
> with corosync 1.4.1 and pacemaker 1.1.7 installed.

3 things:

1) wrong mailing list :) you want the pacemaker list
2) your packages are very old. we support pacemaker on RHEL starting
from 6.5, so please update first.
3) if the problem persists, then please file a ticket with Red Hat support.
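
A side note on the configuration quoted below: it contains a `cli-prefer-ClusterIP` location constraint with a score of `inf:`. Constraints with that name are normally left behind by a manual `crm resource move` (or `migrate`), and an infinite location score outweighs a `resource-stickiness` of 100, which would explain the failback. The following Python sketch is a deliberately simplified model of the scoring, not Pacemaker's actual allocation code; the node names and the helper `preferred_node` are hypothetical.

```python
INFINITY = 1_000_000  # Pacemaker treats scores of this size as "infinite"

def preferred_node(current, location_scores, stickiness):
    """Return the node with the highest score for a resource.

    Simplified model: each node gets its location score, and the node
    currently running the resource additionally gets the stickiness.
    """
    totals = {}
    for node, score in location_scores.items():
        bonus = stickiness if node == current else 0
        totals[node] = min(score + bonus, INFINITY)  # scores saturate at INFINITY
    return max(totals, key=totals.get)

if __name__ == "__main__":
    # The resource runs on nodeB after nodeA was put in standby; nodeA returns.
    with_constraint = {"nodeA": INFINITY, "nodeB": 0}
    without_constraint = {"nodeA": 0, "nodeB": 0}
    print(preferred_node("nodeB", with_constraint, 100))     # nodeA
    print(preferred_node("nodeB", without_constraint, 100))  # nodeB
```

If this is the cause, clearing the constraint (for example with `crm resource unmigrate ClusterIP`) should let the stickiness take effect.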

Fabio

> 
> the cluster is all fine, but there is an issue with "stickiness='100'".
> When I "standby" the node that holds the live resource (a floating IP),
> the floating IP is brought up on the second node,
> but if I "online" the standby node, the floating IP resource comes back
> to it again.
> 
> I also tried stopping pacemaker and corosync on the node that holds the
> live resource. I can see the resource being brought up on the other node,
> but likewise, once I start pacemaker and corosync again, the resource
> fails back to the original node.
> 
> 
> here is the conf:
> [root@X ]# crm configure show
> node XX..XXX.com  \
> attributes standby="on"
> node X.X.X.com  \
> attributes standby="off"
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
> params ip="10.87.192.10" cidr_netmask="24" \
> op monitor interval="10s"
> location cli-prefer-ClusterIP ClusterIP \
> rule $id="cli-prefer-rule-ClusterIP" inf: #uname eq
> XXX.X..com 
> property $id="cib-bootstrap-options" \
> dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
> 
> 
> cannot find anything wrong, but it just doesn't work as expected.
> 
> thanks for any suggestion.
> 
> 
> 
> 
> 
> 



Re: [Openais] Problem and Question about corosync

2013-11-18 Thread Fabio M. Di Nitto
On 11/18/2013 1:41 PM, Moullé Alain wrote:
> Hi,
> 
> with corosync 1.2.3-36 (with Pacemaker) on a 4-node HA cluster, we got
> a strange and random problem:
> 
> For some reason that we can't identify in the syslog, one node (let's
> say node1) loses the 3 other members node2, node3 and node4, without any
> visible network problem on either heartbeat network (both configured in
> rrp active mode, each with a distinct mcast address and mcast port).

rrp active mode is untested and does not work properly. That's probably
why you see odd things. Try passive mode instead.

If you are using VMs for testing, check that the host is not
overcommitted and that the VMs are not being paused.

Another issue is that recent kernels broke multicast on bridged networks,
so check that you are using udpu for transport.

Fabio

> This node elects itself as DC (while isolated, and although node2 is
> already DC) until node2 (DC) asks node3 to fence node1 (probably because
> it detects another DC).
> Main traces are given below.
> When node1 is rebooted and Pacemaker started again, it is again included
> in the HA cluster and all works fine.
> 
> I've checked the changelog of corosync between 1.2.3-36 and 1.4.1-7, but
> around 188 Bugzilla entries were fixed between the two releases, so I
> would like to know if someone in the development team remembers a fix
> for such a random problem, where a node isolated from the cluster for a
> few seconds elects itself DC and is consequently fenced by the
> former DC, which is in the quorate part of the HA cluster?
> 
> Also, as a workaround or as normal but missing tuning, is there some
> corosync parameter that prevents a node isolated for a few seconds
> from electing itself as the new DC?
> 
> Thanks a lot for your help.
> Alain Moullé
> 
> I can see in syslog such traces :
> 
> node1 syslog:
> 03:28:55 node1 daemon info crmd [26314]: info: ais_status_callback:
> status: node2 is now lost (was member)
> ...
> 03:28:55 node1 daemon info crmd [26314]: info: ais_status_callback:
> status: node3 is now lost (was member)
> ...
> 03:28:55 node1 daemon info crmd [26314]: info: ais_status_callback:
> status: node4 is now lost (was member)
> ...
> 03:28:55 node1 daemon warning crmd [26314]: WARN: check_dead_member: Our
> DC node (node2) left the cluster
> ...
> 03:28:55 node1 daemon info crmd [26314]: info: update_dc: Unset DC node2
> ...
> 03:28:55 node1 daemon info crmd [26314]: info: do_dc_takeover: Taking
> over DC status for this partition
> ...
> 03:28:56 node1 daemon info crmd [26314]: info: update_dc: Set DC to
> node1 (3.0.5)
> 
> 
> node2 syslog:
> 03:29:05 node2 daemon info corosync   [pcmk  ] info: update_member: Node
> 704645642/node1 is now: lost
> ...
> 03:29:05 node2 daemon info corosync   [pcmk  ] info: update_member: Node
> 704645642/node1 is now: member
> 
> 
> node3:
> 03:30:17 node3 daemon info crmd [26549]: info: tengine_stonith_notify:
> Peer node1 was terminated (reboot) by node3 for node2
> (ref=c62a4c78-21b9-4288-8969-35b361cabacf): OK
> 
> 
> 



Re: [Openais] problems eliminating the use of multicast

2013-09-19 Thread Fabio M. Di Nitto
On 09/19/2013 11:56 AM, David Lang wrote:
> On Thu, 19 Sep 2013, Fabio M. Di Nitto wrote:
> 
>> On 09/19/2013 11:35 AM, David Lang wrote:
>>> On Thu, 19 Sep 2013, Fabio M. Di Nitto wrote:
>>>
>>>> You don't need all of that...
>>>>
>>>> 
>>>>
>>>> There is no need to specify anything else. memberaddresses et al. will
>>>> be determined by the node names.
>>>
>>> Ok, that solves my problem in most cases (8 of the 10 clusters I'm
>>> configuring right now)
>>>
>>> in the other two clusters, I will actually have 4 boxes per cluster, and
>>> I want a resource to run on only one of the four. I don't care which
>>> one, and split brain is not a major problem (no shared storage)
>>>
>>> is it enough to just specify 4 nodes? (leaving "two_node=1" in place),
>>
>> two_node and expected_votes=1 are specific to clusters composed of two
>> nodes.
>>
>>> do I just remove the two_node attribute? or does this get really ugly?
>>
>> You also need to remove expected_votes.
>>
>>>
>>> An example of one of these two clusters.
>>>
>>> Log alerting engines. All boxes in the cluster will receive copies of
>>> all logs, and process them in parallel. I want to have only one of the
>>> four boxes be the 'active' box that sends out alerts (my alert scripts
>>> can test for the presence of a resource on the local box)
>>
>> that's totally up to the application you are writing and how you
>> configure the IPs.
> 
> In this case, I'm not using pacemaker to manage the IPs, those are
> static on all 4 boxes, all I'm having it do is manage a dummy resource
> that the alerting scripts test for.

Keepalived allows you to run scripts and such to talk to something
"external".


> 
>>>
>>> The four boxes are actually two pairs in each of two datacenters.
>>
>> This will get ugly if connectivity between the datacenters goes kaboom
>> as you will have 2x2 clusters, neither of which can operate.
> 
> I figured I'd need to disable the quorum, or set expected_votes=2 or
> something like that.

That's also an option, but once again, I don't think you need a cluster
at all for this use case :)

Fabio


Re: [Openais] problems eliminating the use of multicast

2013-09-19 Thread Fabio M. Di Nitto
On 09/19/2013 11:54 AM, David Lang wrote:
> On Thu, 19 Sep 2013, Fabio M. Di Nitto wrote:
> 
>> On 09/19/2013 11:26 AM, David Lang wrote:
>>> On Wed, 18 Sep 2013, Digimer wrote:
>>>
>>>> On 18/09/13 23:16, Fabio M. Di Nitto wrote:
>>>>> On 09/19/2013 12:47 AM, David Lang wrote:
>>>>>> I'm trying to get a cluster up and running without using multicast,
>>>>>> I've
>>>>>> made my cluster.conf be the following, but the systems are still
>>>>>> sending
>>>>>> to the multicast address and not seeing each other as a result.
>>>>>>
>>>>>> Is there something that I did wrong in creating the cman segment
>>>>>> of the
>>>>>> file? unfortunately the cluster.conf man page just refers to the
>>>>>> corosync.conf man page, but the two files use different config
>>>>>> styles.
>>>>>>
>>>>>> If not, what else do I need to do to disable multicast and just use
>>>>>> udpu?
>>>>>
>>>>>> [cluster.conf XML stripped by the archive; surviving attributes:
>>>>>> token_retransmits_before_loss_const="10" join="60" consensus="4800"
>>>>>> rrp_mode="none" transport="udpu" ... mcastport="5405" ttl="1"]
>>>>>
>>>>> You don't need all of that...
>>>>>
>>>>> 
>>>>>
>>>>> There is no need to specify anything else. memberaddresses et al. will
>>>>> be determined by the node names.
>>>>>
>>>>> Fabio
>>>>
>>>> To add to what Fabio said;
>>>>
>>>> You've not set up fencing. This is not supported and, when you use
>>>> rgmanager, clvmd and/or gfs2, the first time a fence is called your
>>>> cluster will block.
>>>>
>>>> When a node stops responding, the other node will call fenced to eject
>>>> it from the cluster. One of the first things fenced does is inform dlm,
>>>> which stops giving out locks until fenced tells it that the node is
>>>> gone. If the node can't be fenced, it will obviously never successfully
>>>> be fenced, so dlm will never start offering locks. This leaves
>>>> rgmanager, cman and gfs2 locked up (by design).
>>>
>>> In my case the nodes have no shared storage. I'm using
>>> pacemaker/corosync to move an IP from one box to another (and in one
>>> case, I'm moving a dummy resource between two alerting boxes, where
>>> both boxes see all logs and calculate alerts, but I want to have only
>>> the active box send out the alert)
>>>
>>> In all these cases, split-brain situations are annoying, but not
>>> critical
>>>
>>> If both alerting boxes send an alert, I get identical alerts.
>>>
>>> If both boxes have the same IP, it's not great, but since either one
>>> will respond, the impact consists of any TCP connections being broken
>>> each time the ARP race winner changes for a source box or gateway (and
>>> since most cases involve UDP traffic, there is no impact at all in those
>>> cases)
>>>
>>> This is about as simple a use case as you can get :-)
>>
>> If you are running only a pool of VIPs, with no fencing, then you want
>> to consider making your life simpler with keepalived instead of
>> pcmk+corosync.
> 
> Thanks, I'll look into it. for all these two machine clusters, what I
> really want to use is heartbeat with v1 style configs, they were really
> trivial to deal with (I've had that on 100+ clusters, some going back to
> heartbeat 0.4 days :-)
> 
> But since that's no longer an option, I figured it was time to bite the
> bullet and move to pacemaker, and since RHEL is pushing
> pacemaker/corosync, that's what we setup.

Well, based on what you tell me, there is little need for a "real"
cluster; you could deploy something even simpler, such as keepalived.

Here corosync/pcmk seems "too much" to deploy for moving a few IPs around.

But then again, feel free to test and try whatever you like best :)

Fabio



Re: [Openais] problems eliminating the use of multicast

2013-09-19 Thread Fabio M. Di Nitto
On 09/19/2013 11:35 AM, David Lang wrote:
> On Thu, 19 Sep 2013, Fabio M. Di Nitto wrote:
> 
>> On 09/19/2013 12:47 AM, David Lang wrote:
>>> I'm trying to get a cluster up and running without using multicast, I've
>>> made my cluster.conf be the following, but the systems are still sending
>>> to the multicast address and not seeing each other as a result.
>>>
>>> Is there something that I did wrong in creating the cman segment of the
>>> file? unfortunately the cluster.conf man page just refers to the
>>> corosync.conf man page, but the two files use different config styles.
>>>
>>> If not, what else do I need to do to disable multicast and just use
>>> udpu?
>>
>>> [cluster.conf XML stripped by the archive; surviving attributes:
>>> token_retransmits_before_loss_const="10" join="60" consensus="4800"
>>> rrp_mode="none" transport="udpu" ... ttl="1"]
>>
>> You don't need all of that...
>>
>> 
>>
>> There is no need to specify anything else. memberaddresses et al. will
>> be determined by the node names.
> 
> Ok, that solves my problem in most cases (8 of the 10 clusters I'm
> configuring right now)
> 
> in the other two clusters, I will actually have 4 boxes per cluster, and
> I want a resource to run on only one of the four. I don't care which
> one, and split brain is not a major problem (no shared storage)
> 
> is it enough to just specify 4 nodes? (leaving "two_node=1" in place),

two_node and expected_votes=1 are specific to clusters composed of two nodes.

> do I just remove the two_node attribute? or does this get really ugly?

You also need to remove expected_votes.

> 
> An example of one of these two clusters.
> 
> Log alerting engines. All boxes in the cluster will receive copies of
> all logs, and process them in parallel. I want to have only one of the
> four boxes be the 'active' box that sends out alerts (my alert scripts
> can test for the presence of a resource on the local box)

that's totally up to the application you are writing and how you
configure the IPs.

> 
> The four boxes are actually two pairs in each of two datacenters.

This will get ugly if connectivity between the datacenters goes kaboom
as you will have 2x2 clusters, neither of which can operate.
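
The arithmetic behind this can be sketched in a few lines of Python. This assumes the usual majority rule (more than half of the expected votes) and one vote per node; the helper names are illustrative only.

```python
def quorum(expected_votes):
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1

def is_quorate(partition_votes, expected_votes):
    """True if a partition holds enough votes to keep operating."""
    return partition_votes >= quorum(expected_votes)

if __name__ == "__main__":
    # Four nodes with one vote each: quorum is 3.
    print(quorum(4))          # 3
    # A datacenter link failure leaves two partitions of two nodes.
    print(is_quorate(2, 4))   # False, on both sides
    # Two-node cluster without the two_node special case: a lone
    # survivor holds 1 of 2 votes and would also lose quorum.
    print(is_quorate(1, 2))   # False
```

The last case is why the two_node special case exists for two-node clusters, and why it no longer applies once there are four nodes.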

> There
> is not going to be a quorum because neither half would have enough
> systems to form one. In the common split brain case (two clusters of two
> boxes because of datacenter crosslink outage), alerts will be generated
> from both halves. This is a very tolerable 'worst case' (and may even be
> the right thing, since for the duration of the split, each one may be
> seeing logs that the other doesn't see, the alerting path may follow
> different network connectivity so alerts may get through even if ring
> packets don't)

IMHO you are using the wrong setup here. I'd use keepalived here as
well, since it works across multiple datacenters and doesn't need quorum
or fencing. It bases IP management on other criteria.

Fabio



Re: [Openais] problems eliminating the use of multicast

2013-09-19 Thread Fabio M. Di Nitto
On 09/19/2013 11:26 AM, David Lang wrote:
> On Wed, 18 Sep 2013, Digimer wrote:
> 
>> On 18/09/13 23:16, Fabio M. Di Nitto wrote:
>>> On 09/19/2013 12:47 AM, David Lang wrote:
>>>> I'm trying to get a cluster up and running without using multicast,
>>>> I've
>>>> made my cluster.conf be the following, but the systems are still
>>>> sending
>>>> to the multicast address and not seeing each other as a result.
>>>>
>>>> Is there something that I did wrong in creating the cman segment of the
>>>> file? unfortunately the cluster.conf man page just refers to the
>>>> corosync.conf man page, but the two files use different config styles.
>>>>
>>>> If not, what else do I need to do to disable multicast and just use
>>>> udpu?
>>>
>>>> [cluster.conf XML stripped by the archive; surviving attributes:
>>>> token_retransmits_before_loss_const="10" join="60" consensus="4800"
>>>> rrp_mode="none" transport="udpu" ... ttl="1"]
>>>
>>> You don't need all of that...
>>>
>>> 
>>>
>>> There is no need to specify anything else. memberaddresses et al. will
>>> be determined by the node names.
>>>
>>> Fabio
>>
>> To add to what Fabio said;
>>
>> You've not set up fencing. This is not supported and, when you use
>> rgmanager, clvmd and/or gfs2, the first time a fence is called your
>> cluster will block.
>>
>> When a node stops responding, the other node will call fenced to eject
>> it from the cluster. One of the first things fenced does is inform dlm,
>> which stops giving out locks until fenced tells it that the node is
>> gone. If the node can't be fenced, it will obviously never successfully
>> be fenced, so dlm will never start offering locks. This leaves
>> rgmanager, cman and gfs2 locked up (by design).
> 
> In my case the nodes have no shared storage. I'm using
> pacemaker/corosync to move an IP from one box to another (and in one
> case, I'm moving a dummy resource between two alerting boxes, where
> both boxes see all logs and calculate alerts, but I want to have only
> the active box send out the alert)
> 
> In all these cases, split-brain situations are annoying, but not critical
> 
> If both alerting boxes send an alert, I get identical alerts.
> 
> If both boxes have the same IP, it's not great, but since either one
> will respond, the impact consists of any TCP connections being broken
> each time the ARP race winner changes for a source box or gateway (and
> since most cases involve UDP traffic, there is no impact at all in those
> cases)
> 
> This is about as simple a use case as you can get :-)

If you are running only a pool of VIPs, with no fencing, then you want
to consider making your life simpler with keepalived instead of
pcmk+corosync.

Fabio



Re: [Openais] problems eliminating the use of multicast

2013-09-18 Thread Fabio M. Di Nitto
On 09/19/2013 12:47 AM, David Lang wrote:
> I'm trying to get a cluster up and running without using multicast, I've
> made my cluster.conf be the following, but the systems are still sending
> to the multicast address and not seeing each other as a result.
> 
> Is there something that I did wrong in creating the cman segment of the
> file? unfortunately the cluster.conf man page just refers to the
> corosync.conf man page, but the two files use different config styles.
> 
> If not, what else do I need to do to disable multicast and just use udpu?

> [cluster.conf XML stripped by the archive; surviving attributes:
> token_retransmits_before_loss_const="10" join="60" consensus="4800"
> rrp_mode="none" transport="udpu" ... ttl="1"]

You don't need all of that...

[XML snippet stripped by the archive]

There is no need to specify anything else. memberaddresses et al. will
be determined by the node names.

Fabio



Re: [Openais] corosync + qdevice, VMs and bridge

2013-04-18 Thread Fabio M. Di Nitto
On 04/18/2013 06:18 PM, eXeC001er wrote:
> 
> 
> 2013/4/17 Fabio M. Di Nitto <fdini...@redhat.com>
> 
> On 4/17/2013 3:57 PM, eXeC001er wrote:
> > Hello.
> >
> > I have tried to create the following demo-cluster to check how the
> > MasterWins logic works:
> >
> > NODE1 (VM)
> >            |== tap0 (host)
> > NODE2 (VM)
> >                            |= br0 (host)
> > NODE3 (VM)
> >            |== tap1 (host)
> > NODE4 (VM)
> >
> >
> > To simulate 50/50 split i just remove "tap1" from "br0".
> >
> > before split i have the following on all nodes
> >
> > --
> > Quorate:  Yes
> > Nodeid  Votes  Qdevice   Name
> >      1      1  A,V,MW    172.18.251.41
> >      2      1  A,NV,MW   172.18.251.42 (local)
> >      3      1  NA,NV,MW  172.18.251.43
> >      4      1  A,NV,MW   172.18.251.44
> >      0      3            QDEV
> >
> > --
> >
> > after split
> >
> > on NODE1 and NODE2 i see
> >
> > --
> > Quorate:  Yes
> > Nodeid  Votes  Qdevice   Name
> >      1      1  A,V,MW    172.18.251.41 (local)
> >      2      1  A,NV,MW   172.18.251.42
> >      0      3            QDEV
> > --
> >
> > on NODE3 and NODE4 i see
> >
> > --
> > Quorate:  No
> > Nodeid  Votes  Qdevice   Name
> >      3      1  A,NV,MW   172.18.251.43
> >      4      1  A,NV,MW   172.18.251.44 (local)
> >      0      3            QDEV
> > --
> >
> > So everything is fine and MasterWins works as designed.
> >
> > But after the check I tried to restore the network connection and added
> > "tap1" back to "br0". I see that all nodes can ping each other, but
> > corosync still shows me a 50/50 split.
> >
> > tcpdump:
> > .
> > 17:49:36.387217 IP 172.18.251.43.5404 > 172.18.251.44.5405: UDP, length 74
> > 17:49:36.387441 IP 172.18.251.44.5404 > 172.18.251.43.5405: UDP, length 74
> > 17:49:36.447590 IP 172.18.251.41.5404 > 172.18.251.42.5405: UDP, length 74
> > 17:49:36.447811 IP 172.18.251.42.5404 > 172.18.251.41.5405: UDP, length 74
> > 17:49:36.568557 IP 172.18.251.43.5404 > 172.18.251.44.5405: UDP, length 74
> > 17:49:36.568804 IP 172.18.251.44.5404 > 172.18.251.43.5405: UDP, length 74
> > 17:49:36.587829 IP 172.18.251.43.5404 > 239.255.1.1.5405: UDP, length 87
> > 17:49:36.628254 IP 172.18.251.41.5404 > 172.18.251.42.5405: UDP, length 74
> > 17:49:36.628442 IP 172.18.251.42.5404 > 172.18.251.41.5405: UDP, length 74
> > 17:49:36.648323 IP 172.18.251.41.5404 > 239.255.1.1.5405: UDP, length 87
> > 
> >
> >
> > Any ideas ?
> >
> 
> Besides the missing logs, which might show something, I have tested this
> scenario plenty of times, but using iptables instead.
> 
> I wonder if you have found a bug in the bridging code.
> 
> I suggest you try the following test instead:
> 
> 4 nodes, without qdisk, try to repeat your bridge remove/add test
> 
> 4 nodes, without qdisk, use iptables instead (make sure to block mcast
> traffic too)
> 
> then again with qdisk + iptables.
> 
> 
> I have tried with iptables. Everything works nicely.
>  
> But in any case it is very strange, because after the network connection
> has been restored and I restart corosync on ALL nodes, my "cluster" works.
> 
> The logs do not contain anything interesting. The latest records after the
> 50/50 split just say that some members have left. After restoring the
> connection there are no new records in the logfile.
> 
> It is also very strange that to restore the whole cluster I need to restart
> corosync on ALL nodes. If I restart corosync on only 3 of the 4 nodes, then
> corosync on each node does not see any other nodes.

This sounds like a bug in the kernel's multicast bridge code, which does
not rebind the groups in the bridge/switch. I suspected that because I
tested the same scenario with iptables before and corosync behaved as
expected.

I suggest you talk to the kernel networking guys.

corosync won't attempt to rejoin the group since it's already bound.

Fabio



Re: [Openais] corosync + qdevice, VMs and bridge

2013-04-17 Thread Fabio M. Di Nitto
On 4/17/2013 3:57 PM, eXeC001er wrote:
> Hello.
> 
> I have tried to create the following demo-cluster to check how the
> MasterWins logic works:
> 
> NODE1 (VM)
>            |== tap0 (host)
> NODE2 (VM)
>                            |= br0 (host)
> NODE3 (VM)
>            |== tap1 (host)
> NODE4 (VM)
> 
> 
> To simulate 50/50 split i just remove "tap1" from "br0".
> 
> before split i have the following on all nodes
> 
> --
> Quorate:  Yes
> Nodeid  Votes  Qdevice   Name
>      1      1  A,V,MW    172.18.251.41
>      2      1  A,NV,MW   172.18.251.42 (local)
>      3      1  NA,NV,MW  172.18.251.43
>      4      1  A,NV,MW   172.18.251.44
>      0      3            QDEV
> 
> --
> 
> after split
> 
> on NODE1 and NODE2 i see
> 
> --
> Quorate:  Yes
> Nodeid  Votes  Qdevice   Name
>      1      1  A,V,MW    172.18.251.41 (local)
>      2      1  A,NV,MW   172.18.251.42
>      0      3            QDEV
> --
> 
> on NODE3 and NODE4 i see
> 
> --
> Quorate:  No
> Nodeid  Votes  Qdevice   Name
>      3      1  A,NV,MW   172.18.251.43
>      4      1  A,NV,MW   172.18.251.44 (local)
>      0      3            QDEV
> --
> 
> So everything is fine and MasterWins works as designed.
> 
> But after the check I tried to restore the network connection and added
> "tap1" back to "br0". I see that all nodes can ping each other, but
> corosync still shows me a 50/50 split.
> 
> tcpdump:
> .
> 17:49:36.387217 IP 172.18.251.43.5404 > 172.18.251.44.5405: UDP, length 74
> 17:49:36.387441 IP 172.18.251.44.5404 > 172.18.251.43.5405: UDP, length 74
> 17:49:36.447590 IP 172.18.251.41.5404 > 172.18.251.42.5405: UDP, length 74
> 17:49:36.447811 IP 172.18.251.42.5404 > 172.18.251.41.5405: UDP, length 74
> 17:49:36.568557 IP 172.18.251.43.5404 > 172.18.251.44.5405: UDP, length 74
> 17:49:36.568804 IP 172.18.251.44.5404 > 172.18.251.43.5405: UDP, length 74
> 17:49:36.587829 IP 172.18.251.43.5404 > 239.255.1.1.5405: UDP, length 87
> 17:49:36.628254 IP 172.18.251.41.5404 > 172.18.251.42.5405: UDP, length 74
> 17:49:36.628442 IP 172.18.251.42.5404 > 172.18.251.41.5405: UDP, length 74
> 17:49:36.648323 IP 172.18.251.41.5404 > 239.255.1.1.5405: UDP, length 87
> 
> 
> 
> Any ideas ?
> 

Besides the missing logs, which might show something, I have tested this
scenario plenty of times, but using iptables instead.

I wonder if you have found a bug in the bridging code.

I suggest you try the following test instead:

4 nodes, without qdisk, try to repeat your bridge remove/add test

4 nodes, without qdisk, use iptables instead (make sure to block mcast
traffic too)

then again with qdisk + iptables.

But also collect the logs; otherwise tcpdump doesn't say enough.

Fabio


Re: [Openais] corosync and qdevice

2013-03-21 Thread Fabio M. Di Nitto
On 3/21/2013 12:18 PM, eXeC001er wrote:
> 
> 
> 2013/3/20 Fabio M. Di Nitto <fdini...@redhat.com>
> 
> On 3/20/2013 6:26 PM, eXeC001er wrote:
> >
> >
> > 2013/3/20 Fabio M. Di Nitto <fdini...@redhat.com>
> >
> > On 3/20/2013 5:27 PM, eXeC001er wrote:
> >
> > > >
> > > > The first Q:
> > > >
> > > > According to the tests that are part of
> corosync-sources i think
> > > that i can:
> > > >
> > > > 1. create a daemon that register a QDEVICE and will notify
> > corosync
> > > > about the device (votequorum_qdevice_poll()).
> > > >
> > > > 2. implement a master/slave logic and if the qdevice
> on a node
> > > wins then
> > > > i call votequorum_qdevice_master_wins() on the node
> and corosync
> > > notify
> > > > another nodes about this, so i can say that the node
> is MASTER.
> > >
> > > There is no votequorum_qdevice_master_wins() call...
> where did you
> > > find it?
> > >
> > >
> > > I am researching corosync-2.3.0 sources.
> >
> > whoops.. i wrote it and forgot about it. getting old is bad :)
> >
> > No, it's a bit more complicated than that.
> >
> > corosync starts and loads the config
> > later on qdeviced starts and reads the config
> > qdeviced detects that it has to run in master_wins config:
> >  call votequorum_qdevice_master_wins(..., 1);
> > that call sets a flag for the node and makes sure that the feature is
> > enabled internally.
> >
> >
> > I thought about a different scenario:
> >
> > master_wins and cast_vote are different flags and they are used in
> > different cases.
> >
> > 1. uses only "cast_vote" and the flag can be used to decide that on a
> > node everything fine and the node is a member of cluster (the cluster
> > does not have master/slave)
> >
> > for example in a cluster (3 nodes) i have several qdevices on each
> node:
> > storage-qdevice and client-network-qdevice
> 
> The API does not support multiple qdevices. This kind of implementation,
> where you need to poll multiple targets, can be multiplexed/proxied by
> the votequorum consumer.
> 
> qdevice does not need to know how much to vote. That's votequorum's
> problem. How the qdevice implementation handles internal voting/scoring
> is the qdevice implementation's problem.
> 
> >
> > config:
> > each qdevice has 2 votes
> > each node has 1 vote
> > expected votes = 7 votes (1 own vote + 1 vote from another node + 2 votes
> > from each qdevice)
> 
> That won't work. votequorum accepts only one value for qdevice
> votes.
> 
> > 2. "master_wins" and "cast_vote" are used.
> >
> > in this case "cast_vote" will work as in case 1 and "master_wins" will
> > control master/slave
> 
> The reason to propagate the value of master_wins and its status is to
> allow:
> 
> node1 node2 node3 node4
> 
> qdevice is master on node3 for example
> 
> in case of a 50%/50% split you have:
> 
> node1 node2 <- not quorate
> 
> 
> 
> please correct me if I am wrong
> 
> at the start, qdevice sets "master_wins=1" in corosync on each node,
> each node is a member, and node3 is master according to the decision of qdevice.
> 
> so:
> - node1, node2 and node4 have 4 votes
> - node3 has 5 votes
> 
> at some point we have a 50%/50% split:
> 
> node1 + node2  AND node3 + node4
> 
> so:
> - node1, node2 have 2 votes
> - node4 has 2 votes
> - node3 has 3 votes
> 
> the quorum is 3 votes.
> 
> I see the following condition
> 
> if ((qdevice_master_wins) &&
>     (!quorate) &&
>     (check_qdevice_master() == 1)) {
>         log_printf(LOGSYS_LEVEL_DEBUG

Re: [Openais] corosync and qdevice

2013-03-20 Thread Fabio M. Di Nitto
On 3/20/2013 6:26 PM, eXeC001er wrote:
> 
> 
> 2013/3/20 Fabio M. Di Nitto <fdini...@redhat.com>
> 
> On 3/20/2013 5:27 PM, eXeC001er wrote:
> 
> > >
> > > The first Q:
> > >
> > > According to the tests that are part of corosync-sources i think
> > that i can:
> > >
> > > 1. create a daemon that register a QDEVICE and will notify
> corosync
> > > about the device (votequorum_qdevice_poll()).
> > >
> > > 2. implement a master/slave logic and if the qdevice on a node
> > wins then
> > > i call votequorum_qdevice_master_wins() on the node and corosync
> > notify
> > > another nodes about this, so i can say that the node is MASTER.
> >
> > There is no votequorum_qdevice_master_wins() call... where did you
> > find it?
> >
> >
> > I am researching corosync-2.3.0 sources.
> 
> whoops.. i wrote it and forgot about it. getting old is bad :)
> 
> No, it's a bit more complicated than that.
> 
> corosync starts and loads the config
> later on qdeviced starts and reads the config
> qdeviced detects that it has to run in master_wins config:
>  call votequorum_qdevice_master_wins(..., 1);
> that call sets a flag for the node and makes sure that the feature is
> enabled internally.
> 
> 
> I thought about a different scenario:
> 
> master_wins and cast_vote are different flags and they are used in
> different cases.
> 
> 1. uses only "cast_vote" and the flag can be used to decide that on a
> node everything fine and the node is a member of cluster (the cluster
> does not have master/slave)
> 
> for example in a cluster (3 nodes) i have several qdevices on each node:
> storage-qdevice and client-network-qdevice

The API does not support multiple qdevices. This kind of implementation,
where you need to poll multiple targets, can be multiplexed/proxied by
the votequorum consumer.

qdevice does not need to know how much to vote. That's votequorum's
problem. How the qdevice implementation handles internal voting/scoring
is the qdevice implementation's problem.

> 
> config:
> each qdevice has 2 votes
> each node has 1 vote
> expected votes = 7 votes (1 own vote + 1 vote from another node + 2 votes
> from each qdevice)

That won't work. votequorum accepts only one value for qdevice votes.

> 2. "master_wins" and "cast_vote" are used.
> 
> in this case "cast_vote" will work as in case 1 and "master_wins" will
> control master/slave

The value of master_win and its status need to be propagated to allow this case:

node1 node2 node3 node4

qdevice is master on node3 for example

in case of a 50%/50% split you have:

node1 node2 <- not quorate

with the old cman quorum:

node3 quorate
node4 not quorate (that's also why we force master_win in 2-node clusters only)

with the master_win flag set in votequorum:

node3 quorate
node4 quorate (because part of a partition with a master qdevice)

note that node4 does not have the votes from qdevice at this point.
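
The split above can be sketched as a small model. This is only an illustration: the struct, the function names and the plain-majority threshold are assumptions made for the example, not votequorum internals, and qdevice votes are left out on purpose so that only the master_win shortcut is visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: not votequorum's real data structures or algorithm. */
struct partition {
    unsigned int node_votes;   /* sum of node votes visible in this partition */
    bool has_master_qdevice;   /* a member's qdevice is currently the master */
};

/* Plain majority: strictly more than half of the expected node votes. */
static unsigned int quorum_threshold(unsigned int expected_votes)
{
    assert(expected_votes > 0);
    return expected_votes / 2 + 1;
}

/* Hypothetical helper: decides quorum for a whole partition. */
static bool partition_is_quorate(const struct partition *p,
                                 unsigned int expected_votes,
                                 bool master_wins)
{
    if (p->node_votes >= quorum_threshold(expected_votes)) {
        return true;
    }
    /* With master_wins set, every node that shares a partition with the
     * master qdevice is treated as quorate, not only the master's node. */
    return master_wins && p->has_master_qdevice;
}
```

With four one-vote nodes, the node1/node2 half has 2 votes and no master, so it stays below the threshold of 3, while the node3/node4 half is quorate as a whole once master_wins is set.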

Fabio

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/openais


Re: [Openais] corosync and qdevice

2013-03-20 Thread Fabio M. Di Nitto
On 3/20/2013 5:27 PM, eXeC001er wrote:

> >
> > The first Q:
> >
> > According to the tests that are part of corosync-sources i think
> that i can:
> >
> > 1. create a daemon that register a QDEVICE and will notify corosync
> > about the device (votequorum_qdevice_poll()).
> >
> > 2. implement a master/slave logic and if the qdevice on a node
> wins then
> > i call votequorum_qdevice_master_wins() on the node and corosync
> notify
> > another nodes about this, so i can say that the node is MASTER.
> 
> There is no votequorum_qdevice_master_wins() call... where did you
> find it?
> 
> 
> I am researching corosync-2.3.0 sources.

whoops.. i wrote it and forgot about it. getting old is bad :)

No, it's a bit more complicated than that.

corosync starts and loads the config
later on qdeviced starts and reads the config
qdeviced detects that it has to run in master_wins config:
 call votequorum_qdevice_master_wins(..., 1);
that call sets a flag for the node and makes sure that the feature is
enabled internally.

then, you decide who is master (and tell corosync) by casting or not
casting your vote. master will cast, slaves will not.

As for debugging it is useful as it sets some bits around cmap and it
makes it easier to detect if nodes have been configured differently.

See commit 2f369e7039bc9054033693c56e93db9f4021a73f

Fabio


Re: [Openais] corosync and qdevice

2013-03-20 Thread Fabio M. Di Nitto
On 3/20/2013 4:31 PM, eXeC001er wrote:
> Hello.
> 
> The corosync-2.x has qdevice logic, but i have not found any details
> about using the logic.

that is because the API is not yet supported. The core is done and ready
to be used but there are no real consumers.

> 
> The first Q:
> 
> According to the tests that are part of corosync-sources i think that i can:
> 
> 1. create a daemon that register a QDEVICE and will notify corosync
> about the device (votequorum_qdevice_poll()).
> 
> 2. implement a master/slave logic and if the qdevice on a node wins then
> i call votequorum_qdevice_master_wins() on the node and corosync notify
> another nodes about this, so i can say that the node is MASTER.

There is no votequorum_qdevice_master_wins() call... where did you find it?

> 
> 
> The second Q:
> 
> As i understand 'CAST_VOTE=1' flag says to corosync that need to
> consider the votes of a QDEVICE ?

No, cast_vote is not the number of votes. A qdevice can be registered
and polling without casting a vote.

That is exactly how the master_win solution is implemented.

A master casts a vote, a slave does not. Both need to poll via
qdevice_poll.

The old concept that a qdevice needs to know how many votes to cast has
been removed. votequorum knows internally how many votes a device should
have.
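
A minimal sketch of that split of responsibilities, assuming a made-up struct and helper (none of these names come from the corosync sources): the poll only carries a yes/no, and the number of votes comes from the quorum side's own configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of what the quorum service remembers about a qdevice. */
struct qdevice_state {
    bool registered;   /* the device has registered with the quorum service */
    bool cast_vote;    /* last poll: true for a master, false for a slave */
};

/* The device never says how many votes it has; that number is configured
 * on the quorum side and only added while the device is casting. */
static unsigned int votes_seen(unsigned int node_votes,
                               unsigned int configured_qdevice_votes,
                               const struct qdevice_state *qd)
{
    assert(qd != NULL);
    if (qd->registered && qd->cast_vote) {
        return node_votes + configured_qdevice_votes;
    }
    return node_votes;
}
```

A slave keeps polling with cast_vote set to false, so it stays registered but contributes nothing to the tally.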





Re: [Openais] [PATCH] totemconfig: change minimum RRP threshold

2011-09-08 Thread Fabio M. Di Nitto
Ack

On 9/8/2011 9:44 AM, Jan Friesse wrote:
> RRP threshold can be a lower value than 5.
> 
> Signed-off-by: Jan Friesse 
> ---
>  exec/totemconfig.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/exec/totemconfig.c b/exec/totemconfig.c
> index f767f69..a475bb3 100644
> --- a/exec/totemconfig.c
> +++ b/exec/totemconfig.c
> @@ -82,7 +82,7 @@
>  #define MISS_COUNT_CONST 5
>  #define RRP_PROBLEM_COUNT_TIMEOUT 2000
>  #define RRP_PROBLEM_COUNT_THRESHOLD_DEFAULT  10
> -#define RRP_PROBLEM_COUNT_THRESHOLD_MIN  5
> +#define RRP_PROBLEM_COUNT_THRESHOLD_MIN  2
>  #define RRP_AUTORECOVERY_CHECK_TIMEOUT   1000
>  
>  static char error_string_response[512];



Re: [Openais] Corosync 2.0 Feature Request: Replace objdb/confdb with something easier to use

2011-08-24 Thread Fabio M. Di Nitto
On 08/25/2011 06:31 AM, Angus Salkeld wrote:
> On Thu, Aug 25, 2011 at 05:16:20AM +0200, Fabio M. Di Nitto wrote:
>> On 08/25/2011 04:56 AM, Angus Salkeld wrote:
>>
>>> Possible Solutions
>>> ==================
>>>
>>> 1] API
>>> We really just want to get/set values do we really need a tree?
>>
>> as you already mentioned before, tree make it easy to load a config file
>> and map it in the objdb following the same structure (object, key etc..).
>>
>> I know Steven wants xml loader too (that matches perfectly with objdb 1:1).
>>
>> Whatever API you want to put in place, please keep it simple to retain
>> the same view even if internally is a map or whatever else you decide.
>>
>> Also consider formats for export. For example dumping the objdb into an
>> xml file is dead simple. The new API should allow something similar IMHO.
> 
> Yikes, well converting a map's contents to and from xml is going to be
> interesting.
> 
> If we added an xpath-like api to objdb we could improve the API
> enormously and still keep it a tree.
> 
> ver = objdb_get_int32("/service/@name=cpg/ver");
> 
> objdb_cd("/logging/@subsys=main");
> if (objdb_get_bool("to_syslog")) {
>   //...
> }

Possibly, but "real" xpath is complex. A subset of xpath is already
implemented in libccsconfdb (which does a reduced set of xpath<->objdb).

The current "full xpath" implementation in libccsconfdb involves
dumping objdb into xml and using libxml for real xpath queries.. the "easy
way out" without re-implementing the world.

Fabio



Re: [Openais] Corosync 2.0 Feature Request: Replace objdb/confdb with something easier to use

2011-08-24 Thread Fabio M. Di Nitto
On 08/25/2011 04:56 AM, Angus Salkeld wrote:

> Possible Solutions
> ==================
> 
> 1] API
> We really just want to get/set values do we really need a tree?

as you already mentioned before, a tree makes it easy to load a config file
and map it into the objdb following the same structure (object, key etc..).

I know Steven wants an xml loader too (that matches perfectly with objdb 1:1).

Whatever API you want to put in place, please keep it simple and retain
the same view, even if internally it is a map or whatever else you decide.

Also consider formats for export. For example dumping the objdb into an
xml file is dead simple. The new API should allow something similar IMHO.

Fabio


Re: [Openais] [RFC] quorum implementation in corosync

2011-08-17 Thread Fabio M. Di Nitto
On 8/17/2011 11:21 AM, Christine Caulfield wrote:
> On 17/08/11 09:26, Fabio M. Di Nitto wrote:
>> Hi all,
>>
>> for a long time cman has been the quorum provider within RHCS. cman is
>> going to be obsoleted in the long term and a replacement needs to be
>> implemented.
>>
>> In this proposal I left out API names.. they are not important at this
>> stage and can be defined later on (also because some interfaces like
>> confdb/objdb might change in 2.0).
>>
>> I am also assuming that we want the option to plug different quorum
>> providers into the system (network based, disk based, etc) and different
>> algorithms to calculate quorum (YKD, etc).
>>
>> Attached to this email there is a small pdf with the data flow diagram
>> as one picture can explain better than 1000 words (at least given my
>> level of itaglish ;))
>>
>> Keep always in mind that:
>>
>> 1) At any given time, only one "cluster view provider" feeds information
>> to quorumd. The provider must be the same across all nodes.
>>
>> 2) At any given time, only one "quorum calculation algorithm" can be
>> used and it must be the same across all nodes.
>>
>> 3) disk based provider can either be a separate daemon or run within
>> quorumd. Due to the nature of the provider, the implementation needs
>> either threads or libaio (that's not very portable) and therefore it
>> cannot run within corosync directly for blocking reasons.
>>
>> 4) a quorum state change, has to trigger a cpg_1 notification (assuming
>> that we will use cpg as notification method, but that would save the
>> issue of synchronizing cpg notifications with quorum ones).
>>
>> 5) dispatch of notification between cpg_0 and cpg_1 has to be
>> synchronized to allow quorumd to act on network based cluster view
>> provider. In theory the only user for cpg_0 is quorumd.
>>
>> 6) quorumd is optional. corosync should work with or without. cpg_1
>> clients will get different quorum info (QUORATED, NOT_QUORATED,
>> QUORUM_INFO_NOT_AVAILABLE, etc) based on the status.
> 
> A minor point but the language is simple QUORATE and not QUORATED

eheh noted :)

> 
> 
>> 7) quorumd will require some configuration or information to be able to
>> act correctly. For example "how many nodes are supposed to be in the
>> cluster?" etc. This is an implementation detail but it is something to
>> take into account. Most of that information is tied to the quorum
>> calculation algorithm and we will need to evaluate case by case how
>> to handle it correctly.
> 
> How a quorum provider calculates quorum is entirely up to it and cannot 
> be provided by lower layers really. It can ask for services (such as 
> number of nodes, as you mentioned) but don't try to build anything more 
> specific into the quorum interface as it'll just look silly and be 
> ignored by most providers. Let them provide their own APIs for 
> requesting data. What I did with quorum/votequorum might not be ideal 
> but it shows the level of separation that's needed I think.

Right, that's why I didn't go anywhere near the specifics of those APIs;
it is just something to keep in mind when we get there.

The initial discussion was based around quorum/votequorum, so that's
going to be a starting point and not the end of it. It might change, but
that's probably expected.
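
To make point 6 a bit more concrete, here is a rough sketch of the three states a cpg_1 client could see, using the QUORATE spelling suggested above. The enum, the helper and the plain-majority rule are assumptions for illustration only; the proposal deliberately leaves API names and the calculation algorithm open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names, following point 6 of the proposal. */
enum quorum_info {
    QUORUM_INFO_NOT_AVAILABLE,  /* quorumd is not running at all */
    NOT_QUORATE,
    QUORATE
};

/* Assumed rule for the sketch: plain majority of the expected votes.
 * A real provider would plug in its own algorithm here. */
static enum quorum_info quorum_info_for(bool quorumd_running,
                                        unsigned int votes,
                                        unsigned int expected_votes)
{
    if (!quorumd_running) {
        return QUORUM_INFO_NOT_AVAILABLE;
    }
    assert(expected_votes > 0);
    return votes >= expected_votes / 2 + 1 ? QUORATE : NOT_QUORATE;
}
```

The point of the third state is that a client can tell "no quorum" apart from "nobody is calculating quorum", since quorumd is optional.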

Fabio


[Openais] [RFC] quorum implementation in corosync

2011-08-17 Thread Fabio M. Di Nitto
Hi all,

for a long time cman has been the quorum provider within RHCS. cman is
going to be obsoleted in the long term and a replacement needs to be
implemented.

In this proposal I left out API names.. they are not important at this
stage and can be defined later on (also because some interfaces like
confdb/objdb might change in 2.0).

I am also assuming that we want the option to plug different quorum
providers into the system (network based, disk based, etc) and different
algorithms to calculate quorum (YKD, etc).

Attached to this email there is a small pdf with the data flow diagram
as one picture can explain better than 1000 words (at least given my
level of itaglish ;))

Always keep in mind that:

1) At any given time, only one "cluster view provider" feeds information
to quorumd. The provider must be the same across all nodes.

2) At any given time, only one "quorum calculation algorithm" can be
used and it must be the same across all nodes.

3) disk based provider can either be a separate daemon or run within
quorumd. Due to the nature of the provider, the implementation needs
either threads or libaio (that's not very portable) and therefore it
cannot run within corosync directly for blocking reasons.

4) a quorum state change, has to trigger a cpg_1 notification (assuming
that we will use cpg as notification method, but that would save the
issue of synchronizing cpg notifications with quorum ones).

5) dispatch of notification between cpg_0 and cpg_1 has to be
synchronized to allow quorumd to act on network based cluster view
provider. In theory the only user for cpg_0 is quorumd.

6) quorumd is optional. corosync should work with or without. cpg_1
clients will get different quorum info (QUORATED, NOT_QUORATED,
QUORUM_INFO_NOT_AVAILABLE, etc) based on the status.

7) quorumd will require some configuration or information to be able to
act correctly. For example "how many nodes are supposed to be in the
cluster?" etc. This is an implementation detail but it is something to
take into account. Most of that information is tied to the quorum
calculation algorithm and we will need to evaluate case by case how
to handle it correctly.

Cheers
Fabio


quorum_proposal.pdf
Description: Adobe PDF document

Re: [Openais] Extendng call for Corosync RFEs until Aug 30th

2011-08-08 Thread Fabio M. Di Nitto
On 8/7/2011 6:57 PM, Steven Dake wrote:
> Believe many in community are on vacation during our proposal window.
> As a result, I'm extending until Aug 30th.
> 

topic-quorum ? as we discussed recently on IRC, in order to replace cman.

Fabio


Re: [Openais] [GIT PULL] Minor fixes for RPM builds

2011-07-05 Thread Fabio M. Di Nitto
I checked the changes in that branch and they look good to me.

Fabio

On 7/5/2011 1:58 PM, Florian Haas wrote:
> Steve, Fabio, Jan, Angus,
> 
> Please consider pulling the following changes since commit
> cfb96c64d91b4232538e16497dd5d621b1130a89:
> 
>   Correct mailing list address in corosync_overview manpage (2011-07-04
> 15:15:13 +0200)
> 
> from the git repository at:
>   git://github.com/fghaas/corosync.git master
> 
> They are all small patches fixing minor issues related to RPM builds.
> Thanks.
> 
> Cheers,
> Florian
> 
> 
> Florian Haas (4):
>   build: force LC_ALL=C correctly for dates
>   build: make RDMA support an RPM build conditional
>   build: set RDMA related _LIBS and _CFLAGS only if building with
> RDMA support
>   build: disable RDMA support in RPMs by default
> 
>  Makefile.am  |4 ++--
>  corosync.spec.in |7 +++
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> 
> 
> 
> 


Re: [Openais] [PATCH] configure.ac: change edefault to default

2011-06-21 Thread Fabio M. Di Nitto
ACK.

On 6/21/2011 12:55 PM, Jan Friesse wrote:
> Signed-off-by: Jan Friesse 
> ---
>  configure.ac |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index fee8629..41dfeaf 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -264,7 +264,7 @@ AM_CONDITIONAL(BUILD_MONITORING, test x$enable_monitoring = xyes)
>  
>  AC_ARG_ENABLE([watchdog],
>   [  --enable-watchdog   : Watchdog support ],,
> - [ edefault="no" ])
> + [ default="no" ])
>  AM_CONDITIONAL(BUILD_WATCHDOG, test x$enable_watchdog = xyes)
>  
>  AC_ARG_ENABLE([augeas],



Re: [Openais] Corosync and wireless = problems ?

2011-06-20 Thread Fabio M. Di Nitto
On 06/20/2011 01:40 PM, Hans Lammerts wrote:
> Hi,
> 
>  
> 
> I just installed two servers, both with Ubuntu 10.04 64bit. No problems
> here.
> 
> The idea is to create a cluster that is going to serve as a highly
> available Zarafa environment.
> 
> One of the steps is to install corosync (which gets installed
> automatically when installing Pacemaker).
> 
> I have two NICs on both machines:
> 
> machine 1: one wired ethernetcard (eth0), ip 192.168.2.20
> 
>  one wireless card (wlan0), ip 10.1.1.2
> 
> machine 2: one wired ethernetcard (eth0), ip 192.168.2.30
> 
>  one wired ethernetcard (eth1), ip 10.1.1.3

[SNIP]

>  
> 
> The only real difference between these two machines is that the corosync
> communication runs over the 10.1.1.0 network, but machine one has a
> wireless adapter for this network and machine two has a wired network
> card for this network.
> 
>  
> 
> Could this be the problem ? Or do I have to look for something else ?
> 
> Would setting debug to "on" help me more ?
> 
> Hope someone can shed some light on this problem. I sure can't find
> anything about this using Google.

multicast over wifi does not work the same way as on wired ethernet.

on a wired ethernet, packets are dispatched as soon as possible.

In a wifi environment, multicast packets are queued and sent only during
a specific allocated slot (IIRC during beacon broadcasting, but please
double check with google). This slot is small and might not be able to
contain the whole data set required for corosync to operate properly.

If you can't opt out of wifi, please try using udpu transport instead.
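
As a rough illustration only, a udpu totem section for the two addresses on the 10.1.1.0 network could look like the fragment below. It assumes a corosync release that ships the udpu transport (1.3.0 or later, as far as I know) and the member/memberaddr syntax of that series; the mcastport value is just an example, so please check corosync.conf(5) for the version you actually have installed.

```
totem {
        version: 2
        transport: udpu
        interface {
                member {
                        memberaddr: 10.1.1.2
                }
                member {
                        memberaddr: 10.1.1.3
                }
                ringnumber: 0
                bindnetaddr: 10.1.1.0
                mcastport: 5405
        }
}
```

With udpu every member is listed explicitly and traffic goes out as unicast UDP, so the multicast queueing behaviour of the access point is no longer involved.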

Fabio


Re: [Openais] [Cluster-devel] new RHCS upstream wiki

2011-05-30 Thread Fabio M. Di Nitto
On 05/09/2011 01:25 PM, Fabio M. Di Nitto wrote:
> Hi all,
> 
> we are in the process of moving the old cluster wiki
> (http://sourceware.org/cluster/wiki/) to:
> 
> https://fedorahosted.org/cluster/wiki/HomePage

The relocation is now complete and the old wiki is redirecting users to
the new one.

I'd like to thank Digimer for doing the heavy lifting of fixing all pages.

The very last thing left to do is to create a proper default home page
with a summary and maybe a logo... anybody would like to suggest one?

the winner will get a month free support on IRC #linux-cluster ;)

Fabio


[Openais] new RHCS upstream wiki

2011-05-09 Thread Fabio M. Di Nitto
Hi all,

we are in the process of moving the old cluster wiki
(http://sourceware.org/cluster/wiki/) to:

https://fedorahosted.org/cluster/wiki/HomePage

All pages from the old wiki have been imported and we are in the process
of reformatting the pages to match the new trac-wiki notation.

If you own any page or content, please make sure to verify that the
content is correct.

In the process I also spotted an insane amount of spam, if you have 5
minutes to spare to help cleaning that up,
https://fedorahosted.org/cluster/wiki/TitleIndex is a good starting point.

The old wiki will be made readonly soon and any change will be discarded.

If necessary I have a backup stored on my harddisk.

Please update all your URLs.

Fabio


Re: [Openais] [PATCH 1/3] Fix the ttl defaults and range

2011-03-15 Thread Fabio M. Di Nitto
Patch 1 and 2 ACK. I'll leave 3 to Steven, but it looks good.

Fabio

On 3/15/2011 12:44 AM, Angus Salkeld wrote:
> 1) both IPv4 and IPv6 mcast should default to ttl=1
> 2) the range should be 0..255
>0 is valid meaning localhost only (cluster of one)
> 
> Signed-off-by: Angus Salkeld 
> ---
>  exec/totemconfig.c  |   10 +++---
>  man/corosync.conf.5 |2 +-
>  2 files changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/exec/totemconfig.c b/exec/totemconfig.c
> index 6bb4894..7039ba0 100644
> --- a/exec/totemconfig.c
> +++ b/exec/totemconfig.c
> @@ -394,11 +394,7 @@ printf ("couldn't find totem handle\n");
>   /*
>* Get the TTL
>*/
> - if (totem_config->interfaces[ringnumber].mcast_addr.family == AF_INET6) {
> - totem_config->interfaces[ringnumber].ttl = 255;
> - } else {
> - totem_config->interfaces[ringnumber].ttl = 1;
> - }
> + totem_config->interfaces[ringnumber].ttl = 1;
>   if (!objdb_get_string (objdb, object_interface_handle, "ttl", &str)) {
>   totem_config->interfaces[ringnumber].ttl = atoi (str);
>   }
> @@ -477,8 +473,8 @@ int totem_config_validate (
>   goto parse_error;
>   }
>  
> - if (totem_config->interfaces[i].ttl > 255 || totem_config->interfaces[i].ttl < 1) {
> - error_reason = "Invalid TTL (should be 1..255)";
> + if (totem_config->interfaces[i].ttl > 255 || totem_config->interfaces[i].ttl < 0) {
> + error_reason = "Invalid TTL (should be 0..255)";
>   goto parse_error;
>   }
>  
> diff --git a/man/corosync.conf.5 b/man/corosync.conf.5
> index d69cf89..eaf58c4 100644
> --- a/man/corosync.conf.5
> +++ b/man/corosync.conf.5
> @@ -131,7 +131,7 @@ please configure the mcastports with a gap.
>  ttl
>  This specifies the Time To Live (TTL). If you run your cluster on a routed
>  network then the default of "1" will be too small. This option provides
> -a way to increase this up to 255.
> +a way to increase this up to 255. The valid range is 0..255.
>  
>  .TP
>  member
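
The default and range from the patch can be sketched in isolation as follows. This is not the code from totemconfig.c: parse_ttl() and the TTL_* macros are hypothetical names, and the sketch swaps the patch's atoi() for strtol() so that non-numeric input can be rejected as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Values taken from the patch: default 1, valid range 0..255. */
#define TTL_DEFAULT 1
#define TTL_MIN     0
#define TTL_MAX     255

/* Hypothetical helper: returns 0 and stores the TTL on success, -1 if the
 * string is not a number in 0..255. A NULL string means "not configured"
 * and selects the default for both IPv4 and IPv6. */
static int parse_ttl(const char *str, int *ttl_out)
{
    char *end = NULL;
    long value;

    assert(ttl_out != NULL);
    if (str == NULL) {
        *ttl_out = TTL_DEFAULT;
        return 0;
    }
    errno = 0;
    value = strtol(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0') {
        return -1;   /* empty string, trailing garbage or overflow */
    }
    if (value < TTL_MIN || value > TTL_MAX) {
        return -1;   /* outside 0..255 */
    }
    *ttl_out = (int)value;
    return 0;
}
```

A TTL of 0 is accepted here because, as the commit message says, it is valid and means localhost only.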



Re: [Openais] [PATCH] Don't assert when ring id file is less then 8 bytes

2011-02-23 Thread Fabio M. Di Nitto
On 2/23/2011 5:55 PM, Steven Dake wrote:
> On 02/23/2011 06:28 AM, Fabio M. Di Nitto wrote:
>> On 2/23/2011 2:17 PM, Russell Bryant wrote:
>>> On Wed, Feb 23, 2011 at 1:53 AM, Fabio M. Di Nitto  
>>> wrote:
>>>>> I hope the change from:
>>>>> read (fd, &memb_ring_id->seq, sizeof (unsigned long long));
>>>>> to
>>>>> read (fd, &memb_ring_id->seq, sizeof (uint64_t));
>>>>>
>>>>> won't cause any problems.
>>>>
>>>> On i686 i noticed the file being 8 bytes (unsigned long long)... I
>>>> wonder if you shutdown corosync, update packages with the fix, then
>>>> restart.. is it going to read garbage from the file?
>>>
>>> but sizeof (uint64_t) is also 8 bytes.
>>
>> ok... i clearly need more coffee :))) ok either x86_64 or i686 had a 4
>> bytes file and the other 8... one of them is going to be affected by
>> switching to a different size.
>>
> 
> Are you serious?  It should be 8 bytes always!  Could you give more
> details of your platform information (was it linux, which os version, etc)

They were 2 VMs RHEL6.0+z one i386 and one x86_64.

it's entirely possible that the files were truncated somehow.. or that I
do not remember properly.

Probably the same reason why x86_64 had a 0 bytes file.. go figure.
those vms are long gone now and as long as you have tested it, I am OK
with that. Don't let my doubts get in your way.

Fabio


Re: [Openais] [PATCH] Don't assert when ring id file is less then 8 bytes

2011-02-23 Thread Fabio M. Di Nitto
On 2/23/2011 2:17 PM, Russell Bryant wrote:
> On Wed, Feb 23, 2011 at 1:53 AM, Fabio M. Di Nitto  
> wrote:
>>> I hope the change from:
>>> read (fd, &memb_ring_id->seq, sizeof (unsigned long long));
>>> to
>>> read (fd, &memb_ring_id->seq, sizeof (uint64_t));
>>>
>>> won't cause any problems.
>>
>> On i686 i noticed the file being 8 bytes (unsigned long long)... I
>> wonder if you shutdown corosync, update packages with the fix, then
>> restart.. is it going to read garbage from the file?
> 
> but sizeof (uint64_t) is also 8 bytes.

ok... i clearly need more coffee :))) ok either x86_64 or i686 had a 4
bytes file and the other 8... one of them is going to be affected by
switching to a different size.

Fabio


Re: [Openais] [PATCH 1/1] Iterate all items in object_reload_notification

2011-02-22 Thread Fabio M. Di Nitto
On 2/22/2011 6:44 PM, Steven Dake wrote:
> Clearly the iterate loop is wrong as is, and your patch is correct, but
> I'm curious, does this fix a specific bug?

https://bugzilla.redhat.com/show_bug.cgi?id=677975

:)

Fabio

> 
> Thanks
> -steve
> 
> On 02/22/2011 06:42 AM, Jan Friesse wrote:
>> Signed-off-by: Jan Friesse 
>> ---
>>  exec/objdb.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/exec/objdb.c b/exec/objdb.c
>> index dc36236..884263b 100644
>> --- a/exec/objdb.c
>> +++ b/exec/objdb.c
>> @@ -342,7 +342,7 @@ static void object_reload_notification(int startstop, int flush)
>>  }
>>  
>>  for (list = tmplist.next, tmp = list->next;
>> -list != tmplist.prev; list = tmp, tmp = list->next) {
>> +list != &tmplist; list = tmp, tmp = list->next) {
>>  
>>  tracker_pt = list_entry (list, struct object_tracker, object_list);
>>  
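
Why the old loop condition is wrong can be shown with a tiny stand-alone list. The list type and helpers below are a minimal re-implementation written for this example (they are assumptions, not corosync's list.h); only the two loop conditions are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list with a sentinel head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *new_node, struct list_head *head)
{
    new_node->prev = head->prev;
    new_node->next = head;
    head->prev->next = new_node;
    head->prev = new_node;
}

/* Old condition: stops as soon as it reaches the last element,
 * so the last element is never visited. */
static size_t count_visited_old(struct list_head *head)
{
    struct list_head *list, *tmp;
    size_t visited = 0;

    for (list = head->next, tmp = list->next;
         list != head->prev; list = tmp, tmp = list->next) {
        visited++;
    }
    return visited;
}

/* Fixed condition: stops only when it is back at the sentinel head. */
static size_t count_visited_fixed(struct list_head *head)
{
    struct list_head *list, *tmp;
    size_t visited = 0;

    for (list = head->next, tmp = list->next;
         list != head; list = tmp, tmp = list->next) {
        visited++;
    }
    return visited;
}
```

With a single tracker on the list the old loop does not run at all, which fits a reload notification that silently never fires.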
> 


Re: [Openais] [PATCH] Don't assert when ring id file is less then 8 bytes

2011-02-22 Thread Fabio M. Di Nitto
On 2/22/2011 11:55 PM, Angus Salkeld wrote:
> On Tue, Feb 22, 2011 at 12:48:32PM -0700, Steven Dake wrote:
>> If the ring id file for the processor is less than 8 bytes, totemsrp would
>> assert.  Our speculation is that this condition happens during a fencing
>> operation or local filesystem corruption.
>>
>> With this patch, Corosync will create fresh ring id file data when the
>> incorrect number of bytes are read from the ring id.
>>
>> Amend to use sizeof the strerror string length and PATH_MAX for the path 
>> length.
>>
>> Signed-off-by: Steven Dake 
> Reviewed-by: Angus Salkeld 
> 
> I hope the change from:
> read (fd, &memb_ring_id->seq, sizeof (unsigned long long));
> to
> read (fd, &memb_ring_id->seq, sizeof (uint64_t));
> 
> won't cause any problems.

On i686 i noticed the file being 8 bytes (unsigned long long)... I
wonder if you shutdown corosync, update packages with the fix, then
restart.. is it going to read garbage from the file?
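
For reference, the behaviour described in the patch (treat anything other than a full 8-byte read as "no usable ring id" and write fresh data) could be sketched like this. The function names and the error convention are made up for the example and are not the totemsrp code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if exactly 8 bytes were read into *seq,
 * -1 if the file is missing, unreadable or shorter than 8 bytes. */
static int ring_seq_load(const char *path, uint64_t *seq)
{
    uint64_t value = 0;
    ssize_t res;
    int fd;

    assert(path != NULL && seq != NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    res = read(fd, &value, sizeof(value));
    close(fd);
    if (res != (ssize_t)sizeof(value)) {
        return -1;   /* short file: caller should create fresh data */
    }
    *seq = value;
    return 0;
}

/* Hypothetical helper: replaces the file content with a fresh sequence. */
static int ring_seq_store(const char *path, uint64_t seq)
{
    ssize_t res;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) {
        return -1;
    }
    res = write(fd, &seq, sizeof(seq));
    close(fd);
    return res == (ssize_t)sizeof(seq) ? 0 : -1;
}
```

Since uint64_t is always 8 bytes, a file written with one width and read with the same width round-trips; a truncated file simply falls into the "create fresh data" path instead of hitting an assert.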

Fabio


[Openais] Announcing "Cluster in a BOX" project

2011-02-17 Thread Fabio M. Di Nitto
Hi all,

A lot of people find it hard to set up their first cluster or simply
don't have time to repeat the same setup over and over, whether for
development, basic testing, or showcasing cluster technologies
to other people.

The  "Cluster in a BOX" project (cbox in short), is one script to set up
a KVM based virtual test cluster in a matter of a few minutes.

cbox is still in its early development, has several limitations and some
strict requirements.

Plans are to include as many cluster technologies and configuration
examples as possible and remove as many limitations as we possibly can.

If your cluster project or technology is not there yet, or your
distribution is not supported, it's simply because I do not have the
resources to do it all by myself. Do not take it personally, we will get it
there together and I absolutely welcome comments, patches and feedback
at any time.

Support for pacemaker, DRBD and OCFS2 will come shortly.

cbox documentation is here (in temporary lack of a more neutral location):

http://sources.redhat.com/cluster/wiki/cluster_in_a_box

Please read it carefully before running cbox.

Fabio


Re: [Openais] [PATCH] Fix merge markers in spec file

2011-02-07 Thread Fabio M. Di Nitto
ACK.

On 02/08/2011 06:41 AM, Angus Salkeld wrote:
> Signed-off-by: Angus Salkeld 
> ---
>  corosync.spec.in |3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
> 
> diff --git a/corosync.spec.in b/corosync.spec.in
> index 1eb07f0..ba54123 100644
> --- a/corosync.spec.in
> +++ b/corosync.spec.in
> @@ -2,8 +2,6 @@
>  %global numcomm @numcomm@
>  %global dirty @dirty@
>  
> -<<<<<<< HEAD
> -=======
>  # Conditionals
>  # Invoke "rpmbuild --without " or "rpmbuild --with "
>  # to disable or enable specific features
> @@ -13,7 +11,6 @@
>  %bcond_with snmp
>  %bcond_with dbus
>  
> ->>>>>>> 2a568d6... Add dbus and snmp notifier
>  Name: corosync
>  Summary: The Corosync Cluster Engine and Application Programming Interfaces
>  Version: @version@



Re: [Openais] corosync 1.3.0 not stable under load

2010-12-22 Thread Fabio M. Di Nitto
On 12/22/2010 10:35 AM, Dietmar Maurer wrote:
>> IIRC corosync-pload is a "destructive test". You need to restart corosync 
>> after its run.
> 
> Ah - good to know. Although I don't really understand why that need to be 
> destructive?

Steven needs to answer this one. I know it does some nasty things
internally.

> 
> Anyways, the bug is reproducible with a single run (most times)
> 
> I just increased the count in corosync-pload.c:
> 
>   result = pload_start (
>   handle,
>   0, /* code */
>   150*10, /* count */
>   300); /* size */
> 
> # /etc/init.d/cman stop
> # /etc/init.d/cman start
> # ./test/cpgbench
> 
> Now starting pload leads to the crash:
> 
> # ./tools/corosync-pload
> 
> Any idea?

Don't run pload.. that's the issue. It's not meant to be executed on
something that needs to survive.

Fabio

> 
>>
>> Fabio
>>
>> On 12/22/2010 9:52 AM, Dietmar Maurer wrote:
>>> Corosync v1.3.0 (single node)
>>>
>>> Debian Squeeze AMD64 with latest 2.6.32 kernel
>>>
>>>
>>>
>>> When I run "corosync-pload" it prints:
>>>
>>>
>>>
>>> # corosync-pload
>>>
>>> Init result 1
>>>
>>>
>>>
>>> The process never stops (but I can stop it with cntrl-c), but it seems
>>> to work anyways:
>>>
>>>
>>>
>>> Dec 22 09:32:46 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
>>> per write   2.495 seconds runtime, 601307.250 TP/S,   172.035 MB/S.
>>>
>>> Dec 22 09:32:53 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
>>> per write   3.062 seconds runtime, 489821.674 TP/S,   140.139 MB/S.
>>>
>>> Dec 22 09:33:01 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
>>> per write   4.372 seconds runtime, 343112.460 TP/S,98.165 MB/S.
>>>
>>> Dec 22 09:33:09 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
>>> per write   4.369 seconds runtime, 343358.870 TP/S,98.236 MB/S.
>>>
>>> Dec 22 09:33:53 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
>>> per write   3.475 seconds runtime, 431594.847 TP/S,   123.480 MB/S.
>>>
>>>
>>>
>>> If I now start cpgbench I get:
>>>
>>>
>>>
>>> /corosync-1.3.0/test# ./cpgbench
>>>
>>> 463802 messages received  1000 bytes per write  10.000 Seconds runtime
>>> 46380.121 TP/s  46.380 MB/s.
>>>
>>> 470350 messages received  2000 bytes per write  10.000 Seconds runtime
>>> 47034.864 TP/s  94.070 MB/s.
>>>
>>> 460633 messages received  3000 bytes per write  10.000 Seconds runtime
>>> 46063.231 TP/s 138.190 MB/s.
>>>
>>> 443571 messages received  4000 bytes per write  10.000 Seconds runtime
>>> 44357.016 TP/s 177.428 MB/s.
>>>
>>>
>>>
>>> Everything OK, but if I also start corosync-pload I get a corosync crash:
>>>
>>> /corosync-1.3.0/test# ./cpgbench
>>>
>>> ...
>>>
>>> cpg dispatch returned error 2
>>>
>>>
>>>
>>> and the syslog shows:
>>>
>>>
>>>
>>> Dec 22 09:39:45 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
>>> per write   2.184 seconds runtime, 686771.055 TP/S,   196.487 MB/S.
>>>
>>> Dec 22 09:40:03 maui dlm_controld[2479]: cluster is down, exiting
>>>
>>> Dec 22 09:40:03 maui fenced[2464]: cluster is down, exiting
>>>
>>> Dec 22 09:40:05 maui kernel: dlm: closing connection to node 3
>>>
>>>
>>>
>>> Can someone reproduce that? How can I further debug that?
>>>
>>>
>>>
>>> - Dietmar
>>>
>>>
>>>
>>> ___
>>> Openais mailing list
>>> Openais@lists.linux-foundation.org
>>> https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync 1.3.0 not stable under load

2010-12-22 Thread Fabio M. Di Nitto
(sorry top post warning)

IIRC corosync-pload is a "destructive test". You need to restart
corosync after its run.

Fabio

On 12/22/2010 9:52 AM, Dietmar Maurer wrote:
> Corosync v1.3.0 (single node)
> 
> Debian Squeeze AMD64 with latest 2.6.32 kernel
> 
>  
> 
> When I run “corosync-pload” it prints:
> 
>  
> 
> # corosync-pload
> 
> Init result 1
> 
>  
> 
> The process never stops (but I can stop it with Ctrl-C), but it seems
> to work anyway:
> 
>  
> 
> Dec 22 09:32:46 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
> per write   2.495 seconds runtime, 601307.250 TP/S,   172.035 MB/S.
> 
> Dec 22 09:32:53 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
> per write   3.062 seconds runtime, 489821.674 TP/S,   140.139 MB/S.
> 
> Dec 22 09:33:01 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
> per write   4.372 seconds runtime, 343112.460 TP/S,98.165 MB/S.
> 
> Dec 22 09:33:09 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
> per write   4.369 seconds runtime, 343358.870 TP/S,98.236 MB/S.
> 
> Dec 22 09:33:53 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
> per write   3.475 seconds runtime, 431594.847 TP/S,   123.480 MB/S.
> 
>  
> 
> If I now start cpgbench I get:
> 
>  
> 
> /corosync-1.3.0/test# ./cpgbench
> 
> 463802 messages received  1000 bytes per write  10.000 Seconds runtime
> 46380.121 TP/s  46.380 MB/s.
> 
> 470350 messages received  2000 bytes per write  10.000 Seconds runtime
> 47034.864 TP/s  94.070 MB/s.
> 
> 460633 messages received  3000 bytes per write  10.000 Seconds runtime
> 46063.231 TP/s 138.190 MB/s.
> 
> 443571 messages received  4000 bytes per write  10.000 Seconds runtime
> 44357.016 TP/s 177.428 MB/s.
> 
>  
> 
> Everything OK, but if I also start corosync-pload I get a corosync crash:
> 
> /corosync-1.3.0/test# ./cpgbench
> 
> …
> 
> cpg dispatch returned error 2
> 
>  
> 
> and the syslog shows:
> 
>  
> 
> Dec 22 09:39:45 maui corosync[2409]:   [PLOAD ] 150 Writes 300 bytes
> per write   2.184 seconds runtime, 686771.055 TP/S,   196.487 MB/S.
> 
> Dec 22 09:40:03 maui dlm_controld[2479]: cluster is down, exiting
> 
> Dec 22 09:40:03 maui fenced[2464]: cluster is down, exiting
> 
> Dec 22 09:40:05 maui kernel: dlm: closing connection to node 3
> 
>  
> 
> Can someone reproduce that? How can I further debug that?
> 
>  
> 
> - Dietmar
> 
> 
> 


Re: [Openais] I fail in rpm making from git clone

2010-12-10 Thread Fabio M. Di Nitto
On 12/10/2010 11:02 AM, nozawat wrote:
> Hi Fabio,
> 
>  Thank you for the comment.
>  I ran the following commands again.
>  However, the result of make rpm was the same.
> 
> command:./autogen.sh && ./configure --prefix=$PREFIX
> --with-lcrso-dir=$LCRSODIR --with-version="1.3.0"
> --with-tarball-version="1.3.0"
> 

^^ you need to stop passing the version information altogether.

It is NOT up to you to tell ./configure what version to generate. It is
calculated automatically based on git tags and commits.

This "failsafe" is there exactly to prevent people from building random
versions that are not based on the real released version.

Fabio
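
For readers following along, here is a minimal sketch of how a "git describe"
string can be split into the pieces the spec file needs. It reuses the sed
expressions from the Makefile.am rule posted on this list; the function name
split_gitver and the sample value are made up for illustration, and in a real
checkout the input would come from "git describe --abbrev=4 --match='v*' HEAD".

```shell
#!/bin/sh
# split_gitver GITVER: print the rpm version, alpha tag and version tag
# derived from a "git describe" style string (hypothetical helper).
split_gitver() {
    gitver=$1
    # Drop the leading "v" and everything from the first "-" onwards.
    rpmver=$(echo "$gitver" | sed -e "s/^v//" -e "s/-.*//g")
    # Keep only what follows the last "-", minus the leading "g".
    alphatag=$(echo "$gitver" | sed -e "s/.*-//" -e "s/^g//")
    # The tag itself is everything before the first "-".
    vtag=$(echo "$gitver" | sed -e "s/-.*//g")
    echo "rpmver=$rpmver alphatag=$alphatag vtag=$vtag"
}

# Made-up sample: 12 commits after tag v1.3.0, abbreviated commit id 1a2b.
split_gitver "v1.3.0-12-g1a2b"
```

Since everything is derived from the tag, passing --with-version by hand only
gets in the way of this calculation.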

> Regards,
> Tomo
> 
> 2010/12/10 Fabio M. Di Nitto <fabbi...@fabbione.net>
> 
> On 12/10/2010 10:33 AM, nozawat wrote:
> > Hi
> >
> > I want to build an rpm from the latest corosync (git clone
> > http://corosync.org/git/corosync.git).
> > However, I got an error when I ran make rpm and was not able to build
> > it.
> > Building the rpm from the corosync-1.3.0 release works.
> > I attach the log from my run.
> 
> 
> >
> > [18:24:10][r...@pm1003 /opt/corosync]$ echo "1.3.0" > .version
> > [18:24:19][r...@pm1003 /opt/corosync]$ echo "1.3.0" > .tarball-version
> 
> ^^^this is not up to you to handle manually.
> 
> Let ./configure do the right thing instead.
> 
> Fabio
> 
> 



Re: [Openais] I fail in rpm making from git clone

2010-12-10 Thread Fabio M. Di Nitto
On 12/10/2010 10:33 AM, nozawat wrote:
> Hi
> 
> I want to build an rpm from the latest corosync (git clone
> http://corosync.org/git/corosync.git).
> However, I got an error when I ran make rpm and was not able to build
> it.
> Building the rpm from the corosync-1.3.0 release works.
> I attach the log from my run.


> 
> [18:24:10][r...@pm1003 /opt/corosync]$ echo "1.3.0" > .version
> [18:24:19][r...@pm1003 /opt/corosync]$ echo "1.3.0" > .tarball-version

^^^this is not up to you to handle manually.

Let ./configure do the right thing instead.

Fabio


[Openais] [PATCH 1/2] build: fix make srpm from release tarball

2010-12-01 Thread Fabio M. Di Nitto
Signed-off-by: Fabio M. Di Nitto 
---
 Makefile.am |   22 +++---
 1 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index f9bfca1..f22f03f 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -88,13 +88,21 @@ clean-generic:
 $(SPEC): $(SPEC).in
rm -f $@-t $@
LC_ALL=C date="$(shell date "+%a %b %d %Y")" && \
-   gitver="$(shell git describe --abbrev=4 --match='v*' HEAD 2>/dev/null)" && \
-   rpmver=`echo $$gitver | sed -e "s/^v//" -e "s/-.*//g"` && \
-   alphatag=`echo $$gitver | sed -e "s/.*-//" -e "s/^g//"` && \
-   vtag=`echo $$gitver | sed -e "s/-.*//g"` && \
-   numcomm=`git rev-list $$vtag..HEAD | wc -l` && \
-   git update-index --refresh > /dev/null 2>&1 || true && \
-   dirty=`git diff-index --name-only HEAD 2>/dev/null` && \
+   if [ -f .tarball-version ]; then \
+   gitver="$(shell cat .tarball-version)" && \
+   rpmver=$$gitver && \
+   alphatag="" && \
+   dirty="" && \
+   numcomm="0"; \
+   else \
+   gitver="$(shell git describe --abbrev=4 --match='v*' HEAD 2>/dev/null)" && \
+   rpmver=`echo $$gitver | sed -e "s/^v//" -e "s/-.*//g"` && \
+   alphatag=`echo $$gitver | sed -e "s/.*-//" -e "s/^g//"` && \
+   vtag=`echo $$gitver | sed -e "s/-.*//g"` && \
+   numcomm=`git rev-list $$vtag..HEAD | wc -l` && \
+   git update-index --refresh > /dev/null 2>&1 || true && \
+   dirty=`git diff-index --name-only HEAD 2>/dev/null`; \
+   fi && \
if [ -n "$$dirty" ]; then dirty="dirty"; else dirty=""; fi && \
if [ "$$numcomm" = "0" ]; then \
sed \
-- 
1.7.2.3
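
The selection logic in this patch can be sketched outside of make as follows.
This is only an illustration under stated assumptions: the function name
pick_version is made up, and the demo directory stands in for an unpacked
release tarball.

```shell
#!/bin/sh
# pick_version DIR: print the version for the source tree in DIR
# (hypothetical helper mirroring the if/else in the patch above).
pick_version() {
    dir=$1
    if [ -f "$dir/.tarball-version" ]; then
        # Release tarball: the version was recorded when the tarball was made.
        cat "$dir/.tarball-version"
    else
        # Git checkout: derive the version from the nearest v* tag.
        (cd "$dir" && git describe --abbrev=4 --match='v*' HEAD 2>/dev/null) |
            sed -e "s/^v//" -e "s/-.*//g"
    fi
}

# Demo: a throwaway directory that looks like an unpacked release tarball.
demo=$(mktemp -d)
echo "1.3.0" > "$demo/.tarball-version"
pick_version "$demo"
rm -rf "$demo"
```

The tarball branch never calls git, which is what lets "make srpm" work from
a release tarball that has no .git directory.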



[Openais] [PATCH 2/2] build: fix rpm build to include corosync-blackbox

2010-12-01 Thread Fabio M. Di Nitto
Signed-off-by: Fabio M. Di Nitto 
---
 corosync.spec.in |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/corosync.spec.in b/corosync.spec.in
index 2eff07e..6b0c464 100644
--- a/corosync.spec.in
+++ b/corosync.spec.in
@@ -82,6 +82,7 @@ fi
 %files
 %defattr(-,root,root,-)
 %doc LICENSE SECURITY
+%{_bindir}/corosync-blackbox
 %{_sbindir}/corosync
 %{_sbindir}/corosync-keygen
 %{_sbindir}/corosync-objctl
-- 
1.7.2.3



Re: [Openais] [PATCH] Set the max buffer size for sockets

2010-12-01 Thread Fabio M. Di Nitto
Signed-off-by: Fabio M. Di Nitto 

On 12/1/2010 5:28 PM, Steven Dake wrote:
> Set the recv buffer to a large size and the send buffer to a large size to
> allow the kernel to store more messages before dropping messages.
> 
> Signed-off-by: Steven Dake 
> ---
>  exec/totemudpu.c |   31 +++
>  1 files changed, 31 insertions(+), 0 deletions(-)
> 
> diff --git a/exec/totemudpu.c b/exec/totemudpu.c
> index 62b4d41..3fad618 100644
> --- a/exec/totemudpu.c
> +++ b/exec/totemudpu.c
> @@ -1303,6 +1303,8 @@ static int totemudpu_build_sockets_ip (
>   int addrlen;
>   int res;
>   int flag;
> + unsigned int recvbuf_size;
> + unsigned int optlen = sizeof (recvbuf_size);
>  
>   /*
>* Setup unicast socket
> @@ -1354,6 +1356,20 @@ static int totemudpu_build_sockets_ip (
>   return (-1);
>   }
>  
> + /*
> +  * the token_socket can receive many messages.  Allow a large number
> +  * of receive messages on this socket
> +  */
> + recvbuf_size = MCAST_SOCKET_BUFFER_SIZE;
> + res = setsockopt (instance->token_socket, SOL_SOCKET, SO_RCVBUF,
> + &recvbuf_size, optlen);
> + if (res == -1) {
> + char error_str[100];
> + strerror_r (errno, error_str, 100);
> + log_printf (instance->totemudpu_log_level_notice,
> + "Could not set recvbuf size %s\n", error_str);
> + }
> +
>   return 0;
>  }
>  
> @@ -1663,6 +1679,8 @@ int totemudpu_member_add (
>  
>   struct totemudpu_member *new_member;
>   int res;
> + unsigned int sendbuf_size;
> + unsigned int optlen = sizeof (sendbuf_size);
>   char error_str[100];
>  
>   new_member = malloc (sizeof (struct totemudpu_member));
> @@ -1687,6 +1705,19 @@ int totemudpu_member_add (
>   "Could not set non-blocking operation on token socket: %s\n", error_str);
>   return (-1);
>   }
> +
> + /*
> +  * These sockets are used to send multicast messages, so their buffers
> +  * should be large
> +  */
> + sendbuf_size = MCAST_SOCKET_BUFFER_SIZE;
> + res = setsockopt (new_member->fd, SOL_SOCKET, SO_SNDBUF,
> + &sendbuf_size, optlen);
> + if (res == -1) {
> + strerror_r (errno, error_str, 100);
> + log_printf (instance->totemudpu_log_level_notice,
> + "Could not set sendbuf size %s\n", error_str);
> + }
>   return (0);
>  }
>  
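
A side note for anyone testing this patch on Linux: as far as I know, a
SO_RCVBUF or SO_SNDBUF request larger than the net.core.rmem_max or
net.core.wmem_max sysctl is silently clamped rather than rejected, so the
"Could not set ... size" message would not be logged in that case. The sketch
below only prints those ceilings; the function name show_socket_buffer_limits
is made up, and the /proc paths are Linux-specific.

```shell
#!/bin/sh
# show_socket_buffer_limits: print the kernel ceilings that cap
# SO_RCVBUF / SO_SNDBUF requests (Linux only, hypothetical helper).
show_socket_buffer_limits() {
    for f in /proc/sys/net/core/rmem_max /proc/sys/net/core/wmem_max; do
        if [ -r "$f" ]; then
            printf '%s = %s bytes\n' "$f" "$(cat "$f")"
        else
            printf '%s is not available on this system\n' "$f"
        fi
    done
}

show_socket_buffer_limits
```

If the printed values are smaller than the buffer size corosync asks for,
raising them with sysctl is one way to let the larger buffers take effect.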



[Openais] [PATCH] build: fix makefile to ship corosync.conf.example.udpu

2010-11-19 Thread Fabio M. Di Nitto
Signed-off-by: Fabio M. Di Nitto 
---
 Makefile.am |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index cbc47d0..5999dd0 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -53,7 +53,8 @@ dist_doc_DATA = LICENSE INSTALL README.devmap \
 
 corosysconfdir = ${COROSYSCONFDIR}
 
-corosysconf_DATA   = conf/corosync.conf.example
+corosysconf_DATA   = conf/corosync.conf.example \
+ conf/corosync.conf.example.udpu
 
 if INSTALL_AUGEAS
 corolensdir= ${datadir}/augeas/lenses
-- 
1.7.2.3



[Openais] [PATCH] build: fix spec file and srpm/rpm generation

2010-11-10 Thread Fabio M. Di Nitto
Signed-off-by: Fabio M. Di Nitto 
---
 Makefile.am |   25 ++---
 1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 6d9e009..fc7a4f1 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -114,13 +114,24 @@ $(SPEC): $(SPEC).in
git update-index --refresh > /dev/null 2>&1 || true && \
dirty=`git diff-index --name-only HEAD 2>/dev/null` && \
if [ -n "$$dirty" ]; then dirty="dirty"; else dirty=""; fi && \
-   sed \
-   -e "s#@version@#$$rpmver#g" \
-   -e "s#@alphatag@#$$alphatag#g" \
-   -e "s#@numcomm@#$$numcomm#g" \
-   -e "s#@dirty@#$$dirty#g" \
-   -e "s#@date@#$$date#g" \
-   $< > $@-t
+   if [ "$$numcomm" = "0" ]; then \
+   sed \
+   -e "s#@version@#$$rpmver#g" \
+   -e "s#%glo.*alpha.*##g" \
+   -e "s#%glo.*numcomm.*##g" \
+   -e "s#@dirty@#$$dirty#g" \
+   -e "s#@date@#$$date#g" \
+   $< > $@-t; \
+   else \
+   sed \
+   -e "s#@version@#$$rpmver#g" \
+   -e "s#@alphatag@#$$alphatag#g" \
+   -e "s#@numcomm@#$$numcomm#g" \
+   -e "s#@dirty@#$$dirty#g" \
+   -e "s#@date@#$$date#g" \
+   $< > $@-t; \
+   fi; \
+   if [ -z "$$dirty" ]; then sed -i -e "s#%glo.*dirty.*##g" $@-t; fi
chmod a-w $@-t
mv $@-t $@
 
-- 
1.7.2.3



[Openais] [PATCH] add release script and git based versioning

2010-11-10 Thread Fabio M. Di Nitto
Signed-off-by: Fabio M. Di Nitto 
---
 Makefile.am   |   40 -
 build-aux/git-version-gen |  161 ++
 build-aux/gitlog-to-changelog |  191 +
 build-aux/release.mk  |   75 
 configure.ac  |   10 +--
 corosync.spec.in  |   11 ++-
 exec/main.c   |2 +-
 7 files changed, 474 insertions(+), 16 deletions(-)
 create mode 100755 build-aux/git-version-gen
 create mode 100755 build-aux/gitlog-to-changelog
 create mode 100644 build-aux/release.mk

diff --git a/Makefile.am b/Makefile.am
index c944d8e..6d9e009 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -34,7 +34,12 @@ SPEC = $(PACKAGE_NAME).spec
 TARFILE= $(PACKAGE_NAME)-$(VERSION).tar.gz
 
 EXTRA_DIST = autogen.sh conf/corosync.conf.example $(SPEC).in \
- conf/lenses/tests/test_corosync.aug conf/lenses/corosync.aug
+ build-aux/git-version-gen \
+ build-aux/gitlog-to-changelog \
+ build-aux/release.mk \
+ conf/lenses/tests/test_corosync.aug \
+ conf/lenses/corosync.aug \
+ .version
 
 AUTOMAKE_OPTIONS   = foreign
 
@@ -101,10 +106,19 @@ clean-generic:
 $(SPEC): $(SPEC).in
rm -f $@-t $@
LC_ALL=C date="$(shell date "+%a %b %d %Y")" && \
-   alphatag="$(shell svnversion | sed -e "s#.*:##g" -e "s#[MS]##g")" && \
+   gitver="$(shell git describe --abbrev=4 --match='v*' HEAD 2>/dev/null)" && \
+   rpmver=`echo $$gitver | sed -e "s/^v//" -e "s/-.*//g"` && \
+   alphatag=`echo $$gitver | sed -e "s/.*-//" -e "s/^g//"` && \
+   vtag=`echo $$gitver | sed -e "s/-.*//g"` && \
+   numcomm=`git rev-list $$vtag..HEAD | wc -l` && \
+   git update-index --refresh > /dev/null 2>&1 || true && \
+   dirty=`git diff-index --name-only HEAD 2>/dev/null` && \
+   if [ -n "$$dirty" ]; then dirty="dirty"; else dirty=""; fi && \
sed \
-   -e "s#@alphatag@#r$$alphatag#g" \
-   -e "s#@version@#$(VERSION)#g" \
+   -e "s#@version@#$$rpmver#g" \
+   -e "s#@alphatag@#$$alphatag#g" \
+   -e "s#@numcomm@#$$numcomm#g" \
+   -e "s#@dirty@#$$dirty#g" \
-e "s#@date@#$$date#g" \
$< > $@-t
chmod a-w $@-t
@@ -126,3 +140,21 @@ srpm: clean
 rpm: clean
$(MAKE) $(SPEC) $(TARFILE)
rpmbuild $(RPMBUILDOPTS) -ba $(SPEC)
+
+# release/versioning
+BUILT_SOURCES  = .version
+.version:
+   echo $(VERSION) > $@-t && mv $@-t $@
+
+dist-hook: gen-ChangeLog
+   echo $(VERSION) > $(distdir)/.tarball-version
+
+gen_start_date = 2000-01-01
+.PHONY: gen-ChangeLog
+gen-ChangeLog:
+   if test -d .git; then   \
+   $(top_srcdir)/build-aux/gitlog-to-changelog \
+   --since=$(gen_start_date) > $(distdir)/cl-t;\
+   rm -f $(distdir)/ChangeLog; \
+   mv $(distdir)/cl-t $(distdir)/ChangeLog;\
+   fi
diff --git a/build-aux/git-version-gen b/build-aux/git-version-gen
new file mode 100755
index 000..795a98b
--- /dev/null
+++ b/build-aux/git-version-gen
@@ -0,0 +1,161 @@
+#!/bin/sh
+# Print a version string.
+scriptversion=2010-10-13.20; # UTC
+
+# Copyright (C) 2007-2010 Free Software Foundation, Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This script is derived from GIT-VERSION-GEN from GIT: http://git.or.cz/.
+# It may be run two ways:
+# - from a git repository in which the "git describe" command below
+#   produces useful output (thus requiring at least one signed tag)
+# - from a non-git-repo directory containing a .tarball-version file, which
+#   presumes this script is invoked like "./git-version-gen .tarbal

Re: [Openais] [PATCH] corosync/master: add release script and git based versioning

2010-11-09 Thread Fabio M. Di Nitto
On 11/10/2010 06:28 AM, Fabio M. Di Nitto wrote:
> Hi Steven,
> 
> as we discussed and tested on IRC, this is the framework to do releases
> from git.
> 
> Left to do:
> 
> - tune the build-aux/release.mk publish target for your environment
> (including/optionally gpgsign).
> 
> - add warning in main.c when using test versions.
> 
> the build-aux/git* scripts have been taken from coreutils/gnulib.
> 
> The patch needs to go in every branch you expect to release from.
> 
> Testing shows that cherry pick in flatiron will give 2 conflicts
> (Makefile.am and corosync.spec.in). Please let me know if you want to do
> the rediff yourself or you prefer me to send you another patch.

Forgot to mention, in order to test/commit the patch, apply and then
chmod 755 build-aux/git* since diff/patch do not preserve file
attributes (git does).
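
A minimal sketch of that post-apply step, in case it helps anyone scripting
it. The helper name mark_executable is made up; it does the same as the
chmod above but skips arguments that are not regular files, for example an
unexpanded glob when the patch has not been applied yet.

```shell
#!/bin/sh
# mark_executable FILE...: give each existing regular file mode 755
# (hypothetical helper wrapping the chmod step described above).
mark_executable() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            chmod 755 "$f"
        else
            echo "skipping $f: not a regular file" >&2
        fi
    done
}

# Intended use from the top of the corosync tree, after applying the patch:
#   mark_executable build-aux/git*
```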

Fabio


[Openais] [PATCH] corosync/master: add release script and git based versioning

2010-11-09 Thread Fabio M. Di Nitto
Hi Steven,

as we discussed and tested on IRC, this is the framework to do releases
from git.

Left to do:

- tune the build-aux/release.mk publish target for your environment
(including/optionally gpgsign).

- add warning in main.c when using test versions.

the build-aux/git* scripts have been taken from coreutils/gnulib.

The patch needs to go in every branch you expect to release from.

Testing shows that cherry pick in flatiron will give 2 conflicts
(Makefile.am and corosync.spec.in). Please let me know if you want to do
the rediff yourself or you prefer me to send you another patch.

Fabio
diff --git a/Makefile.am b/Makefile.am
index c944d8e..6d9e009 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -34,7 +34,12 @@ SPEC = $(PACKAGE_NAME).spec
 TARFILE= $(PACKAGE_NAME)-$(VERSION).tar.gz
 
 EXTRA_DIST = autogen.sh conf/corosync.conf.example $(SPEC).in \
- conf/lenses/tests/test_corosync.aug conf/lenses/corosync.aug
+ build-aux/git-version-gen \
+ build-aux/gitlog-to-changelog \
+ build-aux/release.mk \
+ conf/lenses/tests/test_corosync.aug \
+ conf/lenses/corosync.aug \
+ .version
 
 AUTOMAKE_OPTIONS   = foreign
 
@@ -101,10 +106,19 @@ clean-generic:
 $(SPEC): $(SPEC).in
rm -f $@-t $@
LC_ALL=C date="$(shell date "+%a %b %d %Y")" && \
-   alphatag="$(shell svnversion | sed -e "s#.*:##g" -e "s#[MS]##g")" && \
+   gitver="$(shell git describe --abbrev=4 --match='v*' HEAD 2>/dev/null)" && \
+   rpmver=`echo $$gitver | sed -e "s/^v//" -e "s/-.*//g"` && \
+   alphatag=`echo $$gitver | sed -e "s/.*-//" -e "s/^g//"` && \
+   vtag=`echo $$gitver | sed -e "s/-.*//g"` && \
+   numcomm=`git rev-list $$vtag..HEAD | wc -l` && \
+   git update-index --refresh > /dev/null 2>&1 || true && \
+   dirty=`git diff-index --name-only HEAD 2>/dev/null` && \
+   if [ -n "$$dirty" ]; then dirty="dirty"; else dirty=""; fi && \
sed \
-   -e "s#@alphatag@#r$$alphatag#g" \
-   -e "s#@version@#$(VERSION)#g" \
+   -e "s#@version@#$$rpmver#g" \
+   -e "s#@alphatag@#$$alphatag#g" \
+   -e "s#@numcomm@#$$numcomm#g" \
+   -e "s#@dirty@#$$dirty#g" \
-e "s#@date@#$$date#g" \
$< > $@-t
chmod a-w $@-t
@@ -126,3 +140,21 @@ srpm: clean
 rpm: clean
$(MAKE) $(SPEC) $(TARFILE)
rpmbuild $(RPMBUILDOPTS) -ba $(SPEC)
+
+# release/versioning
+BUILT_SOURCES  = .version
+.version:
+   echo $(VERSION) > $@-t && mv $@-t $@
+
+dist-hook: gen-ChangeLog
+   echo $(VERSION) > $(distdir)/.tarball-version
+
+gen_start_date = 2000-01-01
+.PHONY: gen-ChangeLog
+gen-ChangeLog:
+   if test -d .git; then   \
+   $(top_srcdir)/build-aux/gitlog-to-changelog \
+   --since=$(gen_start_date) > $(distdir)/cl-t;\
+   rm -f $(distdir)/ChangeLog; \
+   mv $(distdir)/cl-t $(distdir)/ChangeLog;\
+   fi
diff --git a/build-aux/git-version-gen b/build-aux/git-version-gen
new file mode 100755
index 000..795a98b
--- /dev/null
+++ b/build-aux/git-version-gen
@@ -0,0 +1,161 @@
+#!/bin/sh
+# Print a version string.
+scriptversion=2010-10-13.20; # UTC
+
+# Copyright (C) 2007-2010 Free Software Foundation, Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This script is derived from GIT-VERSION-GEN from GIT: http://git.or.cz/.
+# It may be run two ways:
+# - from a git repository in which the "git describe" command below
+#   produces useful output (thus requiring at least one signed tag)
+# - from a non-git-repo directory containing a .tarball-version file, which
+#   presumes this script is invoked like "./git-version-gen .tarball-version".
+
+# In order to use intra-version strings in your project, you will need two
+# separate generated version string files:
+#
+# .tarball-version - present only in a distribution tarball, and not in
+#   a checked-out repository.  Created with contents that were learned at
+#   the last time autoconf was run, and used by git-version-gen.  Must not
+#   be pr

Re: [Openais] superfluous dependency in corosync spec file

2010-10-13 Thread Fabio M. Di Nitto
On 10/13/2010 1:17 PM, Vadym Chepkov wrote:
> 
> On Oct 13, 2010, at 4:41 AM, Fabio M. Di Nitto wrote:
>>
>> I think the main difference is that some of those libraries are capable
>> of providing services without the daemon running. Something that's not
>> true for all packages. Let's put aside for a minute the build/linking
>> case that used to be a problem many years ago, where maintaining build
>> machine was expensive (tho I understand it, let's be clear).
>>
> 
> You are saying nothing can possibly benefit from using the API provided in the
> -devel package and utilize corosynclibs on its own without having cluster
> software installed?
> Libraries don't provide services at all; they provide an API.
> 
>>
>> I don't remember exactly the detail here (I'd have to test it myself),
>> but let's assume we do those changes in sync, how does "yum install
>> pacemaker" behave? Would pull in both corosync and heartbeat? or it will
>> force the user to select one? Without repeating myself too much, default
>> should install both IMHO and expert users can then drop the core they do
>> not want.
> 
> it will pull nothing and it's a more reasonable approach, admin has to decide
> which stack to use.
> 
> Let's take the bacula backup software as an example.
> It provides the capability to use all kinds of databases as backend server -
> mysql, postgresql, sqlite, so naturally it is linked with all those libraries,
> but they won't force you to install all database servers for that or even
> command line clients, because all they need is a library, which will provide
> an API, not a "service".

There is a difference here.

You can use mysql-libs to connect to a remote instance of mysql server
(same for pgsql, not sure about sqlite). So in theory you can benefit
from the libs API even without a local installation of mysql server.

With corosync, there is no such thing. The corosynclibs need a local corosync.

Fabio


Re: [Openais] superfluous dependency in corosync spec file

2010-10-13 Thread Fabio M. Di Nitto
On 10/13/2010 11:42 AM, Vladislav Bogdanov wrote:
> 13.10.2010 11:41, Fabio M. Di Nitto wrote:
> ...
>>> It is absolutely legal to have both heartbeat and corosync packages
>>> installed even with that "Provides: cluster-engine" virtual dependencies
>>> as long as they do not have "Conflicts:" or "Obsoletes:" on each other
>>> or some filesystem-level conflicts.
>>> So, each of them could be added later (or removed as long as another
>>> remains installed).
>>
>> I don't remember exactly the detail here (I'd have to test it myself),
>> but let's assume we do those changes in sync, how does "yum install
>> pacemaker" behave? Would pull in both corosync and heartbeat? or it will
>> force the user to select one? Without repeating myself too much, default
>> should install both IMHO and expert users can then drop the core they do
>> not want.
> 
> Just tested on dummy packages.
> yum decided to install cluster engine which has a "greater" name
> (although this can be a wrong assumption).
> I can attach rpm specs if you are interested in them (cleng1 and cleng2
> are identical "cluster engines").
> 
> ==
> # yum install crm
> Setting up Install Process
> Resolving Dependencies
> --> Running transaction check
> ---> Package crm.x86_64 0:0.0.1-1.fc13 set to be installed
> --> Processing Dependency: crm-libs = 0.0.1-1.fc13 for package:
> crm-0.0.1-1.fc13.x86_64
> --> Processing Dependency: cluster-engine for package:
> crm-0.0.1-1.fc13.x86_64
> --> Processing Dependency: cleng2-libs for package: crm-0.0.1-1.fc13.x86_64
> --> Processing Dependency: cleng1-libs for package: crm-0.0.1-1.fc13.x86_64
> --> Running transaction check
> ---> Package cleng1-libs.x86_64 0:0.0.1-1.fc13 set to be installed
> ---> Package cleng2.x86_64 0:0.0.1-1.fc13 set to be installed
> ---> Package cleng2-libs.x86_64 0:0.0.1-1.fc13 set to be installed
> ---> Package crm-libs.x86_64 0:0.0.1-1.fc13 set to be installed
> --> Finished Dependency Resolution
> 
> Dependencies Resolved
> ==

Thanks for taking the time to test this.

My only objection to this approach becomes purely "political" as it is
going to create some controversial debates on which engine should have
the "greater" name tho. And truth be told, I really don't think it's worth
the discussion. My position is "let's give both to users and let them
decide what they want", rather than having to listen to different
complaints on which one should be best.. I am sure you know what I mean.

Let's wait anyway for Tom to come back to us for an extra external opinion.

Fabio


Re: [Openais] superfluous dependency in corosync spec file

2010-10-13 Thread Fabio M. Di Nitto
(purging some off topic bits ;))

On 10/13/2010 10:12 AM, Vladislav Bogdanov wrote:
> 
>>> I already wrote that on pacemaker list, "-libs" is generally not a
>>> subpackage, but rather a "superpackage". If libraries are in base
>>> package then dependency of subpackages on base package is correct
>>> (automatic BTW). If corosync package contains libraries, and
>>> corosync-daemons contains daemon, initscript etc., then this is correct
>>> too. But not if libraries are split into a subpackage (IMHO).
>>
>> Well they are considered subpackages by many (if not all) distros,
>> changing that kind of mindset won´t start from here :)
> 
> Yes, formally they are.
> OK, Let's wait for the answer from guidelines maintainer...
> Majority of fedora "-libs" packages do not require base package (I
> checked that), so that simply could be an incompleteness of guidelines.

I think the main difference is that some of those libraries are capable
of providing services without the daemon running. Something that's not
true for all packages. Let's put aside for a minute the build/linking
case that used to be a problem many years ago, where maintaining build
machine was expensive (tho I understand it, let's be clear).

> 
>>>> So let's assume both heartbeat and corosync drop those Requires lines, I
>>>> am willing to bet that in a matter of a week or two, somebody is going to
>>>> report that installing pacemaker doesn't install required dependencies
>>>> on corosync or heartbeat.
>>>
>>> This is usually solved by "virtual" dependencies. If both corosync and
>>> heartbeat have "Provides: cluster-engine" and pacemaker has "Requires:
>>> cluster-engine" then everything will go smooth.
>>
>> It won't solve the problem because pacemaker still links with both -libs
>> and those will pull in both daemons. Unless some more surgery is done by
>> dropping -libs Requires: base-package, both daemons will Provide:
>> cluster-engine and pacemaker will Require: cluster-engine.
> 
> That is exactly what I meant.
> 
>> In order to drop the Requires: base-package, we might have to ask
>> exceptions (at least for corosync since the libraries are not functional
>> without the daemon and it doesn't make a lot of sense).
>>
>> Assuming we even want to go through that expensive (time wise and
>> coordination at least) route, it will leave the users with a thought
>> choice to make right from "yum install" step to decide what core to use
>> and understand package layout from day 0.
> 
> It is absolutely legal to have both heartbeat and corosync packages
> installed even with that "Provides: cluster-engine" virtual dependencies
> as long as they do not have "Conflicts:" or "Obsoletes:" on each other
> or some filesystem-level conflicts.
> So, each of them could be added later (or removed as long as another
> remains installed).

I don't remember exactly the detail here (I'd have to test it myself),
but let's assume we do those changes in sync, how does "yum install
pacemaker" behave? Would pull in both corosync and heartbeat? or it will
force the user to select one? Without repeating myself too much, default
should install both IMHO and expert users can then drop the core they do
not want.

(and yes, let's avoid the whole Conflicts/Obsoletes thingy as we are
trying to play nicely with each other ;))

Fabio
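
For anyone who wants to check how such a virtual dependency would resolve on
their own machine, here is a small sketch using standard rpm queries. Note
that "cluster-engine" is only the capability name proposed in this thread,
not something the packages are known to provide, and the function name
show_engine_deps is made up.

```shell
#!/bin/sh
# show_engine_deps: list installed packages that provide the proposed
# virtual capability, and what pacemaker requires (hypothetical helper).
show_engine_deps() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not found; nothing to query"
        return 0
    fi
    echo "Packages providing cluster-engine:"
    # rpm exits non-zero when nothing provides the capability; ignore that.
    rpm -q --whatprovides cluster-engine || true
    echo "Requires of pacemaker:"
    # Likewise when pacemaker itself is not installed.
    rpm -q --requires pacemaker || true
}

show_engine_deps
```

On a system where neither package carries the Provides line, the first query
simply reports that no package provides it.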


Re: [Openais] superfluous dependency in corosync spec file

2010-10-13 Thread Fabio M. Di Nitto
On 10/13/2010 6:52 AM, Vladislav Bogdanov wrote:
> 13.10.2010 07:14, Fabio M. Di Nitto wrote:
>> On 10/13/2010 12:14 AM, Vadym Chepkov wrote:
>>
>>>> Also you might want to notice that there is no way any of the corosync
>>>> library can be of any use on a system without corosync main package.
>>>
>>> Of course there is. What if you just compile pacemaker, for instance?
>>> Why would you need to install corosync daemon to just link pacemaker binary?
>>>
>>
>> So you are telling me that your build machine doesn't have a few MB of 
>> harddisk to install corosync rpm? If that's the problem you are trying 
>> to solve, I think you have some other issues to solve first, because in 
>> that condition, you will likely be unable to apply any security updates 
>> to any of the other packages.
>>
>> and as I explained in another email in this thread, we do have the exact
>> same issue for users of pacemaker with corosync who find themselves
>> installing heartbeat. The major difference is that we do acknowledge the
>> reason why it is done that way and live with it. Life goes on.
>>
>> Pacemaker is one of the few pieces of software that links against 2
>> competing "cores". It is cool enough to add support for both at the
>> price of a few extra MB of harddisk used on the final system, with the
>> benefit that everything users might want works out of the box.
> 
> Offtopic: This reminds me installation of thunderbird on netbook with
> 8Gb flash. It has (had?) dependency on gnome-vfs library which in turn
> pulled whole gnome to disk.

Offtopic^2: Be careful, I never said that the dependency chain, as
required by Fedora, is perfect. I come from a Debian background where
each lib is in its own package (you get the idea) :) All I am saying is
that the chain, as expressed now, allows users to get a fully functional
system and, in this specific case, the option to select the core
they want without caring too much about package layout.

> 
> I already wrote that on pacemaker list, "-libs" is generally not a
> subpackage, but rather a "superpackage". If libraries are in base
> package then dependency of subpackages on base package is correct
> (automatic BTW). If corosync package contains libraries, and
> corosync-daemons contains daemon, initscript etc., then this is correct
> too. But not if the libraries are split into a subpackage (IMHO).

Well, they are considered subpackages by many (if not all) distros;
changing that kind of mindset won't start from here :)

> 
>>
>> So let's assume both heartbeat and corosync drop those Requires lines. I
>> am willing to bet that in a matter of a week or two, somebody is going to
>> report that installing pacemaker doesn't install the required dependencies
>> on corosync or heartbeat.
> 
> This is usually solved by "virtual" dependencies. If both corosync and
> heartbeat have "Provides: cluster-engine" and pacemaker has "Requires:
> cluster-engine" then everything will go smooth.

It won't solve the problem, because pacemaker still links with both -libs
and those will pull in both daemons, unless some more surgery is done by
dropping the -libs Requires: base-package. Both daemons would then
Provide: cluster-engine and pacemaker would Require: cluster-engine.

In order to drop the Requires: base-package, we might have to ask for
exceptions (at least for corosync, since the libraries are not functional
without the daemon and it doesn't make a lot of sense).

Assuming we even want to go through that expensive (time wise and
coordination at least) route, it will leave the users with a tough
choice to make right from the "yum install" step: decide what core to use
and understand the package layout from day 0.
IMHO, given that clustering is already not very user-friendly, I
don't really feel the urgent need to impose that on users immediately,
especially when the cost of allowing the change in the future is only a
few MB of harddisk space and one package installed (be that corosync
or heartbeat, I am looking at it in both directions).

Fabio


Re: [Openais] superfluous dependency in corosync spec file

2010-10-12 Thread Fabio M. Di NItto
On 10/13/2010 12:14 AM, Vadym Chepkov wrote:

>> Also you might want to notice that there is no way any of the corosync
>> library can be of any use on a system without corosync main package.
>
> Of course there is. What if you just compile pacemaker, for instance?
> Why would you need to install corosync daemon to just link pacemaker binary?
>

So you are telling me that your build machine doesn't have a few MB of 
harddisk to install corosync rpm? If that's the problem you are trying 
to solve, I think you have some other issues to solve first, because in 
that condition, you will likely be unable to apply any security updates 
to any of the other packages.

and as I explained in another email in this thread, we do have the exact
same issue for users of pacemaker with corosync who find themselves
installing heartbeat. The major difference is that we do acknowledge the
reason why it is done that way and live with it. Life goes on.

Pacemaker is one of the few pieces of software that links against 2
competing "cores". It is cool enough to add support for both at the
price of a few extra MB of harddisk used on the final system, with the
benefit that everything users might want works out of the box.

So let's assume both heartbeat and corosync drop those Requires lines. I
am willing to bet that in a matter of a week or two, somebody is going to
report that installing pacemaker doesn't install the required dependencies
on corosync or heartbeat.

Fabio


Re: [Openais] superfluous dependency in corosync spec file

2010-10-12 Thread Fabio M. Di NItto
On 10/12/2010 08:02 PM, Vladislav Bogdanov wrote:
> 12.10.2010 20:43, Fabio M. Di NItto wrote:
> ...
>> Also you might want to notice that there is no way any of the corosync
>> library can be of any use on a system without corosync main package.
>
> Then what is the reason to have both corosync and corosynclib packages
> rather than one monolithic corosync package? There is no way to install
> one without the other anyway...

Funny you ask that.. in the beginning I did the packaging without the
library requiring the base package.. with more or less the same
expectations you have now. I had to change it to comply with the Fedora
Guidelines.

But to be honest, reading the thread again, I am still not sure what
problem you are trying to solve here.

Looking at the first email, you mention installing pacemaker with
heartbeat, without corosync. The same is true the other way around, but
I still don't see a problem.. you are installing a few MB extra of
dependencies, it won't kill anybody given how many hundreds of MB of
dependencies that you will probably never use are pulled in on a system
anyhow.

Fabio


Re: [Openais] superfluous dependency in corosync spec file

2010-10-12 Thread Fabio M. Di NItto
On 10/12/2010 07:09 PM, Vadym Chepkov wrote:
> I disagree, users don't usually install libs directly, unless they
> intend to, they would install corosync package if they need it.

This is pretty much Debian you are talking about.

> you can check yourself with majority of the packages: xen-libs don't
> require xen, pacemaker-libs doesn't require pacemaker, net-snmp-libs
> doesn't require net-snmp, just to name a few.

What distribution are you looking at? In Fedora, where the spec file was
first done as a template for others to use and modify as needed, it's
pretty much mandatory to have the subpackage Require the main package.

http://fedoraproject.org/wiki/Packaging/Guidelines#RequiringBasePackage

http://fedoraproject.org/wiki/Packaging/ReviewGuidelines#Things_To_Check_On_Review

"SHOULD: Usually, subpackages other than devel should require the base 
package using a fully versioned dependency. [21]"

If all the packages you mention above do not Require the main package, 
either they have an exception from the Fedora Board, or they are not 
strictly following the Fedora packaging guidelines.

Anyway, I agree with Steven, and this was mentioned several times before:
the spec file upstream provides is only a template that _must_ be
adjusted to the distribution you are using. Instead of changing the
template to fit only one, we welcome patches to either provide alternate
spec files or, even better, add the correct %if / %endif instances to
special-case based on distro.

Also you might want to notice that there is no way any of the corosync
libraries can be of any use on a system without the corosync main package.

Fabio


Re: [Openais] openais trunk - change shutdown priority to 80

2010-09-04 Thread Fabio M. Di NItto
On 09/04/2010 07:23 PM, Steven Dake wrote:
> On 09/03/2010 09:33 PM, Fabio M. Di NItto wrote:
>> On 09/03/2010 09:13 PM, Ryan O'Hara wrote:
>>>
>>> Same as Steve's patch to corosync init script.
>>>
>>
>> Can you also consider adding "Provides: corosync" as suggested in the
>> thread?
>>
>
> wouldn't openais need a requires?

Not in the init script, no.

The init script header can declare "Provides: corosync"; that will
allow any other package to depend only on corosync, and it would make no
difference for the init order if the user decides to start openais instead.

Fabio
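
For illustration, a minimal sketch of what such an LSB header could look
like and how the declared facilities can be read back. Everything in the
header below (dependencies, runlevels, description) is made up for this
example and is not the actual openais init script; only the idea of
listing corosync on the Provides line comes from the thread.

```shell
#!/bin/sh
# Hypothetical LSB header for an openais init script. The Provides line
# lists both names, so the script satisfies a dependency on "corosync".
lsb_header() {
    cat <<'EOF'
### BEGIN INIT INFO
# Provides: openais corosync
# Required-Start: $network $syslog
# Required-Stop: $network $syslog
# Default-Start:
# Default-Stop:
# Short-Description: example header only
### END INIT INFO
EOF
}

# Print the facilities named on the Provides line of a header read on stdin.
provided_facilities() {
    sed -n 's/^# Provides:[[:space:]]*//p'
}

lsb_header | provided_facilities
```

Assuming the init system honours LSB headers, a script that lists
corosync in its own Required-Start line should then be ordered after
whichever of the two services is actually enabled.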


Re: [Openais] openais trunk - change shutdown priority to 80

2010-09-03 Thread Fabio M. Di NItto
On 09/03/2010 09:13 PM, Ryan O'Hara wrote:
>
> Same as Steve's patch to corosync init script.
>

Can you also consider adding "Provides: corosync" as suggested in the 
thread?

Thanks
Fabio


Re: [Openais] init script runlevel at 20/20 instead of 20/80 debate

2010-09-03 Thread Fabio M. Di Nitto
On 9/3/2010 10:26 AM, Vladislav Bogdanov wrote:
> 03.09.2010 11:16, Fabio M. Di Nitto wrote:
>> On 9/3/2010 10:00 AM, Keisuke MORI wrote:
>>> 2010/9/3 Fabio M. Di Nitto :
>>>> so the current init script has:
>>>>
>>>>> # chkconfig: - 20 20
>>>>
>>>> and that is definitely wrong. It must have slipped through the cracks when
>>>> we re-did the init script a while ago. Kudos to Vladislav for noticing it.
>>>>
>>>> (making a bunch of assumptions here) the general rule is:
>>>>
>>>> stop-priority-value = 100 - start-priority-value.
>>>>
>>>> to guarantee the service start/stop symmetry that is pretty much the
>>>> case for corosync.
>>>>
>>>> So a value of 20/80 would be correct.
>>>>
>>>> Now this should address the first concern reported.
>>>
>>>
>>> As for the starting order, I would be glad if you could also consider
>>> the attached patch.
>>>
>>> It will adjust the dependency on syslog correctly so that it can
>>> prevent a problem when you use rsyslog, as I reported before:
>>> https://lists.linux-foundation.org/pipermail/openais/2010-July/014946.html
>>
>> I think this is a sane patch and should go in.
> 
> BTW shouldn't openais initscript also include LSB stanza
> # Provides: corosync

Yes, ACK.

Fabio


Re: [Openais] init script runlevel at 20/20 instead of 20/80 debate

2010-09-03 Thread Fabio M. Di Nitto
On 9/3/2010 10:00 AM, Keisuke MORI wrote:
> 2010/9/3 Fabio M. Di Nitto :
>> so the current init script has:
>>
>>> # chkconfig: - 20 20
>>
>> and that is definitely wrong. It must have slipped through the cracks when
>> we re-did the init script a while ago. Kudos to Vladislav for noticing it.
>>
>> (making a bunch of assumptions here) the general rule is:
>>
>> stop-priority-value = 100 - start-priority-value.
>>
>> to guarantee the service start/stop symmetry that is pretty much the
>> case for corosync.
>>
>> So a value of 20/80 would be correct.
>>
>> Now this should address the first concern reported.
> 
> 
> As for the starting order, I would be glad if you could also consider
> the attached patch.
> 
> It will adjust the dependency on syslog correctly so that it can
> prevent a problem when you use rsyslog, as I reported before:
> https://lists.linux-foundation.org/pipermail/openais/2010-July/014946.html

I think this is a sane patch and should go in.

The only thing that puzzles me about this requirement is that many
daemons do use syslog before the syslog daemon is available and don't fail.

All C calls to syslog return void, so there is no way to know whether they
succeeded (as a sign that the daemon is running), and glibc should
make that transparent. So I wonder whether the issue is really that corosync
needs to start after syslog, or whether we are masking another bug somewhere
else (most likely glibc).

Fabio


Re: [Openais] init script runlevel at 20/20 instead of 20/80 debate

2010-09-02 Thread Fabio M. Di Nitto
On 9/2/2010 9:09 PM, Steven Dake wrote:
> On 08/31/2010 08:45 PM, Vladislav Bogdanov wrote:
>> Hi,
>> 31.08.2010 22:22, Steven Dake wrote:
>>> I am pleased to announce Corosync 1.2.8 is available for immediate
>>> download from our website.
>>
>> Initscript doesn't seem to be fixed yet.
>> http://marc.info/?l=openais&m=128271460429681&w=2
>> http://www.mail-archive.com/pacema...@oss.clusterlabs.org/msg05833.html
>>
>> Best,
>> Vladislav
> 
> Fabio,
> 
> Any chance you can provide feedback on this topic?  I'm at a loss in
> this area of distro integration.

so the current init script has:

> # chkconfig: - 20 20

and that is definitely wrong. It must have slipped through the cracks when
we re-did the init script a while ago. Kudos to Vladislav for noticing it.

(making a bunch of assumptions here) the general rule is:

stop-priority-value = 100 - start-priority-value.

to guarantee the service start/stop symmetry that is pretty much the
case for corosync.

So a value of 20/80 would be correct.

Now this should address the first concern reported.

The other issue, start-early/late, is very debatable and depends on
use cases and on the distribution's init process implementation (and
the way it calculates init dependencies).

My general approach is that the generic/default init script should
satisfy the majority of the cases out there (it's called generic for a
good reason) and the current values do (modulo the error in the stop
sequence that needs to be addressed).

Every sysadmin has the power to tailor those values with system command
tools, without the need to rebuild the rpm or edit the init script
itself (chkconfig or whatever they use nowadays).

The idea of having an early/late script is something I really don't
like. It involves a chain of changes to be done properly that adds
unnecessary complexity, especially when there are already tools to move
script priorities around that are tailored to the OS/distribution behavior.

Fabio
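
A minimal sketch of the symmetry rule above (stop priority = 100 - start
priority), written as a small shell helper. The function names are
invented for this example, and the rule itself is the rule of thumb from
the message, not something chkconfig enforces.

```shell
#!/bin/sh
# Hypothetical helper: derive the stop priority from the start priority
# using the rule of thumb quoted above (stop = 100 - start).
stop_priority() {
    start=$1
    echo $((100 - start))
}

# Build the chkconfig comment line for an init script header.
# The "-" field means the service is not enabled in any runlevel by default.
chkconfig_header() {
    start=$1
    printf '# chkconfig: - %s %s\n' "$start" "$(stop_priority "$start")"
}

chkconfig_header 20
```

With a start priority of 20 this prints the 20/80 pair discussed in the
thread, so a service that starts early is stopped late.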


Re: [Openais] init script runlevel at 20/20 instead of 20/80 debate

2010-09-02 Thread Fabio M. Di Nitto
On 09/02/2010 09:09 PM, Steven Dake wrote:
> On 08/31/2010 08:45 PM, Vladislav Bogdanov wrote:
>> Hi,
>> 31.08.2010 22:22, Steven Dake wrote:
>>> I am pleased to announce Corosync 1.2.8 is available for immediate
>>> download from our website.
>>
>> Initscript doesn't seem to be fixed yet.
>> http://marc.info/?l=openais&m=128271460429681&w=2
>> http://www.mail-archive.com/pacema...@oss.clusterlabs.org/msg05833.html
>>
>> Best,
>> Vladislav
>
> Fabio,
>
> Any chance you can provide feedback on this topic? I'm at a loss in this
> area of distro integration.

Sure, but it will have to wait until tomorrow morning once I get in the office.

Fabio


Re: [Openais] [PATCH] Respect user LDFLAGS, don't enforce any additional CFLAGS by default

2010-08-25 Thread Fabio M. Di Nitto
On 8/25/2010 11:16 AM, Kacper Kowalik wrote:
> W dniu 25.08.2010 10:37, Fabio M. Di Nitto pisze:
>> On 8/25/2010 10:26 AM, Kacper Kowalik wrote:
>>> Hi,
>>> I would like to ask for inclusion regarding two issues:
>>>  1) respect LDFLAGS env during linking
> 
>> this is a good one, did you check if all Makefile.am files do respect
>> LDFLAGS already? If not, the change should probably be more global.
> 
> Yup, that's all I found. You can verify this by:
>  1. export LDFLAGS=-Wl,--hash-style=gnu
>  2. build
>  3. scanelf -qyRF '%k %p' -k .hash ${BUILDIR} | sed -e "s:\.hash ::"

Ok, I take your word for it :)

> 
>>>  2) don't enforce any optimization/debug flags by default
> 
>> hmm I disagree here.
> 
>> -O3 is necessary to obtain performance out of the sober crypto code. I
>> recall a major impact in performance with anything lower than that.
> Remember that setting no flags is equivalent to -O2, but it doesn't matter
> here. If you have a set of flags that you consider "best" for your package,
> please use them. Make them even default. I only ask for a possibility to
> switch them off, e.g. by --disable-optimization. This way it won't
> change the current behaviour of openais at all and will suit my needs.

Yes, I understand the issue, then let's make your patch "backwards" to
allow --disable-optimizations rather than changing the defaults. I
gather that would work for you and I don't have a problem with that.

Fabio
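
A rough sketch of the "backwards" behaviour agreed on above: the project
defaults stay in place unless the builder opts out. This is plain shell
that only mimics the decision; it is not the actual configure.ac change,
and the function name and the exact default flag string are assumptions
made for the example (-O3 and -g are the flags mentioned in the thread).

```shell
#!/bin/sh
# Hypothetical sketch: print the project's default CFLAGS unless the
# caller passed --disable-optimizations on the command line.
select_default_cflags() {
    enable_optimizations=yes
    for arg in "$@"; do
        case "$arg" in
            --disable-optimizations) enable_optimizations=no ;;
            --enable-optimizations)  enable_optimizations=yes ;;
        esac
    done
    if [ "$enable_optimizations" = yes ]; then
        flags="-O3 -g"
    else
        flags=""    # leave optimization entirely to the user's own CFLAGS
    fi
    printf '%s\n' "$flags"
}

select_default_cflags --enable-nss
select_default_cflags --enable-nss --disable-optimizations
```

The first call keeps the defaults, the second prints an empty line, so
existing builds behave as before and only builders who ask for it lose
the enforced flags.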

Re: [Openais] [PATCH] Respect user LDFLAGS, don't enforce any additional CFLAGS by default

2010-08-25 Thread Fabio M. Di Nitto
On 8/25/2010 10:26 AM, Kacper Kowalik wrote:
> Hi,
> I would like to ask for inclusion regarding two issues:
>  1) respect LDFLAGS env during linking

this is a good one, did you check if all Makefile.am files do respect
LDFLAGS already? If not, the change should probably be more global.

>  2) don't enforce any optimization/debug flags by default

hmm I disagree here.

-O3 is necessary to obtain performance out of the sober crypto code. I
recall a major impact on performance with anything lower than that.
Steven, do you still have the numbers?

-g is actually the default in many projects, since you can easily strip
the debug info with strip(1), and that's what most package management
build tools do.

> Both are default policies in most OSes.

I don't want to end up in a distro/OS war, but you would be surprised that
many do exactly the opposite by embedding different build/linker flags
either as defaults in the compilers or by enforcing some C/LDFLAGS in their
default build environment.

Fabio

Re: [Openais] Allow corosync run only once

2010-07-29 Thread Fabio M. Di Nitto
Looks ok to me (untested).

Fabio

On 7/28/2010 4:08 PM, Jan Friesse wrote:
> Attached is better implementation of SUBJ patch. It also modifies init
> script file to NOT create pid file.
> 
> Second patch is for better integration with cman. If corosync was
> started by cman, corosync refuses to stop.
> 
> Regards,
>   Honza
> 
> 
> Fabio M. Di Nitto wrote:
>> On 7/22/2010 7:56 PM, Steven Dake wrote:
>>> On 07/22/2010 10:25 AM, Steven Dake wrote:
>>>> On 07/22/2010 08:49 AM, Jan Friesse wrote:
>>>>> Patch uses flock to ensure that only one instance of corosync is running.
>>>>>
>>>>> Regards,
>>>>> Honza
>>>>>
>>>>>
>>>>>
>>>>> ___
>>>>> Openais mailing list
>>>>> Openais@lists.linux-foundation.org
>>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>> great work! good for merge
>>>>
>>> Honza,
>>>
>>> After talking with Fabio, he mentioned that the proper place for this 
>>> file is /var/run/corosync.pid, and the contents of this file should be 
>>> the active process ID of the corosync child.
>>>
>>> Sorry for not getting more details earlier.
>>
>> My bad, it did pass under my radar.
>>
>> Honza, the implementation you did is OK. I also suggest looking into the
>> dm_create_lockfile(const char*) implementation, which has recently been
>> written to be "strong".
>>
>> Fabio
> 



Re: [Openais] Allow corosync run only once

2010-07-22 Thread Fabio M. Di Nitto
On 7/22/2010 7:56 PM, Steven Dake wrote:
> On 07/22/2010 10:25 AM, Steven Dake wrote:
>> On 07/22/2010 08:49 AM, Jan Friesse wrote:
>>> Patch uses flock to ensure that only one instance of corosync is running.
>>>
>>> Regards,
>>> Honza
>>>
>>>
>>>
>>> ___
>>> Openais mailing list
>>> Openais@lists.linux-foundation.org
>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>> great work! good for merge
>>
> 
> Honza,
> 
> After talking with Fabio, he mentioned that the proper place for this 
> file is /var/run/corosync.pid, and the contents of this file should be 
> the active process ID of the corosync child.
> 
> Sorry for not getting more details earlier.

My bad, it did pass under my radar.

Honza, the implementation you did is OK. I also suggest looking into the
dm_create_lockfile(const char*) implementation, which has recently been
written to be "strong".

Fabio
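
To illustrate the single-instance idea discussed above, here is a small
sketch using the flock(1) command-line tool in place of the flock(2)
call in the C patch. It assumes flock(1) from util-linux is installed;
the lock path and function name are invented for the example and have
nothing to do with /var/run/corosync.pid or the actual patch.

```shell
#!/bin/sh
# Assumption: flock(1) is available. The lock file path is made up.
LOCKFILE="${TMPDIR:-/tmp}/example-daemon.$$.lock"

# Try to take an exclusive, non-blocking lock through a fresh descriptor.
# Prints "acquired" or "busy"; the lock is dropped when the subshell exits.
try_lock() {
    (
        exec 8>"$LOCKFILE"
        if flock -n 8; then echo acquired; else echo busy; fi
    )
}

first=$(try_lock)       # nobody holds the lock yet

exec 9>"$LOCKFILE"      # simulate the already running instance
flock -n 9
second=$(try_lock)      # a second instance must be refused

exec 9>&-               # the running instance exits and the lock goes away
third=$(try_lock)

echo "$first $second $third"
rm -f "$LOCKFILE"
```

The kernel releases the lock when the holding descriptor is closed, so a
crashed instance does not leave a stale lock behind, which is the main
attraction over checking for the mere existence of a pid file.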


Re: [Openais] [PATCH] Fix logging_daemon config parsing and behaviour

2010-07-18 Thread Fabio M. Di Nitto
Committed revision 2997.


On 7/17/2010 2:34 AM, Steven Dake wrote:
> On 07/16/2010 12:15 AM, Fabio M. Di Nitto wrote:
>> Hi Steven,
>>
>> patch in attachment addresses bz 615203
>>
>> Needs to be applied to trunk _and_ flatiron.
>>
>> Thanks
>> Fabio
>>
>>
>>
>> ___
>> Openais mailing list
>> Openais@lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/openais
> good for merge into trunk
> 
> thanks
> -steve
> ___
> Openais mailing list
> Openais@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais



[Openais] [PATCH] Fix logging_daemon config parsing and behaviour

2010-07-16 Thread Fabio M. Di Nitto
Hi Steven,

patch in attachment addresses bz 615203

Needs to be applied to trunk _and_ flatiron.

Thanks
Fabio
Index: exec/mainconfig.c
===================================================================
--- exec/mainconfig.c   (revision 2989)
+++ exec/mainconfig.c   (working copy)
@@ -528,22 +528,26 @@
 object_logger_subsys_handle,
 "name", &value)) {
 
-   if ((strcmp(value, "corosync") == 0) &&
-  (!objdb_get_string (objdb,
-   object_logger_subsys_handle,
-   "subsys", &value))) {
-
-   if (corosync_main_config_set (objdb,
-   object_logger_subsys_handle,
-   value,
-   &error_reason) < 0) {
-   goto parse_error;
+   if (strcmp(value, "corosync") == 0) {
+   if (!objdb_get_string (objdb,
+   object_logger_subsys_handle,
+   "subsys", &value)) {
+   if (corosync_main_config_set (objdb,
+   object_logger_subsys_handle,
+   value,
+   &error_reason) < 0) {
+   goto parse_error;
+   }
 }
+   else {
+   if (corosync_main_config_set (objdb,
+   object_logger_subsys_handle,
+   NULL,
+   &error_reason) < 0) {
+   goto parse_error;
+   }
+   }
 }
-   else {
-   error_reason = "subsys required for logging_daemon directive";
-   goto parse_error;
-   }
 }
 else {
 error_reason = "name required for logging_daemon directive";

Re: [Openais] corosync trunk - have corosync own /var/log/cluster

2010-06-29 Thread Fabio M. Di Nitto
Steven,

you need to get /var from %{localstatedir} in the build system.

The spec file needs to be generated based on that (we already do so for
other dirs IIRC, so it should be very simple to pass localstatedir too).

Any other solution will break custom install paths.

Fabio
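
As a sketch of the point above: if the spec file is generated from a
template, the log directory can follow whatever localstatedir the build
was configured with instead of hard-coding /var. The template line and
the helper name below are hypothetical; only the @localstatedir@
placeholder style is the usual autoconf convention, and the real
corosync.spec.in may be generated differently.

```shell
#!/bin/sh
# Hypothetical template line, as it might appear in a spec.in file.
template='%dir @localstatedir@/log/cluster'

# Expand the placeholder the way a configure-style substitution would.
# Assumes the path contains no "|" character (used as the sed delimiter).
render_spec_line() {
    localstatedir=$1
    printf '%s\n' "$template" | sed "s|@localstatedir@|$localstatedir|g"
}

render_spec_line /var
render_spec_line /opt/cluster/var
```

The second call shows why a hard-coded /var/log/cluster would be wrong
for a build configured with a custom prefix.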

On 06/29/2010 01:22 AM, Steven Dake wrote:
> have corosync own the directory /var/log/cluster
>
> have example file use /var/log/cluster/cluster.log
>
> Regards
> -steve
>
>
>
> ___
> Openais mailing list
> Openais@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais



Re: [Openais] [PATCH] V3 Fix debug function in logsys.c (Was Re: Remove unused functions from logsys.c)

2010-06-14 Thread Fabio M. Di Nitto
On 06/12/2010 09:29 PM, Steven Dake wrote:
> Andreas,
>
> Thanks for all the work on this, but your intuition of removing the code
> completely (your first patch) is correct. This is why the patch was
> applied without argument. I don't want ifdef's in the code around debug
> output. Libraries in general, such as logsys, should never print any
> type of output (unless that is their purpose in the case of stderr mode
> of operation). If a field deployment can't turn optional code with a
> runtime (vs build time) configuration option, then it doesn't belong in
> the codebase.
>
> In some cases we make exceptions but they are rare and this type of
> debug output doesn't warrant one of those exceptions.

Then make sure to keep a copy of the patch around for your own sake. As 
I already explained to you, the patch debugs a very specific problem in 
logsys init code that is dependent on gcc behaviour (compiler and 
linker). You really don't want to waste heaps of time re-writing it again.

Fabio


Re: [Openais] corosync.spec

2010-06-09 Thread Fabio M. Di Nitto
On 6/10/2010 6:18 AM, Vadym Chepkov wrote:
> 
> On Jun 9, 2010, at 11:56 PM, Fabio M. Di Nitto wrote:
> 
>> On 6/10/2010 1:50 AM, Vadym Chepkov wrote:
>>> Hi,
>>>
>>> There are several issues with corosync spec file.
>>>
>>> - configure script should be called in %build, not in %prep section.
>>> - the macro used for init.d is wrong
>>> - chkconfig --add should be called only when rpm is installed, not during
>>> upgrade, because it will overwrite the custom set priorities
>>>
>>> I attached the patch:
>>
>> Not acknowledged.
>>
>> The %prep vs %build split is not a requirement in any rpm based
>> distribution. Configuring the tree is preparation of the tree and not
>> build (but we can argue about this forever as it comes down to how you
>> look at the whole build process).
> 
> If one wanted to see what source code is getting compiled, he would
> execute:
> 
> rpmbuild -bp --nodeps corosync.spec
> 
> If you put configure in the prep section, it goes through the commotion of
> looking for build tools and fails when none of the build tools are
> installed, and all that for no reason.
> 
> http://www.rpm.org/max-rpm/s1-rpm-inside-scripts.html
> suggests running the configuration script in the build section; all Red Hat
> rpms I saw behave this way.
> I still remember what R in rpm stands for.

I guess we will need to agree to disagree.

The %prep Script
[snip of the obvious irrelevant bits]

# Perform any other actions required to get the sources in a
ready-to-build state.

[snip]

The %build Script

The %build script picks up where the %prep script left off. Once the
%prep script has gotten everything ready for the build, the %build
script is usually somewhat anti-climactic — normally invoking make,
maybe a configuration script, and little else.
^^
[snip]

Either way, let's not waste time or energy here. As long as it works, I
care little.

>>
>> As we discussed via email, caging the call to chkconfig only solves part
>> of the problem you reported and not all of it. The correct solution is
>> to ship a specific init script for rhel5.
> 
> It still preserves custom priorities set by administrator.
> If he wants to revert to default he would run
> 
> chkconfig corosync resetpriorities

it doesn't protect you from:

chkconfig corosync off
chkconfig corosync on

I am not saying your patch is completely wrong. I am saying I'd like to
see a proper and complete fix on top of the protection, including correct
rhel5 values, since you reported the current ones not to be correct.

Fabio


Re: [Openais] corosync.spec

2010-06-09 Thread Fabio M. Di Nitto
On 6/10/2010 6:04 AM, Vadym Chepkov wrote:
> 
> On Jun 9, 2010, at 11:58 PM, Fabio M. Di Nitto wrote:
> 
>> On 6/10/2010 3:40 AM, Vadym Chepkov wrote:
>>> Didn't look right, more like this:
>>>
>>> %{!?_initddir: %{expand: %%define _initddir %{_sysconfdir}/rc.d/init.d}}
>>
>> But why make it so complicated when the current macro is portable to
>> all rpm based distributions?
>>
>> Fabio
> 
> Run it on RHEL5
> 
> # rpm --showrc|grep init
> -14: _initrddir   %{_sysconfdir}/rc.d/init.d
> 
> Vadym

It appears that the fix for
https://bugzilla.redhat.com/show_bug.cgi?id=575165
was never merged into flatiron (1.2.z)

That's what was confusing me. Sorry, I got it backwards.

Fabio


Re: [Openais] corosync.spec

2010-06-09 Thread Fabio M. Di Nitto
On 6/10/2010 3:40 AM, Vadym Chepkov wrote:
> Didn't look right, more like this:
> 
> %{!?_initddir: %{expand: %%define _initddir %{_sysconfdir}/rc.d/init.d}}

But why make it so complicated when the current macro is portable to
all rpm based distributions?

Fabio


Re: [Openais] corosync.spec

2010-06-09 Thread Fabio M. Di Nitto
On 6/10/2010 1:50 AM, Vadym Chepkov wrote:
> Hi,
> 
> There are several issues with corosync spec file.
> 
> - configure script should be called in %build, not in %prep section.
> - the macro used for init.d is wrong
> - chkconfig --add should be called only when rpm is installed, not during
> upgrade, because it will overwrite the custom set priorities
> 
> I attached the patch:

Not acknowledged.

The %prep vs %build split is not a requirement in any rpm based
distribution. Configuring the tree is preparation of the tree and not
build (but we can argue about this forever as it comes down to how you
look at the whole build process).

The init macro you suggest is not portable to rhel5.

As we discussed via email, caging the call to chkconfig only solves part
of the problem you reported and not all of it. The correct solution is
to ship a specific init script for rhel5.

Fabio

> 
> 
> --- corosync.spec.in  (revision 2942)
> +++ corosync.spec.in  (working copy)
> @@ -32,6 +32,7 @@
>  %prep
>  %setup -q -n %{name}-%{version}
>  
> +%build
>  %if %{buildtrunk}
>  ./autogen.sh
>  %endif
> @@ -43,9 +44,8 @@
>  %{configure} \
>   --enable-nss \
>   --enable-rdma \
> - --with-initddir=%{_initddir}
> + --with-initddir=%{_initrddir}
>  
> -%build
>  make %{_smp_mflags}
>  
>  %install
> @@ -67,7 +67,9 @@
>  APIs and libraries, default configuration files, and an init script.
>  
>  %post
> -/sbin/chkconfig --add corosync || :
> +if [ $1 -eq 1 ]; then
> + /sbin/chkconfig --add corosync || :
> +fi
>  
>  %preun
>  if [ $1 -eq 0 ]; then
> @@ -90,7 +92,7 @@
>  %dir %{_sysconfdir}/corosync/service.d
>  %dir %{_sysconfdir}/corosync/uidgid.d
>  %config(noreplace) %{_sysconfdir}/corosync/corosync.conf.example
> -%{_initddir}/corosync
> +%{_initrddir}/corosync
>  %dir %{_libexecdir}/lcrso
>  %{_libexecdir}/lcrso/coroparse.lcrso
>  %{_libexecdir}/lcrso/objdb.lcrso
> 
> 
> 
> ___
> Openais mailing list
> Openais@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync starts too early

2010-06-07 Thread Fabio M. Di Nitto
On 6/7/2010 4:24 PM, Vadym Chepkov wrote:
> On Mon, Jun 7, 2010 at 9:03 AM, Fabio M. Di Nitto  wrote:
>> On 6/7/2010 2:56 PM, Vadym Chepkov wrote:
>>
>>> It would be nice to prevent this from happening though:
>>>
>>> ls /etc/rc3.d/*corosync*
>>> /etc/rc3.d/S99corosync
>>>
>>> yum -y update
>>> Updating:
>>>  corosync      x86_64    1.2.2-1.1.el5    clusterlabs    148 k
>>>  corosynclib   x86_64    1.2.2-1.1.el5    clusterlabs    170 k
>>>
>>> # ls /etc/rc3.d/*corosync*
>>> /etc/rc3.d/S20corosync
>>>
>>> Is it corosync's rpm fault? Why priorities getting reset?
>>
>> The init script gets replaced with the new one (with original values) on
>> updates and chkconfig is executed. That recalculates the whole position
>> of the init script with the new values.
>>
>> I think that's just the way rpm works in general as the init script is
>> not considered a user modifiable config file.
>>
>> Fabio
>>
> 
> I could be wrong, but wouldn't this cure it ?
> 
> %post
> if [ "$1" = 1 ]
> then
>   /sbin/chkconfig --add corosync
> fi

IIRC it would isolate the call to chkconfig, but the custom values you
set in the init script would still be overwritten by the update.
So it would act as a temporary workaround, but won't solve your specific
problem completely.

Fabio

Re: [Openais] corosync starts too early

2010-06-07 Thread Fabio M. Di Nitto
On 6/7/2010 2:56 PM, Vadym Chepkov wrote:

> It would be nice to prevent this from happening though:
> 
> ls /etc/rc3.d/*corosync*
> /etc/rc3.d/S99corosync
> 
> yum -y update
> Updating:
>  corosync      x86_64    1.2.2-1.1.el5    clusterlabs    148 k
>  corosynclib   x86_64    1.2.2-1.1.el5    clusterlabs    170 k
> 
> # ls /etc/rc3.d/*corosync*
> /etc/rc3.d/S20corosync
> 
> Is it corosync's rpm fault? Why priorities getting reset?

The init script gets replaced with the new one (with original values) on
updates and chkconfig is executed. That recalculates the whole position
of the init script with the new values.

I think that's just the way rpm works in general as the init script is
not considered a user modifiable config file.

Fabio


Re: [Openais] corosync starts too early

2010-06-06 Thread Fabio M. Di Nitto
On 6/7/2010 6:47 AM, Vadym Chepkov wrote:
> 
> On Jun 7, 2010, at 12:34 AM, Fabio M. Di Nitto wrote:
>>
>> There is also a major catch-22 in changing them.
>>
>> In certain environment it is necessary to start corosync right after the
>> network because everything else in the boot process could require for
>> example a cluster filesystem available (and corosync running as backend).
> 
> I realize that, that's why I brought it up.
> But it would be certainly strange to see somebody starting resources relying 
> on cluster's filesystem 
> outside of cluster configuration.

Not really, no. It's actually common to mount gfs2 via fstab and start
web servers outside a resource-manager controlled environment.

> And there is no sensible cluster configuration out there 
> that will start file system right away,
> some synchronization/DC detection has to occur and
> by that time rc script has already finished.

Hmm sorry, I really don't understand this statement. Cman, for example,
starts right after corosync (or starts corosync, to be more precise), and
the cluster is fully available by the end of the cman init script.

Fabio


Re: [Openais] corosync starts too early

2010-06-06 Thread Fabio M. Di Nitto
Hi,

On 6/7/2010 6:27 AM, Vadym Chepkov wrote:
> Hi,
> 
> I think corosync starts too early during system initialization.
> Current priorities in init.d script seems to be wrong:
> 
> corosync-1.2.2-1.1.el5:
> # chkconfig: - 20 20
> 
> I observe very strange behavior, if it starts as configured I get this error
> 
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 319425034
> Could not get the ring status, the error is: 6
> 
> But if I restart it, all is well:
> 
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 352979466
> RING ID 0
>   id  = 10.10.10.21
>   status  = ring 0 active with no faults
> RING ID 1
>   id  = 10.10.3.21
>   status  = ring 1 active with no faults
> 
> I think the start should be pushed way down, after ntp starts (58) , for sure
> heartbeat's priority is more sensible:
> # chkconfig: - 75 05
> 
> but I would push it even further, after sendmail (80), possibly
> I personally do sed -i -e 's/.*chkconfig:.*/# chkconfig: 345 99 00/' 
> /etc/rc.d/init.d/corosync in my kickstart :)

To keep this discussion short: the current values were never tested on
RHEL-5.

There is also a major catch-22 in changing them.

In certain environments it is necessary to start corosync right after the
network, because everything else in the boot process could require, for
example, a cluster filesystem to be available (with corosync running as
backend).

It's possible that we might have to ship a specific rhel5 init script
with different values as a solution to this problem. Changing the current
generic init script is not an option, as it would affect too many
distributions rather than a specific one.

Fabio


Re: [Openais] [PATCH] V2 Fix debug function in logsys.c (Was: Re: Remove unused functions from logsys.c)

2010-06-05 Thread Fabio M. Di Nitto
On 6/5/2010 8:31 AM, Andreas Florath wrote:
> Hello!
> 
> Ok - looks that it's needed sometimes during debugging.

Indeed.

> 
> Attached a patch against svn version 2920 which (hopefully)
> does The-Right-Thing and has no buffer overflow.

It is enough to fix the buffer overflow in the old code. This debugging
piece of infrastructure is not performance critical. There is little to
no point in implementing all those buffer/strlen optimizations.
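
For reference, the overflow in the old decode_mode() comes from passing the
full buflen to snprintf() while the destination pointer has already been
advanced by strlen(buf). A minimal sketch of the smaller fix, assuming a
hypothetical helper named append_bounded() (it is not part of logsys.c),
passes only the remaining space:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of logsys.c.
 * Append src to the NUL-terminated string already in buf without ever
 * writing past buflen bytes.
 * Returns 0 on success, -1 if the text had to be truncated. */
static int append_bounded(char *buf, size_t buflen, const char *src)
{
	size_t used;
	int n;

	assert(buf != NULL && src != NULL);

	used = strlen(buf);
	if (used >= buflen)
		return -1;

	/* Pass the REMAINING space, not the total buffer size. */
	n = snprintf(buf + used, buflen - used, "%s", src);
	if (n < 0 || (size_t)n >= buflen - used)
		return -1;

	return 0;
}
```

With an 8-byte buffer, appending "FILE," and then "STDERR," leaves
"FILE,ST" in the buffer, and the second call returns -1 instead of writing
past the end.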

> (IMHO also the debug-code should be bug-free ;-) ).

Oh indeed, I absolutely agree. Given the nature of the code itself
(being effectively executed only in a controlled environment and only
once during the debugging process) I didn't really pay too much
attention to possible security issues. But it's good somebody spotted
them, regardless of the above conditions.

Cheers
Fabio

> 
> Kind regards
> 
> Andreas Florath
> 
> Signed-off-by: Andreas Florath 
> ---
> 
> Index: exec/logsys.c
> ===
> --- exec/logsys.c (revision 2920)
> +++ exec/logsys.c (working copy)
> @@ -217,6 +217,130 @@
>  /* forward declarations */
>  static void logsys_close_logfile(int subsysid);
> 
> +#ifdef LOGSYS_DEBUG
> +struct st_ls2str
> +{
> + unsigned long  flag;
> + char const *   name;
> +}
> + ls2str[] =
> + {
> + { LOGSYS_MODE_OUTPUT_FILE, "FILE" },
> + { LOGSYS_MODE_OUTPUT_STDERR,   "STDERR" },
> + { LOGSYS_MODE_OUTPUT_SYSLOG,   "SYSLOG" },
> + { LOGSYS_MODE_FORK,"FORK" },
> + { LOGSYS_MODE_THREADED,"THREADED" }
> + };
> +
> +/* This is a copy of the strncpy(3) function but it returns the
> + * index to the end of the dest string.
> + * This eliminates the need for additional strlen() calls. */
> +static size_t mstrncpy(char *dest, const char *src, size_t n)
> +{
> + size_t i;
> +
> + for (i=0; i<n && src[i] != '\0'; i++)
> + dest[i] = src[i];
> + dest[i] = '\0';
> +
> + return i;
> +}
> +
> +/* Appends exactly one string to the buffer.
> + * It takes care about the maximal size of buf and the comma
> + * handling between the words.
> + * The function recognizes the fact, that a comma must be inserted, on
> + * the condition whether the used_buf is 0 (->no comma prepended) or
> + * 1 (->comma prepended). */
> +static void decode_mode_append(
> + unsigned int const mode, char * const buf, size_t const buflen,
> + struct st_ls2str const * const l2si, size_t * const used_buf)
> +{
> + if (mode & l2si->flag)
> + {
> + if (*used_buf) buf[(*used_buf)++] = ',';
> + *used_buf += mstrncpy(buf+*used_buf, l2si->name,
> +   buflen-*used_buf-1);
> + }
> +}
> +
> +/* Appends all modes with an index less than the given 'maxidx' to the
> + * buffer. */
> +static char * decode_mode_maxidx(
> + unsigned int mode, char *buf, size_t buflen, int maxidx)
> +{
> + int idx;
> + size_t used_buf = 0;
> + *buf = '\0';
> +
> + /* buflen-1: the \0 must fit in the buffer */
> + for (idx=0; idx<maxidx; idx++)
> + decode_mode_append(
> + mode, buf, buflen, &ls2str[idx], &used_buf);
> + return buf;
> +}
> +
> +static char * decode_mode(int subsysid, char *buf, size_t buflen)
> +{
> + return decode_mode_maxidx(
> + logsys_loggers[subsysid].mode, buf, buflen,
> + subsysid == LOGSYS_MAX_SUBSYS_COUNT ? 5 : 3);
> +}
> +
> +static const char *decode_debug(int subsysid)
> +{
> + if (logsys_loggers[subsysid].debug)
> + return "on";
> +
> + return "off";
> +}
> +
> +static const char *decode_status(int subsysid)
> +{
> + if (!logsys_loggers[subsysid].init_status)
> + return "INIT_DONE";
> +
> + return "NEEDS_INIT";
> +}
> +
> +static void dump_subsys_config(int subsysid)
> +{
> + char modebuf[1024];
> +
> + fprintf(stderr,
> + "ID: %d\n"
> + "subsys: %s\n"
> + "logfile: %s\n"
> + "logfile_fp: %p\n"
> + "mode: %s\n"
> + "debug: %s\n"
> + "syslog_fac: %s\n"
> + "syslog_pri: %s\n"
> + "logfile_pri: %s\n"
> + "init_status: %s\n",
> + subsysid,
> + logsys_loggers[subsysid].subsys,
> + logsys_loggers[subsysid].logfile,
> + logsys_loggers[subsysid].logfile_fp,
> + decode_mode(subsysid, modebuf, sizeof(modebuf)),
> + decode_debug(subsysid),
> + logsys_facility_name_get(logsys_loggers[subsysid].syslog_facility),
> + logsys_priority_name_get(logsys_loggers[subsysid].syslog_priority),
> + logsys_priority_name_get(logsys_loggers[subsysid].logfile_priority),
> + decode_status(subsysid));
> +}
> +
> +static void dump_full_config(void)
> +{
> + int i;
> +
> + for (i = 0; i <= LOGSYS_MAX_SUBSYS_COUNT; i++) {
> + if 

Re: [Openais] [PATCH] Remove unused functions from logsys.c

2010-06-01 Thread Fabio M. Di Nitto
Steven,

We left this code in specifically to debug logsys, and it is #ifdef'ed out
because it is not required for normal runtime operations. The configure
system does NOT know how to enable that flag, precisely because users
shouldn't be mangling it at random.

The buffer overflow can't be exploited since the code is not compiled in
anywhere in any package (or it could be fixed instead.. either way), but
the code is very useful to determine the status of the library
configuration (systems vs subsystems) when figuring out init issues and
why stuff misbehaves. We had to use it heavily when there was a gcc
issue compiling init macros in the right order (remember the
__attribute_ priority init problem?). That is something you otherwise
can't spot.

Fabio

On 5/28/2010 10:36 PM, Steven Dake wrote:
> merged
> 
> thanks
> -steve
> 
> On 05/28/2010 12:56 PM, Andreas Florath wrote:
>> Hello!
>>
>> Just stumble over the function 'decode_mode()' which IMHO has at least
>> one problem with a buffer overflow.
>>
>> The static function 'decode_mode()' is used by the static function
>> 'dump_subsys_config()' which is is turn used by the static function
>> 'dump_full_config()' which is never used.
>>
>> Are these functions used by someone using some magic? I did not find
>> any reference and even the flag LOGSYS_DEBUG, which prevents them from
>> compiling, does not exist at some other point.
>>
>> If these functions are really not used, please remove them (because at
>> least one of them has a buffer overflow). Patch against 1.2.3
>> is attached.
>>
>> If there is a need for these functions, I'll send a patch to fix
>> the 'decode_mode()' function.
>>
>> Kind regards
>>
>> Andreas Florath
>>
>> Signed-off-by: Andreas Florath
>> ---
>> diff -ru corosync-1.2.3/exec/logsys.c corosync-1.2.3-patched/exec/logsys.c
>> --- corosync-1.2.3/exec/logsys.c 2010-05-19 15:59:17.0 +0200
>> +++ corosync-1.2.3-patched/exec/logsys.c 2010-05-28 21:13:02.0 +0200
>> @@ -217,87 +217,6 @@
>>   /* forward declarations */
>>   static void logsys_close_logfile(int subsysid);
>>
>> -#ifdef LOGSYS_DEBUG
>> -static char *decode_mode(int subsysid, char *buf, size_t buflen)
>> -{
>> -memset(buf, 0, buflen);
>> -
>> -if (logsys_loggers[subsysid].mode&  LOGSYS_MODE_OUTPUT_FILE)
>> -snprintf(buf+strlen(buf), buflen, "FILE,");
>> -
>> -if (logsys_loggers[subsysid].mode&  LOGSYS_MODE_OUTPUT_STDERR)
>> -snprintf(buf+strlen(buf), buflen, "STDERR,");
>> -
>> -if (logsys_loggers[subsysid].mode&  LOGSYS_MODE_OUTPUT_SYSLOG)
>> -snprintf(buf+strlen(buf), buflen, "SYSLOG,");
>> -
>> -if (subsysid == LOGSYS_MAX_SUBSYS_COUNT) {
>> -if (logsys_loggers[subsysid].mode&  LOGSYS_MODE_FORK)
>> -snprintf(buf+strlen(buf), buflen, "FORK,");
>> -
>> -if (logsys_loggers[subsysid].mode&  LOGSYS_MODE_THREADED)
>> -snprintf(buf+strlen(buf), buflen, "THREADED,");
>> -}
>> -
>> -memset(buf+strlen(buf)-1,0,1);
>> -
>> -return buf;
>> -}
>> -
>> -static const char *decode_debug(int subsysid)
>> -{
>> -if (logsys_loggers[subsysid].debug)
>> -return "on";
>> -
>> -return "off";
>> -}
>> -
>> -static const char *decode_status(int subsysid)
>> -{
>> -if (!logsys_loggers[subsysid].init_status)
>> -return "INIT_DONE";
>> -
>> -return "NEEDS_INIT";
>> -}
>> -
>> -static void dump_subsys_config(int subsysid)
>> -{
>> -char modebuf[1024];
>> -
>> -fprintf(stderr,
>> -"ID: %d\n"
>> -"subsys: %s\n"
>> -"logfile: %s\n"
>> -"logfile_fp: %p\n"
>> -"mode: %s\n"
>> -"debug: %s\n"
>> -"syslog_fac: %s\n"
>> -"syslog_pri: %s\n"
>> -"logfile_pri: %s\n"
>> -"init_status: %s\n",
>> -subsysid,
>> -logsys_loggers[subsysid].subsys,
>> -logsys_loggers[subsysid].logfile,
>> -logsys_loggers[subsysid].logfile_fp,
>> -decode_mode(subsysid, modebuf, sizeof(modebuf)),
>> -decode_debug(subsysid),
>> -logsys_facility_name_get(logsys_loggers[subsysid].syslog_facility),
>> -logsys_priority_name_get(logsys_loggers[subsysid].syslog_priority),
>> -logsys_priority_name_get(logsys_loggers[subsysid].logfile_priority),
>> -decode_status(subsysid));
>> -}
>> -
>> -static void dump_full_config(void)
>> -{
>> -int i;
>> -
>> -for (i = 0; i<= LOGSYS_MAX_SUBSYS_COUNT; i++) {
>> -if (strlen(logsys_loggers[i].subsys)>  0)
>> -dump_subsys_config(i);
>> -}
>> -}
>> -#endif
>> -
>>   static uint32_t circular_memory_map (void **buf, size_t bytes)
>>   {
>>  void *addr_orig;

Re: [Openais] [PATCH] diags: add a mechanism to trigger the writing the flight data

2010-04-27 Thread Fabio M. Di Nitto
On 4/28/2010 6:57 AM, Angus Salkeld wrote:
> trigger the dumping of flight data using:
>  corosync-objctl -w runtime.logsys.dump_flight_data=yes
> 
> then read the flight data as usual:
>  corosync-fplay

Nice idea.. can we also hook up a signal handler (maybe USR1, which seems
unused) to achieve the same?
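
A minimal sketch of what such a handler could look like, assuming the daemon
has a main loop that can poll a flag. The names dump_requested,
install_dump_signal() and dump_request_consume() are hypothetical and not
corosync API. The handler only sets a flag, because writing the flight data
from signal context would not be async-signal-safe:

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>

/* Hypothetical flag: set by the handler, polled by the main loop.
 * volatile sig_atomic_t is the type that is safe to write from a handler. */
static volatile sig_atomic_t dump_requested = 0;

static void sigusr1_handler(int signum)
{
	(void)signum;
	dump_requested = 1;	/* no I/O here, only record the request */
}

/* Install the SIGUSR1 handler. Returns 0 on success, -1 on error. */
static int install_dump_signal(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigusr1_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;

	return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from the main loop. Returns 1 once per pending request, so the
 * caller can then write the flight data from normal (non-signal) context. */
static int dump_request_consume(void)
{
	if (!dump_requested)
		return 0;
	dump_requested = 0;
	return 1;
}
```

The main loop would call dump_request_consume() on each iteration and, when
it returns 1, run whatever routine the objctl trigger already uses.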

Cheers
Fabio


Re: [Openais] Corosync Patch: Fix the default for COROSYNC_RUN_DIR

2010-04-12 Thread Fabio M. Di Nitto
On 4/12/2010 5:17 PM, Steven Dake wrote:
> On Mon, 2010-04-12 at 09:15 +0200, Andrew Beekhof wrote:
>> On Mon, Apr 12, 2010 at 12:46 AM, Steven Dake  wrote:
>>> On Sun, 2010-04-11 at 10:30 +0200, Andrew Beekhof wrote:
>>>> On Sun, Apr 11, 2010 at 1:59 AM, Steven Dake  wrote:
>>>>> On Sat, 2010-04-10 at 13:35 +0200, Andrew Beekhof wrote:
>>>>>> On Sat, Apr 10, 2010 at 6:18 AM, Fabio M. Di Nitto  
>>>>>> wrote:
>>>>>>> On 4/9/2010 8:17 PM, Steven Dake wrote:
>>>>>>>> On Fri, 2010-04-09 at 15:05 +0200, Andrew Beekhof wrote:
>>>>>>>>> This looks like a copy/paste error to me...
>>>>>>>>>
>>>>>>>>> The "RUN" in COROSYNC_RUN_DIR would seem to imply /var/run
>>>>>>>>> Also /var/lib is persistent and doesn't need to be created at startup.
>>>>>>>>> On the other hand, LSB states that the contents of /var/run are blown
>>>>>>>>> away at boot time.
>>>>>>>>>
>>>>>>>>> So I'm reasonably sure the following patch is correct.
>>>>>>>>> Please ACK.
>>>>>>>>
>>>>>>>> In general "rundir" should probably be renamed to "libdir" since the
>>>>>>>> idea is that data stored there is persistent.
>>>>>>>>
>>>>>>>> Totem requires persistence between node boots of data stored with the
>>>>>>>> rundir path.
>>>>>>>
>>>>>>> /var/lib/corosync should be created at "make install" time and it's
>>>>>>> guaranteed to be there by packaging and after each reboot.
>>>>>>>
>>>>>>> /var/run/corosync is more complicated. As Andrew already mentioned LSB,
>>>>>>> we need to make sure that it's created at startup time. Most daemons can
>>>>>>> do that in the init script and be done with it. Corosync doesn't have
>>>>>>> that luxury because it can be invoked in several different ways (cman
>>>>>>> for example), therefor it needs to do the dir creation/check within the
>>>>>>> code as the init script is not always used.
>>>>>>>
>>>>>>> This is the problem we need to address basically.
>>>>>>
>>>>>> And what the patch does :-)
>>>>>>
>>>>>> There is no need, at runtime, to create /var/lib/corosync.
>>>>>> Particularly if its required to be persistent.
>>>>>> /var/run/corosync is a different story as Fabio reiterated above.
>>>>>>
>>>>>> So given all that, the original patch makes the most sense.
>>>>>
>>>>> Oh missed the patch sorry.
>>>>>
>>>>> I did review it just now.  Hate to be a stickler to details, but the
>>>>> rundir environment + variable names should be something like lib instead
>>>>> (what is this called?).
>>>>
>>>> Oh I see what you mean.
>>>> rundir is used elsewhere in totemsrp.c
>>>>
>>>
>>> The issue is COROSYNC_RUN_DIR is used in ipc
>>
>> Is it though?
>> I trawled the code last night and all I could find was:
>>   /var/run/some_ipc_file
>> not
>>   /var/run/corosync/some_ipc_file
>>
>> So now I'm confused, do we actually need a /var/run/corosync directory
>> to ever be created?
>>
> 
> No we shouldn't need /var/run/corosync at all.
> 

> ipc uses LOCALSTATEDIR /run for shared memory files
> ipc uses SOCKETDIR (/var/run)  for socket files on systems which don't
> support abstract sockets

Maybe those are best in /var/run/corosync tho. /var/run tends to have
enough clutter on its own.

Fabio

Re: [Openais] Corosync Patch: Fix the default for COROSYNC_RUN_DIR

2010-04-09 Thread Fabio M. Di Nitto
On 4/9/2010 8:17 PM, Steven Dake wrote:
> On Fri, 2010-04-09 at 15:05 +0200, Andrew Beekhof wrote:
>> This looks like a copy/paste error to me...
>>
>> The "RUN" in COROSYNC_RUN_DIR would seem to imply /var/run
>> Also /var/lib is persistent and doesn't need to be created at startup.
>> On the other hand, LSB states that the contents of /var/run are blown
>> away at boot time.
>>
>> So I'm reasonably sure the following patch is correct.
>> Please ACK.
> 
> In general "rundir" should probably be renamed to "libdir" since the
> idea is that data stored there is persistent.
> 
> Totem requires persistence between node boots of data stored with the
> rundir path.

/var/lib/corosync should be created at "make install" time and it's
guaranteed to be there by packaging and after each reboot.

/var/run/corosync is more complicated. As Andrew already mentioned LSB,
we need to make sure that it's created at startup time. Most daemons can
do that in the init script and be done with it. Corosync doesn't have
that luxury because it can be invoked in several different ways (cman
for example), therefore it needs to do the dir creation/check within the
code, as the init script is not always used.

This is the problem we need to address basically.

Fabio


Re: [Openais] [corosync] - hotfix for confdb linking

2010-04-08 Thread Fabio M. Di Nitto
On 4/8/2010 9:29 AM, Jan Friesse wrote:
> Fabio,
> inconsistency it is taken from original version. I agree that some
> consistency there will be nice. Sadly I'm not sure, what are the correct
> parameters?
> 
> Will $(OS_DYFLAGS) $(OS_LDL) works on Darwin and Solaris?

I don't have access to either at the moment, so I can't confirm, but
when they were introduced they used to work.

> 
> Regards,
>   Honza
> 
> Fabio M. Di Nitto wrote:
>> On 4/7/2010 5:57 PM, Jan Friesse wrote:
>>> +   -Wl,-whole-archive $^ -Wl,-no-whole-archive $(LDFLAGS) 
>>> $(OS_DYFLAGS) $(OS_LDL) $(AM_LDFLAGS)
>>
>> Some of the snippets you are re-adding to the Makefile explicitly
>> include -ldl, while the Linux build uses the OS_ bits.
>>
>> Can you please be consistent with it?
>>
>> Cheers
>> Fabio
>>
>> PS otherwise I think the patch looks fine.
>>


Re: [Openais] [corosync] - hotfix for confdb linking

2010-04-07 Thread Fabio M. Di Nitto
On 4/7/2010 5:57 PM, Jan Friesse wrote:
> + -Wl,-whole-archive $^ -Wl,-no-whole-archive $(LDFLAGS) 
> $(OS_DYFLAGS) $(OS_LDL) $(AM_LDFLAGS)

Some of the snippets you are re-adding to the Makefile explicitly
include -ldl, while the Linux build uses the OS_ bits.

Can you please be consistent with it?

Cheers
Fabio

PS otherwise I think the patch looks fine.



Re: [Openais] openais trunk patch: putting version number in openais service engines

2010-04-05 Thread Fabio M. Di Nitto
On 4/1/2010 2:32 AM, Steven Dake wrote:
> Ryan,
> 
> As we talked about on irc a few days ago, there is no current way to
> detect the version of the service engine installed on the system for
> openais services.  This attached patch puts the version field in the
> service engine description that is loaded.  Would be interesting to take
> the current svn revision and put it in this field, but I don't quite
> know how to do that.

No time atm to hack it myself, but what Honzaf did for corosync (see
autogen.sh to start) could easily be ported to openais to add that info
to the services.

Fabio


Re: [Openais] [PATCH corosync_trunk] Add augeas lense for corosync.conf

2010-02-17 Thread Fabio M. Di Nitto
On 2/17/2010 9:35 PM, Steven Dake wrote:
> On Wed, 2010-02-17 at 22:21 +1100, Angus Salkeld wrote:
>> On Wed, 2010-02-17 at 07:47 +0100, Fabio M. Di Nitto wrote:
>>> I only have a few comments, some of them we discussed on IRC.
>>>
>>> When adding new files, you need to update the corosync.spec.in too.
>>> In general my policy has always been that (s)rpm from development should
>>> enable all features and ships all files.
>>>
>>> Since those files are useful only when augtool are available, I suggest
>>> you create a separate binary rpm to ship them (call it corosync-augeas
>>> for example) that will Requires: augeas as dependency.
>>>
>>> This way, it's clear what those files do, they don't interfere with
>>> normal packaging, and they become optionally installable even in testing.
>>>
>>> Fabio
>>>
>>
>> Here is a patch that should do the trick.
>>
> 
> According to augeas upstream, separate rpms are not recommended unless
> there is a dependency on augtool or library.

The dependency is there.

> The issue with this spec file patch is that it creates unowned dirs
> (+%{_datadir}/augeas/lenses).  Some distros (fedora) own these dirs in
> the "filesystem" package.  It is not clear to me that other distros will
> do that, and instead choose to own them in the augeas packages (in which
> case a dependency on augeas makes sense, so that it can own the proper
> dirs).

A problem that's very easily solvable. Simply own the dirs you create.
It's allowed to have multiple owners for one dir.

> [r...@fedora12-node1 cluster]# rpm -q -f /usr/share/augeas/lenses 
> filesystem-2.4.30-2.fc12.i686
> augeas-libs-0.7.0-1.fc12.i686

filesystem doesn't own it alone (augeas pulls in -libs as suggested by
upstream)

> It is not possible to predict what adoptors of augeas will do with
> regard to the lens dir owning issue.

Remember that this spec file is for developers only and not for general
distributions. Each distribution will always add little changes here and
there to fit the distro specific policy.

If the rpm can't fit all distros, then distro people can happily send us
patches to fit their specific policies. So far we work and develop on
Fedora.. we fit those policies.

IMHO you are creating a problem that really isn't necessary.

Also note.. if you want to go into distro-specific policies, then you
probably want to review the whole spec file idea. Probably even the
./configure call is not right on all distros (some enforce usage of
/etc/init.d and others /etc/rc.d/init.d; even if the two are hardlinked
together, only one is correct for packaging policies).

Fabio

Re: [Openais] [PATCH corosync_trunk] Add augeas lense for corosync.conf

2010-02-17 Thread Fabio M. Di Nitto
On 2/17/2010 12:21 PM, Angus Salkeld wrote:
> On Wed, 2010-02-17 at 07:47 +0100, Fabio M. Di Nitto wrote:
>> I only have a few comments, some of them we discussed on IRC.
>>
>> When adding new files, you need to update the corosync.spec.in too.
>> In general my policy has always been that (s)rpm from development should
>> enable all features and ships all files.
>>
>> Since those files are useful only when augtool are available, I suggest
>> you create a separate binary rpm to ship them (call it corosync-augeas
>> for example) that will Requires: augeas as dependency.
>>
>> This way, it's clear what those files do, they don't interfere with
>> normal packaging, and they become optionally installable even in testing.
>>
>> Fabio
>>
> 
> Here is a patch that should do the trick.

The patch is fine, but please split it into 2 commits: one for Makefile.am
(fixing make distcheck and the release tarball; also try to keep lines
< 80 cols ;)) and one for the spec file changes for augeas.

Thanks
Fabio


> 
> -Angus
> 
> Index: corosync.spec.in
> ===
> --- corosync.spec.in  (revision 2658)
> +++ corosync.spec.in  (working copy)
> @@ -43,6 +43,7 @@
>  %{configure} \
>   --enable-nss \
>   --enable-rdma \
> + --enable-augeas \
>   --with-initddir=%{_initddir}
>  
>  %build
> @@ -210,6 +211,22 @@
>  %{_mandir}/man8/coroipc_overview.8*
>  %{_mandir}/man8/sam_overview.8*
>  
> +%package -n corosync-augeas
> +BuildArch: noarch
> +Summary: The Augeas len for the Corosync Cluster Engine configuration file
> +Group: System Environment/Libraries
> +Requires: %{name} = %{version}-%{release}
> +Requires: augeas
> +
> +%description -n corosync-augeas
> +This package contains the augeas lens for corosync.conf.
> +
> +%files -n corosync-augeas
> +%defattr(-,root,root,-)
> +%doc LICENSE
> +%{_datadir}/augeas/lenses/corosync.aug
> +%{_datadir}/augeas/lenses/tests/test_corosync.aug
> +
>  %changelog
>  * @date@ Autotools generated version  - 
> @vers...@-1.@alphatag@
>  - Autotools generated version
> Index: Makefile.am
> ===
> --- Makefile.am   (revision 2658)
> +++ Makefile.am   (working copy)
> @@ -33,7 +33,8 @@
>  
>  TARFILE  = $(PACKAGE_NAME)-$(VERSION).tar.gz
>  
> -EXTRA_DIST   = autogen.sh conf/corosync.conf.example $(SPEC).in
> +EXTRA_DIST   = autogen.sh conf/corosync.conf.example $(SPEC).in \
> +   conf/lenses/corosync.aug 
> conf/lenses/tests/test_corosync.aug
>  
>  AUTOMAKE_OPTIONS = foreign
>  
> 
> 


Re: [Openais] [PATCH corosync_trunk] Add augeas lense for corosync.conf

2010-02-16 Thread Fabio M. Di Nitto
I only have a few comments, some of them we discussed on IRC.

When adding new files, you need to update corosync.spec.in too.
In general my policy has always been that (s)rpms from development should
enable all features and ship all files.

Since those files are useful only when augtool is available, I suggest
you create a separate binary rpm to ship them (call it corosync-augeas
for example) that will have Requires: augeas as a dependency.

This way, it's clear what those files do, they don't interfere with
normal packaging, and they become optionally installable even in testing.

Fabio

On 2/16/2010 10:54 PM, Angus Salkeld wrote:
> On Tue, 2010-02-16 at 07:09 -0700, Steven Dake wrote:
>> The patch looks great but I prefer a feature addition as well:
>>
>> configure --enable-augeas option which has the effect of enabling the
>> installation of the augeas files.
>>
> 
> Hi
> 
> Attached is a patch with the --enable-augeas config option.
> 
> -Angus
> 
> 
>> On systems where there is no augeas, I'd wonder what would happen with
>> this patch.
>>
>> Regards
>> -steve
>>
>> On Tue, 2010-02-16 at 13:09 +1100, Angus Salkeld wrote:
>>> Hi
>>>
>>> This adds an augeas lens to corosync (so you can configure corosync.conf
>>> via augtool and python-augeas).
>>>
>>> Here is my post to the augeas ML.
>>> https://www.redhat.com/archives/augeas-devel/2010-February/msg00041.html
>>>
>>> I have included rules to install the lens as the augeas maintainer suggests
>>> (Steve I am not sure if that is what you want).
>>> Note on my system I have the following
>>> ls /usr/share/augeas/lenses/
>>> corosync.aug  dist  libvirtd.aug  libvirtd_qemu.aug  tests
>>>
>>>
>>> The test suite I will post soon needs this to configure corosync.
>>>
>>> -Angus
>>>
>>>

Re: [Openais] [PATCH corosync_trunk] add a note about rotating logfile created with to_logfile

2010-02-16 Thread Fabio M. Di Nitto
ACK.. good to go

On 2/17/2010 1:11 AM, Angus Salkeld wrote:
> This just adds a note in corosync.conf about the best way
> of configuring logrotate with log file generated with to_logfile.
> 
> -Angus
> 
> diff --git a/man/corosync.conf.5 b/man/corosync.conf.5
> index de6682f..231911c 100644
> --- a/man/corosync.conf.5
> +++ b/man/corosync.conf.5
> @@ -488,6 +488,9 @@ and
>  
>  The default is syslog and stderr.
>  
> +Please note, if you are using to_logfile and want to rotate the file, use 
> logrotate
> +with the option "copytruncate".
> +
>  .TP
>  logfile
>  If the
> 
> 


Re: [Openais] [Corosync] Multiple corosync processes are started

2010-02-08 Thread Fabio M. Di Nitto
On 2/8/2010 10:49 PM, Steven Dake wrote:
> On Mon, 2010-02-08 at 11:13 -0700, hj lee wrote:
>> I noticed that this happens when corosync starts before syslog in init
>> start order. I understand that corosync requires syslog, but at least
>> it should start OK and should be operational OK even without syslog.
>>
> 
> thanks
> 
> Would you file a defect indicating that corosync without a running
> syslog doesn't operate properly?
> 
> Regards
> -steve

This is an interesting problem tho. All syslog calls are "void". There
is no way to know if syslogd is running or not (unless we do some evil
hacks).

Are we entirely sure this isn't a version-specific bug of glibc/syslog?

Fabio


Re: [Openais] corosync-objctl **binary**

2010-01-13 Thread Fabio M. Di Nitto
On 1/13/2010 6:08 PM, David Teigland wrote:
> On Wed, Jan 13, 2010 at 02:49:53PM +1100, Angus Salkeld wrote:
>> On Wed, Jan 13, 2010 at 6:06 AM, David Teigland  wrote:
>>> corosync-objctl used to print a lot of useful information which now
>>> appears only as **binary**. Is there a way to get that back?
>>> Perhaps two output modes, one where it prints binary values in hex and
>>> another where it makes a best effort to interpret and print the values
>>> in a useful form?
>>>
>>> Dave
>>
>> Hi David
>>
>> The keys are now typed, the default as used by the old API
>> defaults to ANY (or void*). So if we have uses of the old API
>> then these objects are printed out as **BINARY**. If they are in actual
>> fact strings then we need to update the call to key_create()
>> to use the new API, which allows us to pass in the type (in this case
>> STRING).
> 
> I wonder if there's anything preventing us from using the new API in the
> cluster.git code?

At this point in time nothing blocks us from changing to the new API, but
when it landed in corosync we didn't have the time to suck the changes
in before release.
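
For readers following the thread, the difference between typed and untyped keys, and the hex output mode Dave asked about, can be sketched roughly as below. All names here (demo_value_type, demo_format_value) are hypothetical and only illustrate the idea; they are not the corosync objdb API or the corosync-objctl source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical value types; not the real corosync definitions. */
enum demo_value_type { DEMO_TYPE_ANY, DEMO_TYPE_STRING };

/* Format a key's value into `out` and return `out`. Typed strings are
 * printed as text; untyped (ANY) data falls back to a hex dump instead
 * of a "**binary**" placeholder. Output is truncated to fit `out_len`. */
static const char *demo_format_value(enum demo_value_type type,
                                     const void *value, size_t len,
                                     char *out, size_t out_len)
{
    const unsigned char *bytes = value;
    size_t i, used = 0;

    if (out_len == 0)
        return out;
    out[0] = '\0';

    if (type == DEMO_TYPE_STRING) {
        snprintf(out, out_len, "%.*s", (int)len, (const char *)value);
        return out;
    }

    /* Each byte needs two hex digits plus room for the terminator. */
    for (i = 0; i < len && used + 3 <= out_len; i++)
        used += (size_t)snprintf(out + used, out_len - used, "%02x", bytes[i]);
    return out;
}
```

With this shape, a key created through an old-style untyped call would still print something inspectable, while keys created with an explicit string type print as before.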

Fabio



Re: [Openais] libccs problem -> Re: Sources for dlm and cman

2010-01-12 Thread Fabio M. Di Nitto
On 1/13/2010 1:28 AM, Hunny Bunny wrote:
> Hello Fabio,
> 
> If I understood you correctly, in my case of a non-RH cluster I should
> disregard the ocfs2-tools complaints about the absence of libdlm/libcman and
> just compile and install it?

IIRC libdlm/cman are only used to build ocfs2_controld.cman that you do
not need.

Fabio


Re: [Openais] libccs problem -> Re: Sources for dlm and cman

2010-01-12 Thread Fabio M. Di Nitto
On 1/12/2010 11:57 PM, Hunny Bunny wrote:
> Hello Fabio,
> 
> I admit, I might not understand something that is obvious to you and other
> experienced cluster folkz. It is my first attempt to build an HA cluster, so
> please correct me where I'm wrong.
> 
> I've already built almost complete Corosync/OpenAIS, Pacemaker, DRBD,
> OCFS2 cluster environment in the following progression:
> 
> 1. e2fsprogs
> 2. Corosync
> 3. OpenAIS
> 4. Cluster Glue
> 5. Reusable Cluster Components
> 6. Pacemaker
> 7. DRBD
> 
> Up to this point everything was compiling and installing pretty smoothly.
> 
> I hit a brick wall when I tried to compile the last needed package, ocfs2-tools.
> It required libdlm and optionally libcman.

ocfs2-tools needs libdlm/libcman only if you want to build the
integration layer between OCFS2 and RH Cluster.

Fabio


Re: [Openais] libccs problem -> Re: Sources for dlm and cman

2010-01-12 Thread Fabio M. Di Nitto
On 1/12/2010 10:10 PM, Hunny Bunny wrote:
> Hello Fabio,
> Here is the information regarding this libccs problem:

> --without_gfs \
> --without_gfs2 \
> --without_group \
> --without_fence \
> --without_fence_agents \
> --without_rgmanager \
> --without_resource_agents \
> --without_bindings \
> --without_config \

 this is the problem

> --without_kernel_modules

you really need to know what you are building before disabling
everything. It's a lot simpler to build the whole tree and then just
grab the bits you want.

Fabio


Re: [Openais] libccs problem -> Re: Sources for dlm and cman

2010-01-12 Thread Fabio M. Di Nitto
On 1/12/2010 8:59 PM, Hunny Bunny wrote:
> Hello Fabio,
> I tried diligently to build the newest cluster-3.0.7 tree as you suggested.
> However, when it compiles, ld gives me:
> 
> cannot find -lccs error
> collect2: ld returned 1 exit status
> 
> My assumption was that libccs is included with Red Hat cluster.
> 
> Do you have any idea how to remedy this problem?

That tells me nothing.. can you please post the full error of the build
failure?

Fabio


Re: [Openais] Sources for dlm and cman

2010-01-12 Thread Fabio M. Di Nitto
On 1/12/2010 9:12 AM, Andrew Beekhof wrote:
> On Tue, Jan 12, 2010 at 8:50 AM, Fabio M. Di Nitto wrote:
>> On 1/12/2010 6:07 AM, Hunny Bunny wrote:
>>> Thanks for reply Fabio,
>>> I went through this site also. However, I cannot figure out how to
>>> compile only dlm  (libdlm) and cman (libcman) libraries without whole
>>> Red Hat cluster environment.
>>> I'm going to use OCFS2 with DRBD, Corosync/OpenAIS and Pacemaker, not
>>> the whole GFS2 cluster in Red Hat's implementation.
>>> So, I'm looking for clean stand alone dlm (libdlm) and cman (libcman)
>>> source packages not bound with any third party applications.
>>>
>>> Please let me know where to get such sources.
>>
>> If you are going to use pacemaker, then you don't need libcman.
>>
>> For dlm, you will still need dlm_controld.pcmk from RH Cluster. It's a
>> lot simpler to just build the full tree and extract the bits you need.
>>
>> There are no separate source trees for just the libraries as it doesn't
>> make any sense without the other bits.
> 
> Did you forget about http://git.fedorahosted.org/git/dlm.git ?

It still requires you to build the whole thing, and dlm.git is not as
up to date as the stable3 full releases (and it's not tested at all at this
point in time).

Fabio


Re: [Openais] Sources for dlm and cman

2010-01-11 Thread Fabio M. Di Nitto
On 1/12/2010 6:07 AM, Hunny Bunny wrote:
> Thanks for reply Fabio,
> I went through this site also. However, I cannot figure out how to
> compile only dlm  (libdlm) and cman (libcman) libraries without whole
> Red Hat cluster environment.
> I'm going to use OCFS2 with DRBD, Corosync/OpenAIS and Pacemaker, not
> the whole GFS2 cluster in Red Hat's implementation.
> So, I'm looking for clean stand alone dlm (libdlm) and cman (libcman)
> source packages not bound with any third party applications.
> 
> Please let me know where to get such sources.

If you are going to use pacemaker, then you don't need libcman.

For dlm, you will still need dlm_controld.pcmk from RH Cluster. It's a
lot simpler to just build the full tree and extract the bits you need.

There are no separate source trees for just the libraries as it doesn't
make any sense without the other bits.

Fabio


Re: [Openais] Sources for dlm and cman

2010-01-11 Thread Fabio M. Di Nitto
On 1/12/2010 12:32 AM, Hunny Bunny wrote:
> Hello folkz,
> Could somebody please point me out where I can get tar balls of dlm
> (libdlm) and cman (libcman) sources.
> My search in google yielded nothing definitive.
> Many thanks in advance,

http://sources.redhat.com/cluster/wiki/

Fabio



Re: [Openais] Corosync: Coroipc freeze and segfaults on corosync exit

2010-01-09 Thread Fabio M. Di Nitto
On 1/9/2010 11:15 AM, Andrew Beekhof wrote:
> On Fri, Jan 8, 2010 at 12:33 PM, Fabio M. Di Nitto wrote:
>> ACK, the patch fixes the problems spotted with IPC shutdown.
>>
>> There is only one segfault left to address and it is triggered only when
>> corosync receives a kill -TERM.
> 
> Any time kill -TERM is sent? Or are there other factors too?

See RH BZ #547511

I don't remember seeing it all the time, only when there is an active
IPC client consumer (by active I don't mean simply connected, but also
performing heavy operations).

Fabio


Re: [Openais] Corosync: Coroipc freeze and segfaults on corosync exit

2010-01-08 Thread Fabio M. Di Nitto
ACK, the patch fixes the problems spotted with IPC shutdown.

There is only one segfault left to address and it is triggered only when
corosync receives a kill -TERM.

Fabio

On 1/8/2010 12:22 PM, Jan Friesse wrote:
> Related to https://bugzilla.redhat.com/show_bug.cgi?id=547511
> 
> This patch solves the problem in a slightly different (I hope better) way.
> 
> It fixes the problem with sem_destroy + sem_wait and also solves the hard
> freeze that occurs because malloc(*) and other functions are called in the
> signal handler. This is why a special thread is created whose only purpose
> in life is to wait for the semaphore and begin the shutdown sequence.
> 
> According to Fabio, there are still some segfaults left on Fedora 12.
> 
> Regards,
>Honza
> 
> (*) according to the glibc documentation, malloc and free can be called in
> a signal handler, but in that case, I really don't understand this:
> (gdb) bt
> #0  0x00de1424 in __kernel_vsyscall ()
> #1  0x002c7e43 in __lll_lock_wait_private () from /lib/libc.so.6
> #2  0x00250b94 in _L_lock_9571 () from /lib/libc.so.6
> #3  0x0024ebf4 in malloc () from /lib/libc.so.6
> #4  0x08054d26 in hdb_handle_create (handle_database=0x805d748,
> instance_size=12, handle_id_out=0x805fa48) at ../include/corosync/hdb.h:178
> #5  0x08055422 in schedwrk_create (handle=0x805fa48,
> schedwrk_fn=0x8050bee , context=0x805d5a0)
> at schedwrk.c:104
> #6  0x08050d33 in corosync_service_unlink_all (api=0x805d5a0,
> unlink_all_complete=0x804b3fb ) at service.c:583
> #7  0x0804b491 in corosync_shutdown_request () at main.c:171
> #8  0x0804b508 in sigintr_handler (num=2) at main.c:195
> #9  
> #10 0x0024ca2b in _int_malloc () from /lib/libc.so.6
> #11 0x0024ebfe in malloc () from /lib/libc.so.6
> #12 0x0023a7df in __fopen_internal () from /lib/libc.so.6
> #13 0x0023a8ac in fopen@@GLIBC_2.1 () from /lib/libc.so.6
> #14 0x0053352a in pid_to_name (pid=18300, out_name=0xbfda2016 ,
> name_len=32) at coroipcs.c:1515
> #15 0x00533654 in coroipcs_init_conn_stats (conn=0x82d1bb8) at
> coroipcs.c:1557
> #16 0x00533a29 in coroipcs_handler_dispatch (fd=10, revent=1,
> context=0x82d1bb8) at coroipcs.c:1670
> #17 0x0804d498 in corosync_poll_handler_dispatch
> (handle=1197105576937521152, fd=10, revent=1, context=0x82d1bb8) at
> main.c:911
> #18 0x0057b01b in poll_run (handle=1197105576937521152) at coropoll.c:394
> #19 0x0804ec8e in main (argc=1, argv=0xbfda3434) at main.c:1498
> 
> 
> 


Re: [Openais] [RPM] Corosync 1.2.0 spec file

2010-01-07 Thread Fabio M. Di Nitto
On 1/8/2010 5:12 AM, Thomas Guthmann wrote:
> Heya,
> 
> I'm sure the clusterlabs.org team will soon update their repository, but
> in the meantime, if you need to package corosync 1.2.0, the following
> spec file might help you (designed for EPEL/Centos5/RHEL5).
> 
> It's based on Fabio M. Di Nitto's work (current 1.1.2 RPM on clusterlabs).
> See patch for minor changes.
> 
> Thanks guys for your work.

Are you aware that there is a spec file that's auto-generated in svn?
Also, Fedora has the same spec files and they should just work out of the box :)

Fabio


Re: [Openais] Corosync: coroipc (rhbz#547511)

2009-12-17 Thread Fabio M. Di Nitto
Steven Dake wrote:
> Honza,
> 
> Looks pretty good; few comments:

The patch is definitely a step forward, but as noted in the BZ, it
unfortunately doesn't fix the problem.

> Good work on the patch though.  See you after break.
> 

Indeed. Have a great vacation.

Fabio


Re: [Openais] vtun tunnelling and totem binding

2009-12-14 Thread Fabio M. Di Nitto
Robert Borkowski wrote:
> What would you suggest when running on Amazon EC2? No multicast, no GRE...
> 
> There's no guarantee that the cluster members will be anywhere near each
> other network-wise.
> 
> --
> Robert Borkowski
> 

I have personally never used cloud services, so I am not sure I am the right
person to ask.

Fabio


Re: [Openais] vtun tunnelling and totem binding

2009-12-14 Thread Fabio M. Di Nitto
openais does support broadcast too, but not point to point.

All I am saying is that while using tunnel devices is a valid use
case, it might not operate as expected and it has never been
tested before.

As for the vtun case, I am very familiar with that piece of software and I
know that it probably has more glitches than other tunnelling
implementations :)
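
One of the gotchas from the quoted message below, the reduced MTU, is simple arithmetic: with a standard 1500 byte Ethernet MTU and the 50 bytes of vtun header mentioned there, the tunnel is left with 1450 bytes, which matches the MTU:1450 in the ifconfig output further down. A tiny C sketch of that calculation follows. The function names are made up for this example, and the packet count deliberately ignores IP/UDP headers and any protocol framing, so treat it as a rough estimate only.

```c
/* Payload MTU left on a tunnel that adds `overhead` bytes per packet.
 * Returns 0 if the overhead does not fit in the link MTU at all. */
static unsigned tunnel_mtu(unsigned link_mtu, unsigned overhead)
{
    return overhead < link_mtu ? link_mtu - overhead : 0;
}

/* Rough number of packets needed to carry `msg_len` bytes when each
 * packet holds at most `mtu` bytes (ceiling division). Returns 0 if
 * `mtu` is 0. Ignores all other header overhead. */
static unsigned packets_needed(unsigned msg_len, unsigned mtu)
{
    if (mtu == 0)
        return 0;
    return (msg_len + mtu - 1) / mtu;
}
```

For example, tunnel_mtu(1500, 50) gives 1450, and a 1500 byte message that fit in one frame on plain Ethernet needs packets_needed(1500, 1450), that is two packets, over the tunnel.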

Fabio

Robert Borkowski wrote:
> Unless openais has some way to run without multicast, that's my only
> alternative.
> 
> Well the other-other alternative is to run the app without clustering
> and devise some sort of duct tape and hot glue HA system :-)
> 
> --
> Robert Borkowski
> 
> On Mon, Dec 14, 2009 at 2:53 AM, Fabio M. Di Nitto <fdini...@redhat.com> wrote:
> 
> Binding over tun devices might be useful, but be aware of several
> different gotchas:
> 
> - MTU is not ethernet size (and it's not constant. vtun uses 50 bytes
> for its own header - irrelevant to corosync - others might use different
> size. this could affect certain operations)
> - tun implementation. vtun, for example, adds latency that could be
> relevant for cluster operations (the amount depends on the plugins
> loaded - crypto, compression and so on).
> - queues handling. vtun for example, in certain conditions, will block
> the application when writing to the network socket. I don't believe this
> is desirable vs dropping packets (expected behaviour?).
> 
> so is it really worth the trouble to be able to bind to tunnels?
> 
> Just 2c...
> 
> Fabio
> 
> Steven Dake wrote:
> > The binding code may not support binding to tuns without modification.
> >
> > I'll have a look this week.
> >
> > Regards
> > -steve
> >
> > On Sun, 2009-12-13 at 12:08 -0500, Robert Borkowski wrote:
> >> Hello,
> >>
> >>
> >> Is there any way to get openais/corosync working on Amazon EC2?
> >> Multicast is not permitted there...
> >> What I'd like to set up is a two node cluster.
> >>
> >>
> >> My current attempt to get this working is to set up vtun tunnels
> >> between the two nodes. vtun is supposed to be able to tunnel
> >> multicast.
> >> The two nodes have 192.168.1.1 and 192.168.1.2 on their tun0
> >> interfaces respectively, and I'm able to pass traffic through the
> >> tunnel.
> >>
> >>
> >> This is failing right now because totem won't bind to the tun0
> >> address.
> >> On the first node I tried setting bindnetaddr to 192.168.1.0 and
> >> 192.168.1.1. In both cases debugging indicates 'network interface is
> >> down' and totem binding to 127.0.0.1.
> >> Strangely enough when I configure it to bind on 192.168.1.2 it does
> >> bind, but obviously that's wrong and doesn't work.
> >>
> >>
> >> The OS is Ubuntu hardy heron. I tried the openais out of the heron
> >> repo (0.82-3ubuntu2), and built corosync from the karmic source repo
> >> (1.0.0-5ubuntu1).
> >> Both behave the same way.
> >>
> >>
> >> Any pointers?
> >>
> >>
> >>
> >>
> >> # ifconfig tun0
> >> tun0  Link encap:UNSPEC  HWaddr
> >> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> >>   inet addr:192.168.1.1  P-t-P:192.168.1.2
> >>  Mask:255.255.255.255
> >>   UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1450  Metric:1
> >>   RX packets:11 errors:0 dropped:0 overruns:0 frame:0
> >>   TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
> >>   collisions:0 txqueuelen:500
> >>   RX bytes:924 (924.0 B)  TX bytes:924 (924.0 B)
> >>
> >>
> >> # egrep -v '#|^$' /etc/corosync/corosync.conf
> >> totem {
> >> version: 2
> >> token: 3000
> >> token_retransmits_before_loss_const: 10
> >> join: 60
> >> consensus: 1500
> >> vsftype: none
> >> max_messages: 20
> >> clear_node_high_bit: yes
> >> secauth: off
> >> threads: 0
> >> rrp_mode: none
> >> interface {
> >> ri
