Re: [Pacemaker] [ha-wg-technical] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015
> On 26 Nov 2014, at 4:51 pm, Fabio M. Di Nitto wrote: > > > > On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: >> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: >> Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) >>> That would be the perfect opportunity for you to convert users to Suse ;) >> > I´d prefer, at least for this round, to keep dates/location and explore > the option to allow people to join remotely. Afterall there are tons of > tools between google hangouts and others that would allow that. That is, in my experience, the absolute worst. It creates second class participants and is a PITA for everyone. >>> I agree, it is still a way for people to join in tho. >> >> I personally disagree. In my experience, one either does a face-to-face >> meeting, or a virtual one that puts everyone on the same footing. >> Mixing both works really badly unless the team already knows each >> other. >> I know that an in-person meeting is useful, but we have a large team in Beijing, the US, Tasmania (OK, one crazy guy), various countries in Europe etc. >>> Yes same here. No difference.. we have one crazy guy in Australia.. >> >> Yeah, but you're already bringing him for your personal conference. >> That's a bit different. ;-) >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them? > > I´d say either a google doc or any random etherpad/wiki instance will do > just fine. -ENOGOOGLE > > As for the topics: > - corosync qdevice and plugins (network, disk, integration with sdb?, > others?) 
> - corosync RRP / libknet integration/replacement > - fence autodetection/autoconfiguration > > For the user facing topics (that is if there are enough participants and > I only got 1 user confirmation so far): > > - demos, cluster 101, tutorials > - get feedback > - get feedback > - get more feedback > > Fabio > ___ > ha-wg-technical mailing list > ha-wg-techni...@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] [ha-wg-technical] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015
On 26/11/14 12:51 AM, Fabio M. Di Nitto wrote: On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) That would be the perfect opportunity for you to convert users to Suse ;) I´d prefer, at least for this round, to keep dates/location and explore the option to allow people to join remotely. Afterall there are tons of tools between google hangouts and others that would allow that. That is, in my experience, the absolute worst. It creates second class participants and is a PITA for everyone. I agree, it is still a way for people to join in tho. I personally disagree. In my experience, one either does a face-to-face meeting, or a virtual one that puts everyone on the same footing. Mixing both works really badly unless the team already knows each other. I know that an in-person meeting is useful, but we have a large team in Beijing, the US, Tasmania (OK, one crazy guy), various countries in Europe etc. Yes same here. No difference.. we have one crazy guy in Australia.. Yeah, but you're already bringing him for your personal conference. That's a bit different. ;-) OK, let's switch tracks a bit. What *topics* do we actually have? Can we fill two days? Where would we want to collect them? I´d say either a google doc or any random etherpad/wiki instance will do just fine. As for the topics: - corosync qdevice and plugins (network, disk, integration with sdb?, others?) - corosync RRP / libknet integration/replacement - fence autodetection/autoconfiguration For the user facing topics (that is if there are enough participants and I only got 1 user confirmation so far): - demos, cluster 101, tutorials - get feedback - get feedback - get more feedback Fabio Ok, I do have a topic I want to add; Merging the dozen different mailing lists, IRC channels and other support forums. 
This thread is a good example of how thinly the community is spread. A 'dev', 'user' and 'announce' list should be enough for all of HA. Likewise, one IRC channel should be enough, too. The trick will be discussing this without bikeshedding. :)

digimer

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Pacemaker] [Cluster-devel] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015
On 26/11/14 12:51 AM, Fabio M. Di Nitto wrote: On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) That would be the perfect opportunity for you to convert users to Suse ;) I´d prefer, at least for this round, to keep dates/location and explore the option to allow people to join remotely. Afterall there are tons of tools between google hangouts and others that would allow that. That is, in my experience, the absolute worst. It creates second class participants and is a PITA for everyone. I agree, it is still a way for people to join in tho. I personally disagree. In my experience, one either does a face-to-face meeting, or a virtual one that puts everyone on the same footing. Mixing both works really badly unless the team already knows each other. I know that an in-person meeting is useful, but we have a large team in Beijing, the US, Tasmania (OK, one crazy guy), various countries in Europe etc. Yes same here. No difference.. we have one crazy guy in Australia.. Yeah, but you're already bringing him for your personal conference. That's a bit different. ;-) OK, let's switch tracks a bit. What *topics* do we actually have? Can we fill two days? Where would we want to collect them? I´d say either a google doc or any random etherpad/wiki instance will do just fine. As for the topics: - corosync qdevice and plugins (network, disk, integration with sdb?, others?) - corosync RRP / libknet integration/replacement - fence autodetection/autoconfiguration For the user facing topics (that is if there are enough participants and I only got 1 user confirmation so far): - demos, cluster 101, tutorials - get feedback - get feedback - get more feedback Fabio I'd be happy to do a cluster 101 or similar, if there is interest. 
Not sure if that would be particularly appealing to anyone coming to our meeting, as I think anyone interested is probably well past 101. :)

Anyway, you guys know my background, let me know if there is a topic you'd like me to cover for the user side of things.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Pacemaker] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015
On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote:
> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote:
>
>>> Yeah, well, devconf.cz is not such an interesting event for those who do
>>> not wear the fedora ;-)
>> That would be the perfect opportunity for you to convert users to Suse ;)
>
>>>> I'd prefer, at least for this round, to keep dates/location and explore
>>>> the option to allow people to join remotely. After all there are tons of
>>>> tools between google hangouts and others that would allow that.
>>> That is, in my experience, the absolute worst. It creates second class
>>> participants and is a PITA for everyone.
>> I agree, it is still a way for people to join in tho.
>
> I personally disagree. In my experience, one either does a face-to-face
> meeting, or a virtual one that puts everyone on the same footing.
> Mixing both works really badly unless the team already knows each
> other.
>
>>> I know that an in-person meeting is useful, but we have a large team in
>>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in
>>> Europe etc.
>> Yes same here. No difference.. we have one crazy guy in Australia..
>
> Yeah, but you're already bringing him for your personal conference.
> That's a bit different. ;-)
>
> OK, let's switch tracks a bit. What *topics* do we actually have? Can we
> fill two days? Where would we want to collect them?

I'd say either a google doc or any random etherpad/wiki instance will do
just fine.

As for the topics:
- corosync qdevice and plugins (network, disk, integration with sbd?,
  others?)
- corosync RRP / libknet integration/replacement
- fence autodetection/autoconfiguration

For the user facing topics (that is if there are enough participants and
I only got 1 user confirmation so far):

- demos, cluster 101, tutorials
- get feedback
- get feedback
- get more feedback

Fabio
Re: [Pacemaker] Avoid monitoring of resources on nodes
25.11.2014 23:36, David Vossel wrote:
>
>
> ----- Original Message -----
>> Daniel Dehennin writes:
>>
>>> Hello,
>>
>> Hello,
>>
>>> I have a 4 nodes cluster and some resources are only installed on 2 of
>>> them.
>>>
>>> I set cluster asymmetry and infinity location:
>>>
>>>     primitive Mysqld upstart:mysql \
>>>         op monitor interval="60"
>>>     primitive OpenNebula-Sunstone-Sysv lsb:opennebula-sunstone \
>>>         op monitor interval="60"
>>>     primitive OpenNebula-Sysv lsb:opennebula \
>>>         op monitor interval="60"
>>>     group OpenNebula Mysqld OpenNebula-Sysv OpenNebula-Sunstone-Sysv \
>>>         meta target-role="Started"
>>>     location OpenNebula-runs-on-Frontend OpenNebula inf: one-frontend
>>>     property $id="cib-bootstrap-options" \
>>>         dc-version="1.1.10-42f2063" \
>>>         cluster-infrastructure="corosync" \
>>>         symmetric-cluster="false" \
>>>         stonith-enabled="true" \
>>>         stonith-timeout="30" \
>>>         last-lrm-refresh="1416817941" \
>>>         no-quorum-policy="stop" \
>>>         stop-all-resources="off"
>>>
>>> But I have a lot of failing monitoring on other nodes of these resources
>>> because they are not installed on them.
>>>
>>> Is there a way to completely exclude the resources from nodes, even the
>>> monitoring?
>
> actually, this is possible now. I am unaware of any configuration tools (pcs or
> crmsh) that support this feature yet though. You might have to edit the cib xml
> manually.
>
> There's a new 'resource-discovery' option you can set on a location constraint
> that help prevent resources from ever being started or monitored on a node.

crmsh git master supports that.
One note is that the pacemaker validation schema should be set to
'pacemaker-next'.

>
> Example: never start or monitor the resource FAKE1 on 18node2.
>
> [...] resource-discovery="never" rsc="FAKE1" score="-INFINITY"/>
>
> There are more examples in this regression test.
> https://github.com/ClusterLabs/pacemaker/blob/master/pengine/test10/resource-discovery.xml#L99
>
> -- Vossel
>
>
>> This cause troubles on my setup, as resources fails, my nodes are all
>> fenced.
>>
>> Any hints?
>>
>> Regards.
>> --
>> Daniel Dehennin
>> Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
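For anyone editing the CIB by hand, the kind of location constraint discussed in this message can be sketched with a few lines of Python. This is only an illustration under stated assumptions: the `id` value is invented, the `node` attribute is filled in from the prose ("on 18node2"), and the helper name `make_discovery_constraint` is hypothetical. The script only builds the XML element; it does not talk to a running cluster, and the feature still needs a Pacemaker build and validation schema that know about `resource-discovery`.

```python
import xml.etree.ElementTree as ET


def make_discovery_constraint(constraint_id, rsc, node):
    """Build an rsc_location element that keeps `rsc` off `node` and skips probes there."""
    return ET.Element(
        "rsc_location",
        {
            "id": constraint_id,            # hypothetical id, pick your own
            "node": node,
            "resource-discovery": "never",  # option named in the thread
            "rsc": rsc,
            "score": "-INFINITY",
        },
    )


# Values taken from the thread, except the id, which is invented for this sketch.
constraint = make_discovery_constraint("location-FAKE1-18node2", "FAKE1", "18node2")
constraint_xml = ET.tostring(constraint, encoding="unicode")
print(constraint_xml)
```

The printed element would still have to be placed in the constraints section of the CIB by whatever means you normally use; that step is deliberately left out here.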
Re: [Pacemaker] Fencing of bare-metal remote nodes
On 25/11/14 03:15 AM, Vladislav Bogdanov wrote:
> Hi!
>
> is subj implemented?
>
> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>
> Best,
> Vladislav

Please share your configuration(s) and application names/versions. OS
info wouldn't hurt, either. Relevant log entries on the surviving node
after panicking the other node would also be helpful.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Pacemaker] Fencing of bare-metal remote nodes
25.11.2014 23:41, David Vossel wrote:
>
>
> ----- Original Message -----
>> Hi!
>>
>> is subj implemented?
>>
>> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>
> Yes, fencing remote-nodes works. Are you certain your fencing devices can handle
> fencing the remote-node? Fencing a remote-node requires a cluster node to
> invoke the agent that actually performs the fencing action on the remote-node.

Yes, if I invoke the fencing action manually ('crm node fence ' in crmsh
syntax), the node is fenced. So the issue seems to be related to detecting
that fencing is needed. The comments in the related git commits are a
little bit terse in this area. So could you please explain what exactly
needs to happen on a remote node to initiate fencing?

I tried so far:
* kill pacemaker_remoted when no resources are running. systemd restarted
  it and crmd reconnected after some time.
* crash kernel when no resources are running
* crash kernel during massive start of resources

No fencing happened. In the last case the start actions 'hung' and were
failed by timeout (it is rather long); the node was not even listed as
failed.

My customer asked me to stop crashing nodes because one of them does not
boot anymore (I "like" that modern UEFI hardware very much.), so it is
hard for me to play more with that.

Best,
Vladislav

>
> -- Vossel
>
>>
>> Best,
>> Vladislav
Re: [Pacemaker] [ha-wg-technical] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015
> On 26 Nov 2014, at 10:06 am, Digimer wrote: > > On 25/11/14 04:31 PM, Andrew Beekhof wrote: >>> Yeah, but you're already bringing him for your personal conference. >>> That's a bit different. ;-) >>> >>> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >>> fill two days? Where would we want to collect them? >> >> Personally I'm interested in talking about scaling - with pacemaker-remoted >> and/or a new messaging/membership layer. >> >> Other design-y topics: >> - SBD >> - degraded mode >> - improved notifications > > This my be something my company can bring to the table. We just hired a dev > whose principle goal is to develop and alert system for HA. We're modelling > it heavily on the fence/resource agent model with a "scan core" and "scan > agents". It's sort of like existing tools, but designed specifically for HA > clusters and heavily focused on not interfering with the host more than at > all necessary. By Feb., it should be mostly done. > > We're doing this for our own needs, but it might be a framework worth talking > about, if nothing else to see if others consider it a fit. Of course, it will > be entirely open source. *If* there is interest, I could put together a(n > informal) talk on it with a demo. Definitely interesting > >> - containerisation of services (cgroups, docker, virt) >> - resource-agents (upstream releases, handling of pull requests, testing) >> >> User-facing topics could include recent features (ie. pacemaker-remoted, >> crm_resource --restart) and common deployment scenarios (eg. NFS) that >> people get wrong. > > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without access > to education? 
Re: [Pacemaker] [ha-wg-technical] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015
On 25/11/14 04:31 PM, Andrew Beekhof wrote:
>> Yeah, but you're already bringing him for your personal conference.
>> That's a bit different. ;-)
>>
>> OK, let's switch tracks a bit. What *topics* do we actually have? Can we
>> fill two days? Where would we want to collect them?
>
> Personally I'm interested in talking about scaling - with pacemaker-remoted
> and/or a new messaging/membership layer.
>
> Other design-y topics:
> - SBD
> - degraded mode
> - improved notifications

This may be something my company can bring to the table. We just hired a
dev whose principal goal is to develop an alert system for HA. We're
modelling it heavily on the fence/resource agent model with a "scan core"
and "scan agents". It's sort of like existing tools, but designed
specifically for HA clusters and heavily focused on not interfering with
the host any more than necessary. By Feb., it should be mostly done.

We're doing this for our own needs, but it might be a framework worth
talking about, if nothing else to see if others consider it a fit. Of
course, it will be entirely open source. *If* there is interest, I could
put together a(n informal) talk on it with a demo.

> - containerisation of services (cgroups, docker, virt)
> - resource-agents (upstream releases, handling of pull requests, testing)
>
> User-facing topics could include recent features (ie. pacemaker-remoted,
> crm_resource --restart) and common deployment scenarios (eg. NFS) that
> people get wrong.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Pacemaker] Avoid monitoring of resources on nodes
Daniel Dehennin writes:

>> There's a new 'resource-discovery' option you can set on a location
>> constraint
>> that help prevent resources from ever being started or monitored on a node.
>>
>> Example: never start or monitor the resource FAKE1 on 18node2.
>>
>> [...] resource-discovery="never" rsc="FAKE1" score="-INFINITY"/>
>>
>> There are more examples in this regression test.
>> https://github.com/ClusterLabs/pacemaker/blob/master/pengine/test10/resource-discovery.xml#L99
>
> Thanks a lot.
>
> I'll try find how to make the change directly in XML.

Ok, looking at the git history, this feature seems to be available only on
the master branch and is not yet released.

Thanks.
-- 
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] Avoid monitoring of resources on nodes
David Vossel writes:

> actually, this is possible now. I am unaware of any configuration tools (pcs or
> crmsh) that support this feature yet though. You might have to edit the cib xml
> manually.
>
> There's a new 'resource-discovery' option you can set on a location constraint
> that help prevent resources from ever being started or monitored on a node.
>
> Example: never start or monitor the resource FAKE1 on 18node2.
>
> [...] resource-discovery="never" rsc="FAKE1" score="-INFINITY"/>
>
> There are more examples in this regression test.
> https://github.com/ClusterLabs/pacemaker/blob/master/pengine/test10/resource-discovery.xml#L99

Thanks a lot.

I'll try to find out how to make the change directly in the XML.

Regards.
-- 
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015
- Original Message - > > > On 25 Nov 2014, at 8:54 pm, Lars Marowsky-Bree wrote: > > > > On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: > > > >>> Yeah, well, devconf.cz is not such an interesting event for those who do > >>> not wear the fedora ;-) > >> That would be the perfect opportunity for you to convert users to Suse ;) > > > I´d prefer, at least for this round, to keep dates/location and explore > the option to allow people to join remotely. Afterall there are tons of > tools between google hangouts and others that would allow that. > >>> That is, in my experience, the absolute worst. It creates second class > >>> participants and is a PITA for everyone. > >> I agree, it is still a way for people to join in tho. > > > > I personally disagree. In my experience, one either does a face-to-face > > meeting, or a virtual one that puts everyone on the same footing. > > Mixing both works really badly unless the team already knows each > > other. > > > >>> I know that an in-person meeting is useful, but we have a large team in > >>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in > >>> Europe etc. > >> Yes same here. No difference.. we have one crazy guy in Australia.. > > > > Yeah, but you're already bringing him for your personal conference. > > That's a bit different. ;-) > > > > OK, let's switch tracks a bit. What *topics* do we actually have? Can we > > fill two days? Where would we want to collect them? > > Personally I'm interested in talking about scaling - with pacemaker-remoted > and/or a new messaging/membership layer. If we're going to talk about scaling, we should throw in our new docker support in the same discussion. Docker lends itself well to the "pet vs cattle" analogy. I see management of docker with pacemaker making quite a bit of sense now that we have the ability to scale into the "cattle" territory. 
> Other design-y topics: > - SBD > - degraded mode > - improved notifications > - containerisation of services (cgroups, docker, virt) > - resource-agents (upstream releases, handling of pull requests, testing) Yep, We definitely need to talk about the resource-agents. > > User-facing topics could include recent features (ie. pacemaker-remoted, > crm_resource --restart) and common deployment scenarios (eg. NFS) that > people get wrong. Adding to the list, it would be a good idea to talk about Deployment integration testing, what's going on with the phd project and why it's important regardless if you're interested in what the project functionally does. -- Vossel > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015
> On 25 Nov 2014, at 8:54 pm, Lars Marowsky-Bree wrote: > > On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: > >>> Yeah, well, devconf.cz is not such an interesting event for those who do >>> not wear the fedora ;-) >> That would be the perfect opportunity for you to convert users to Suse ;) > I´d prefer, at least for this round, to keep dates/location and explore the option to allow people to join remotely. Afterall there are tons of tools between google hangouts and others that would allow that. >>> That is, in my experience, the absolute worst. It creates second class >>> participants and is a PITA for everyone. >> I agree, it is still a way for people to join in tho. > > I personally disagree. In my experience, one either does a face-to-face > meeting, or a virtual one that puts everyone on the same footing. > Mixing both works really badly unless the team already knows each > other. > >>> I know that an in-person meeting is useful, but we have a large team in >>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in >>> Europe etc. >> Yes same here. No difference.. we have one crazy guy in Australia.. > > Yeah, but you're already bringing him for your personal conference. > That's a bit different. ;-) > > OK, let's switch tracks a bit. What *topics* do we actually have? Can we > fill two days? Where would we want to collect them? Personally I'm interested in talking about scaling - with pacemaker-remoted and/or a new messaging/membership layer. Other design-y topics: - SBD - degraded mode - improved notifications - containerisation of services (cgroups, docker, virt) - resource-agents (upstream releases, handling of pull requests, testing) User-facing topics could include recent features (ie. pacemaker-remoted, crm_resource --restart) and common deployment scenarios (eg. NFS) that people get wrong. 
Re: [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015
> On 25 Nov 2014, at 9:16 pm, Michael Schwartzkopff wrote:
>
> On Tuesday, 25 November 2014, 10:54:01, Lars Marowsky-Bree wrote:
>> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote:
>>>> Yeah, well, devconf.cz is not such an interesting event for those who do
>>>> not wear the fedora ;-)
>>>
>>> That would be the perfect opportunity for you to convert users to Suse ;)
>>>
>>>>> I'd prefer, at least for this round, to keep dates/location and explore
>>>>> the option to allow people to join remotely. After all there are tons of
>>>>> tools between google hangouts and others that would allow that.
>>>> That is, in my experience, the absolute worst. It creates second class
>>>> participants and is a PITA for everyone.
>>>
>>> I agree, it is still a way for people to join in tho.
>>
>> I personally disagree. In my experience, one either does a face-to-face
>> meeting, or a virtual one that puts everyone on the same footing.
>> Mixing both works really badly unless the team already knows each
>> other.
>>
>>>> I know that an in-person meeting is useful, but we have a large team in
>>>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in
>>>> Europe etc.
>>>
>>> Yes same here. No difference.. we have one crazy guy in Australia..
>>
>> Yeah, but you're already bringing him for your personal conference.
>> That's a bit different. ;-)
>>
>> OK, let's switch tracks a bit. What *topics* do we actually have? Can we
>> fill two days? Where would we want to collect them?
>
> - Roadmap: What to expect next.
> - Unification: Locking and fencing the RH style (cman) and the rest of the
>   world.

Unification is pretty much sorted AFAICS, RHEL ships corosync2 + pacemaker
and SUSE either does already or is talking about doing it soon.
Pacemaker + CMAN was only ever a transitioning state.

> - features in pacemaker
> - Cluster File Systems: Which one is usable for what application.
>
> All points from a user's point of view. Not related to any company.
>
> I could present: "Monitoring of clusters".
>
> Kind regards,
>
> Michael Schwartzkopff
>
> --
> [*] sys4 AG
>
> http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
> Franziskanerstraße 15, 81669 München
>
> Registered office: Munich, Munich District Court: HRB 199263
> Executive board: Patrick Ben Koetter, Marc Schiffbauer
> Chairman of the supervisory board: Florian Kirstein
Re: [Pacemaker] Fencing of bare-metal remote nodes
----- Original Message -----
> Hi!
>
> is subj implemented?
>
> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.

Yes, fencing remote-nodes works. Are you certain your fencing devices can handle
fencing the remote-node? Fencing a remote-node requires a cluster node to
invoke the agent that actually performs the fencing action on the remote-node.

-- Vossel

>
> Best,
> Vladislav
Re: [Pacemaker] Avoid monitoring of resources on nodes
----- Original Message -----
> Daniel Dehennin writes:
>
> > Hello,
>
> Hello,
>
> > I have a 4-node cluster and some resources are only installed on 2 of
> > them.
> >
> > I set cluster asymmetry and infinity location:
> >
> > primitive Mysqld upstart:mysql \
> >     op monitor interval="60"
> > primitive OpenNebula-Sunstone-Sysv lsb:opennebula-sunstone \
> >     op monitor interval="60"
> > primitive OpenNebula-Sysv lsb:opennebula \
> >     op monitor interval="60"
> > group OpenNebula Mysqld OpenNebula-Sysv OpenNebula-Sunstone-Sysv \
> >     meta target-role="Started"
> > location OpenNebula-runs-on-Frontend OpenNebula inf: one-frontend
> > property $id="cib-bootstrap-options" \
> >     dc-version="1.1.10-42f2063" \
> >     cluster-infrastructure="corosync" \
> >     symmetric-cluster="false" \
> >     stonith-enabled="true" \
> >     stonith-timeout="30" \
> >     last-lrm-refresh="1416817941" \
> >     no-quorum-policy="stop" \
> >     stop-all-resources="off"
> >
> > But I have a lot of failing monitoring on other nodes of these resources
> > because they are not installed on them.
> >
> > Is there a way to completely exclude the resources from nodes, even the
> > monitoring?

Actually, this is possible now. I am not aware of any configuration tool (pcs or crmsh) that supports this feature yet, though, so you might have to edit the CIB XML manually.

There's a new 'resource-discovery' option you can set on a location constraint that helps prevent resources from ever being started or monitored on a node.

Example: never start or monitor the resource FAKE1 on 18node2.

There are more examples in this regression test:
https://github.com/ClusterLabs/pacemaker/blob/master/pengine/test10/resource-discovery.xml#L99

-- Vossel

> This causes trouble on my setup: as resources fail, my nodes are all
> fenced.
>
> Any hints?
>
> Regards.
> --
> Daniel Dehennin
> Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
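The 'resource-discovery' option described in the reply above could look roughly like the following inside the constraints section of the CIB. This is only a sketch: FAKE1 and 18node2 are the names used in the reply, while the constraint id is a hypothetical name chosen for illustration, and the exact form in the linked regression test may differ. The option is new, so the 1.1.10 packages mentioned elsewhere in this thread presumably do not support it.

```xml
<!-- Sketch only: the id below is a hypothetical name, not taken from the thread. -->
<!-- resource-discovery="never" asks the cluster not to probe (monitor) FAKE1 on  -->
<!-- 18node2; score="-INFINITY" additionally forbids starting it there.           -->
<rsc_location id="location-FAKE1-18node2"
              rsc="FAKE1"
              node="18node2"
              score="-INFINITY"
              resource-discovery="never"/>
```

For the OpenNebula group from the question, the same pattern would presumably need one such constraint per node on which the services are not installed, with rsc set to the group name.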
Re: [Pacemaker] Pacemaker fencing and DLM/cLVM
Christine Caulfield writes:

> It seems to me that fencing is failing for some reason, though I can't
> tell from the logs exactly why, so you might have to investigate your
> setup for IPMI to see just what is happening (I'm no IPMI expert,
> sorry).

Thanks for looking, but IPMI stonith is actually working. For all nodes I tested:

stonith_admin --reboot

And it works.

> The log files tell me this though:
>
> Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence request 1084811079 pid 7358 nodedown time 1416909392 fence_all dlm_stonith
> Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence result 1084811079 pid 7358 result 1 exit status
> Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence status 1084811079 receive 1 from 1084811080 walltime 1416909392 local 1035
> Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence request 1084811079 no actor
>
> Showing a status code '1' from dlm_stonith - the result should be 0 if
> fencing completed successfully.

But 1084811080 is nebula3, and in its logs I see:

Nov 25 10:56:33 nebula3 stonith-ng[6232]: notice: can_fence_host_with_device: Stonith-nebula2-IPMILAN can fence nebula2: static-list
[...]
Nov 25 10:56:34 nebula3 stonith-ng[6232]: notice: log_operation: Operation 'reboot' [7359] (call 4 from crmd.5038) for host 'nebula2' with device 'Stonith-nebula2-IPMILAN' returned: 0 (OK)
Nov 25 10:56:34 nebula3 stonith-ng[6232]: error: crm_abort: crm_glib_handler: Forked child 7376 to record non-fatal assert at logging.c:63 : Source ID 20 was not found when attempting to remove it
Nov 25 10:56:34 nebula3 stonith-ng[6232]: error: crm_abort: crm_glib_handler: Forked child 7377 to record non-fatal assert at logging.c:63 : Source ID 21 was not found when attempting to remove it
Nov 25 10:56:34 nebula3 stonith-ng[6232]: notice: remote_op_done: Operation reboot of nebula2 by nebula1 for crmd.5038@nebula1.34bed18c: OK
Nov 25 10:56:34 nebula3 crmd[6236]: notice: tengine_stonith_notify: Peer nebula2 was terminated (reboot) by nebula1 for nebula1: OK (ref=34bed18c-c395-4de2-b323-e00208cac6c7) by client crmd.5038
Nov 25 10:56:34 nebula3 crmd[6236]: notice: crm_update_peer_state: tengine_stonith_notify: Node nebula2[0] - state is now lost (was (null))

To me this means that stonith-ng managed to fence the node and reported its success.

How could the “returned: 0 (OK)” become “receive 1”?

A logic issue somewhere between stonith-ng and dlm_controld?

Thanks.
--
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] Pacemaker fencing and DLM/cLVM
On 25/11/14 10:45, Daniel Dehennin wrote:
> Daniel Dehennin writes:
>
> I'm using Ubuntu 14.04:
>
> - corosync 2.3.3-1ubuntu1
> - pacemaker 1.1.10+git20130802-1ubuntu2.1
>
> I thought everything was integrated in such a configuration.
>
> Here is some more information:
>
> - the pacemaker configuration
> - the log of the DC nebula1 with marks for each step
> - the log of the nebula2 with marks for each step
> - the log of the nebula3 with marks for each step
> - the output of “dlm_tool ls” and “dlm_tool status” before/during/after
>   nebula2 fencing
>
> The steps are:
>
> 1. All nodes up, cluster down
> 2. Start corosync on all nodes
> 3. Start pacemaker on all nodes
> 4. Start resource ONE-Storage-Clone (dlm, cLVM, VG, GFS2)
> 5. Crash nebula2
> 6. Start corosync on nebula2 after reboot
> 7. Start pacemaker on nebula2 after reboot
>
> Does someone understand why DLM did not automatically get the ACK of the
> fencing from stonith?
>
> And why does ONE-Storage-Clone not manage to start on nebula2 after fencing?

It seems to me that fencing is failing for some reason, though I can't tell from the logs exactly why, so you might have to investigate your setup for IPMI to see just what is happening (I'm no IPMI expert, sorry).

The log files tell me this though:

Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence request 1084811079 pid 7358 nodedown time 1416909392 fence_all dlm_stonith
Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence result 1084811079 pid 7358 result 1 exit status
Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence status 1084811079 receive 1 from 1084811080 walltime 1416909392 local 1035
Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence request 1084811079 no actor

Showing a status code '1' from dlm_stonith - the result should be 0 if fencing completed successfully.

Chrissie
Re: [Pacemaker] Avoid monitoring of resources on nodes
Daniel Dehennin writes:

> Hello,

Hello,

> I have a 4-node cluster and some resources are only installed on 2 of
> them.
>
> I set cluster asymmetry and infinity location:
>
> primitive Mysqld upstart:mysql \
>     op monitor interval="60"
> primitive OpenNebula-Sunstone-Sysv lsb:opennebula-sunstone \
>     op monitor interval="60"
> primitive OpenNebula-Sysv lsb:opennebula \
>     op monitor interval="60"
> group OpenNebula Mysqld OpenNebula-Sysv OpenNebula-Sunstone-Sysv \
>     meta target-role="Started"
> location OpenNebula-runs-on-Frontend OpenNebula inf: one-frontend
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.10-42f2063" \
>     cluster-infrastructure="corosync" \
>     symmetric-cluster="false" \
>     stonith-enabled="true" \
>     stonith-timeout="30" \
>     last-lrm-refresh="1416817941" \
>     no-quorum-policy="stop" \
>     stop-all-resources="off"
>
> But I have a lot of failing monitoring on other nodes of these resources
> because they are not installed on them.
>
> Is there a way to completely exclude the resources from nodes, even the
> monitoring?

This causes trouble on my setup: as resources fail, my nodes are all fenced.

Any hints?

Regards.
--
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
[Pacemaker] sbd fencing race
Hi list,

Last night I had a cluster in a fencing race using sbd as the stonith device. I would like to know what the effect is of using start-delay in my stonith resource in this way:

primitive stonith-sbd stonith:external/sbd \
    params sbd_device="/dev/mapper/SBD" \
    op start interval="0" start-delay="5"

Thanks
[Pacemaker] Suicide fencing and watchdog questions
Hi,

Is there any information on how watchdog integration is intended to work? What use cases are currently being evaluated for it? It seems to be forcibly disabled if SBD is not detected...

Also, is there any way to make a node (in a one-node cluster ;) ) commit suicide if it detects that fencing is required? Technically, that can be done with the IPMI 'power cycle' or 'power reset' commands, but the node (and thus the "whole" cluster) will never learn that fencing succeeded, because if it receives the answer, then fencing has failed. Otherwise, the node is hard-rebooted and thus cleaned up.

Best,
Vladislav
Re: [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015
On Tuesday, 25 November 2014, 10:54:01, Lars Marowsky-Bree wrote:
> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote:
> > > Yeah, well, devconf.cz is not such an interesting event for those who do
> > > not wear the fedora ;-)
> >
> > That would be the perfect opportunity for you to convert users to Suse ;)
> >
> > >> I'd prefer, at least for this round, to keep dates/location and explore
> > >> the option to allow people to join remotely. After all there are tons of
> > >> tools between google hangouts and others that would allow that.
> > >
> > > That is, in my experience, the absolute worst. It creates second class
> > > participants and is a PITA for everyone.
> >
> > I agree, it is still a way for people to join in tho.
>
> I personally disagree. In my experience, one either does a face-to-face
> meeting, or a virtual one that puts everyone on the same footing.
> Mixing both works really badly unless the team already knows each
> other.
>
> > > I know that an in-person meeting is useful, but we have a large team in
> > > Beijing, the US, Tasmania (OK, one crazy guy), various countries in
> > > Europe etc.
> >
> > Yes same here. No difference.. we have one crazy guy in Australia..
>
> Yeah, but you're already bringing him for your personal conference.
> That's a bit different. ;-)
>
> OK, let's switch tracks a bit. What *topics* do we actually have? Can we
> fill two days? Where would we want to collect them?

- Roadmap: What to expect next.
- Unification: Locking and fencing the RH style (cman) and the rest of the world.
- Features in pacemaker.
- Cluster File Systems: Which one is usable for what application.

All points from a user's point of view. Not related to any company.

I could present: "Monitoring of clusters".

Kind regards,

Michael Schwartzkopff

--
[*] sys4 AG
http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München
Registered office: München, Amtsgericht München: HRB 199263
Executive board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein
Re: [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015
On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote:

> > Yeah, well, devconf.cz is not such an interesting event for those who do
> > not wear the fedora ;-)
> That would be the perfect opportunity for you to convert users to Suse ;)

> >> I'd prefer, at least for this round, to keep dates/location and explore
> >> the option to allow people to join remotely. After all there are tons of
> >> tools between google hangouts and others that would allow that.
> > That is, in my experience, the absolute worst. It creates second class
> > participants and is a PITA for everyone.
> I agree, it is still a way for people to join in tho.

I personally disagree. In my experience, one either does a face-to-face meeting, or a virtual one that puts everyone on the same footing. Mixing both works really badly unless the team already knows each other.

> > I know that an in-person meeting is useful, but we have a large team in
> > Beijing, the US, Tasmania (OK, one crazy guy), various countries in
> > Europe etc.
> Yes same here. No difference.. we have one crazy guy in Australia..

Yeah, but you're already bringing him for your personal conference. That's a bit different. ;-)

OK, let's switch tracks a bit. What *topics* do we actually have? Can we fill two days? Where would we want to collect them?

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
[Pacemaker] Avoid monitoring of resources on nodes
Hello,

I have a 4-node cluster and some resources are only installed on 2 of them.

I set cluster asymmetry and infinity location:

primitive Mysqld upstart:mysql \
    op monitor interval="60"
primitive OpenNebula-Sunstone-Sysv lsb:opennebula-sunstone \
    op monitor interval="60"
primitive OpenNebula-Sysv lsb:opennebula \
    op monitor interval="60"
group OpenNebula Mysqld OpenNebula-Sysv OpenNebula-Sunstone-Sysv \
    meta target-role="Started"
location OpenNebula-runs-on-Frontend OpenNebula inf: one-frontend
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-42f2063" \
    cluster-infrastructure="corosync" \
    symmetric-cluster="false" \
    stonith-enabled="true" \
    stonith-timeout="30" \
    last-lrm-refresh="1416817941" \
    no-quorum-policy="stop" \
    stop-all-resources="off"

But I have a lot of failing monitoring on other nodes of these resources because they are not installed on them.

Is there a way to completely exclude the resources from nodes, even the monitoring?

Regards.

Ubuntu Trusty Tahr (amd64):
- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

--
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
[Pacemaker] Fencing of bare-metal remote nodes
Hi!

is subj implemented?

Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.

Best,
Vladislav