[ClusterLabs] network fencing - azure arm
Hello,

I have a question about stonith configuration within Azure and I hope I'm using the correct mailing list. I've installed two virtual machines with pacemaker 1.1.18, pcs 0.9.162 and fence-agents-azure-arm 4.0.11.86. Now I'm unable to create a stonith configuration via pcs when using network fencing; without network fencing everything works fine.

The following command works as expected (without network fencing):

# pcs stonith create stonith.node1 fence_azure_arm login=$az_login passwd=$az_passwd resourceGroup=$az_rg subscriptionId=$az_sid tenantId=$az_tenant retry_on=0 pcmk_host_list=node1

If I add "network-fencing" to the list, pcs throws an error:

Error: missing value of 'network-fencing' option

I don't know what's wrong, because network-fencing doesn't require any value:

# fence_azure_arm -h | grep -A2 network-fencing
--network-fencing    Use network fencing. See NOTE-section of metadata for required Subnet/Network Security Group configuration.

network-fencing="on", "true" or "0" didn't work either. If I run fence_azure_arm with the network-fencing option manually, everything works as expected. Unfortunately this fence agent is only sparsely documented and I couldn't find any example for network-fencing.

Thanks for your help!

Best Regards,
Thomas B.
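For illustration, a manual invocation of the kind described above might look roughly like this; the long-option names are taken from the agent's own help output and may differ between fence-agents builds, so treat it as a sketch rather than a verified command:

# illustrative only -- option names and values may vary by fence-agents version
fence_azure_arm --network-fencing \
    --username=$az_login --password=$az_passwd \
    --resourceGroup=$az_rg --subscriptionId=$az_sid --tenantId=$az_tenant \
    --plug=node1 --action=off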
Re: [ClusterLabs] Antw: Announcing Anvil! m2 v2.0.7
On 2018-11-20 2:53 a.m., Ulrich Windl wrote:
> Hi!
>
> You forgot the most important piece of information: "What is it?" I guess it's
> so obvious for you that you forgot to mention it. ;-)
>
> Regards,
> Ulrich

Heh, fair enough :)

We have a specialized, "canned" HA cluster that adds a lot of autonomous operation, particularly useful for deployments where there aren't cluster specialists available (factories, cargo ships, etc). We wrote a rough overview of it here, if you're curious:

https://www.alteeve.com/w/What_is_an_Anvil!_and_why_do_I_care%3F

Basically, if you want to host VMs but won't be there to take care of them often, you might find the Anvil! platform quite appealing.

cheers!

> Digimer wrote on 20.11.2018 at 08:25 in message
> <3ff31468-4052-dda7-7841-4c04985ad...@alteeve.ca>:
>> * https://github.com/ClusterLabs/striker/releases/tag/v2.0.7
>>
>> This is the first release since March, 2018. No critical issues are known
>> or were fixed. Users are advised to upgrade.
>>
>> Main bugs fixed;
>>
>> * Fixed install issues for Windows 10 and 2016 clients.
>> * Improved duplicate record detection and cleanup in scan-clustat and scan-storcli.
>> * Disabled the detection and recovery of 'paused' state servers (it caused more trouble than it solved).
>>
>> Notable new features;
>> * Improved the server boot logic to choose the node with the most running servers, all else being equal.
>> * Updated UPS power transfer reason alerts from "warning" to "notice" level alerts.
>> * Added support for EL 6.10.
>>
>> Users can upgrade using 'striker-update' from their Striker dashboards.
>>
>> /sbin/striker/striker-update --local
>> /sbin/striker/striker-update --anvil all
>>
>> Please feel free to report any issues in the Striker github repository.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein's brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
[ClusterLabs] FYI: dlm.service possibly silenty overwritten with DisplayLink driver installer
Accidentally, while searching for something systemd-related, dlm.service caught my eye, and surprisingly it was in a hardware-support / Linux software-enablement context. Briefly looking into the Ubuntu driver that allegedly contained that file (or, rather, the recipe to create it), I realized the respective installer would simply overwrite the regular cluster DLM service file without any hesitation (unless I am missing something).

So take this as a heads-up to watch out for that circumstance; three-letter acronyms are apparently not very namespace-collision-proof.

I logged this with their "Feature Suggestions" forum, asking for more carefulness:
https://support.displaylink.com/forums/287786-displaylink-feature-suggestions/suggestions/36068896

-- 
Jan (Poki)
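If you want to check which dlm.service is actually in effect on a node, something along these lines can help; the Debian/Ubuntu package name below is a guess and may differ on your distribution, so take this as an illustrative sketch:

# illustrative checks only
systemctl cat dlm.service      # shows which unit file is in effect and its contents
rpm -V dlm                     # RPM-based systems: verify the packaged unit file is unmodified
dpkg --verify dlm-controld     # Debian/Ubuntu: package name assumed, adjust as needed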
Re: [ClusterLabs] Pacemaker failed to restart subprocess of host if container also uses pacemaker cluster!
On Fri, 2018-11-16 at 16:33 +0800, ma.jinf...@zte.com.cn wrote:
> There is a problem in my setup with pacemaker: pacemaker fails to restart
> its subprocesses on the host if a container also uses a pacemaker cluster!

That might not be supportable with the current code. It's possible to have a nested cluster with VMs, but containers probably share too much of the host environment. There was an issue not that long ago with libqb that led to a new libqb option to use filesystem sockets instead of Linux native sockets; that might help, but I wouldn't be surprised if there are more issues.

One problem with nested clusters is fencing; it's difficult to get fencing working reliably in both clusters. If the reason for the separation is policy, then VMs may be the only way. Otherwise, if you just want to control resources inside the containers, then the new bundles feature or the Pacemaker Remote feature would be the best way to handle it.

> The environment is as follows:
> 1. corosync version 2.4.0, pacemaker version 1.1.16
> 2. a three-node cluster, and the container also has a pacemaker cluster
> This issue causes the cluster to stop working normally when the node is
> restarted or the pacemakerd process is restarted.
> I did a test for it: stop corosync (leading to a pacemaker restart); the
> logs are as follows:
> ///stop corosync//
> [ubuntu@paas-controller-208-1-0-40:~]$ sudo su
> [root@paas-controller-208-1-0-40:/home/ubuntu]$ service corosync stop
> [root@paas-controller-208-1-0-40:/home/ubuntu]$ ps -elf | grep pacemaker
> 4 S root 16613 14434 0 80 0 - 26569 poll_s 19:09 pts/2 00:00:00 /usr/sbin/pacemakerd
> 4 S haclust+ 16619 16613 0 80 0 - 27481 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/cib
> 4 S root 16620 16613 0 80 0 - 27454 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/stonithd
> 4 S root 16622 16613 0 80 0 - 19155 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/lrmd
> 4 S haclust+ 16623 16613 0 80 0 - 25141 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/attrd
> 4 S haclust+ 16624 16613 0 80 0 - 20618 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/pengine
> 4 S haclust+ 16625 16613 0 80 0 - 29743 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/crmd
> 4 S root 16628 14465 0 80 0 - 26569 poll_s 19:09 pts/3 00:00:00 /usr/sbin/pacemakerd
> 4 S haclust+ 16631 16628 0 80 0 - 27357 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/cib
> 4 S root 16632 16628 0 80 0 - 27455 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/stonithd
> 4 S root 16633 16628 0 80 0 - 19155 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/lrmd
> 4 S haclust+ 16634 16628 0 80 0 - 25142 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/attrd
> 4 S haclust+ 16635 16628 0 80 0 - 20618 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/pengine
> 4 S haclust+ 16636 16628 0 80 0 - 29743 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/crmd
> 4 S root 23559 1 0 80 0 - 20416 hrtime 19:10 ? 00:00:00 /usr/sbin/pacemakerd -f
> 4 S root 25105 11245 0 80 0 - 28203 pipe_w 19:10 pts/5 00:00:00 grep --color=auto pacemaker
> 4 S root 31529 1 0 80 0 - 19012 poll_s 14:41 ? 00:00:40 /usr/libexec/pacemaker/lrmd
> 4 S haclust+ 31531 1 0 80 0 - 24467 poll_s 14:41 ? 00:00:29 /usr/libexec/pacemaker/pengine
>
> Some pacemaker processes (crmd, attrd, cib, stonithd) seem to be lost, even
> if I restart pacemaker (service pacemaker start).
> Does anyone know how to deal with it? Thank you very much!
>
> Ma Jinfeng
> Communication Protocol Software Development Engineer
> Virtualization Dept. II / Wireless Research Institute / Wireless Product Operation Division
> NIV Dept. II / Wireless Product R&D Institute / Wireless Product Operation Division
> ZTE Corporation
> ZTE D2070, No. 889 Bibo Road, Pudong New Area, Shanghai
> T: +86 021 M: +86 17601320963
> E: ma.jinf...@zte.com.cn
> www.zte.com.cn

-- 
Ken Gaillot
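For anyone curious what the bundle route Ken mentions looks like in practice, here is a minimal, purely illustrative pcs sketch; the bundle name, image and resource agent are placeholders, and bundles require a new enough pacemaker/pcs (check the "pcs resource bundle" help for your version):

# illustrative only -- names and image are placeholders
pcs resource bundle create httpd-bundle \
    container docker image=registry.example.com/httpd:latest replicas=2 \
    network control-port=3121
pcs resource create httpd-inside ocf:heartbeat:apache bundle httpd-bundle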
Re: [ClusterLabs] pcs constraint order set syntax
On Mon, 2018-11-19 at 14:32 -0800, Chris Miller wrote:
> Hello,
> I am attempting to add a resource to an existing ordering constraint set.
> The system in question came pre-configured with PCS (FreePBX HA), and I need
> to add a resource group (queuemetrics) to the ordering constraint set.
> Before modification, the existing set is as follows (output from pcs config --full):
>
> set mysql httpd asterisk sequential=true (id:mysql-httpd-asterisk)
> setoptions kind=Optional (id:freepbx-start-order)
>
> I'm having issues with the "constraint order set" command syntax, specifically
> with setting options and IDs. Per the man page and help info, the syntax
> appears as if it should be this:
>
> pcs constraint order set mysql httpd asterisk queuemetrics sequential=true
> id=mysql-httpd-asterisk setoptions kind=Optional id=freepbx-start-order
>
> However, when running this command I receive the following error:
>
> Call cib_replace failed (-203): Update does not conform to the configured schema
>
> I have also tried variations of this syntax, and the ID option specifically is
> ignored and a dynamically generated name is used instead.
>
> I'm not having any luck finding guidance on this specific issue online.
> Thanks in advance for your guidance.
>
> Chris

I'm guessing the issue is that the set already exists; "order set" creates a new one. I don't think there is a single command to modify an existing set. You could delete the existing one and then add the new one (preferably using -f with an external file to make the changes atomic). Or, you could use "pcs cluster edit" and modify the XML interactively.

-- 
Ken Gaillot
[ClusterLabs] Corosync 3.0 - Alpha5 is available at corosync.org!
I am pleased to announce the fifth testing release (Alpha 5) of Corosync 3.0 (codename Camelback), available immediately from our website at http://build.clusterlabs.org/corosync/releases/ as corosync-2.99.4. You can also download RPMs for various distributions from CI: https://kronosnet.org/builds/.

This release turned out to be quite a bit bigger than I expected, which is why it's not Beta/RC yet. It also contains quite a lot of removal of unused/unmaintained code.

List of the biggest changes and backwards-compatibility breakages, with reasoning and replacements (if needed):

- Removal of CTS - unmaintained for a long time and unused by developers; currently without replacement.

- libtotem is no longer a shared library and is compiled directly into the corosync binary - the main idea behind libtotem.so was to allow other projects to use libtotem and build a custom "corosync" on top of it. This idea never got the expected usage, and because the library was so big it mostly made corosync development harder. This doesn't affect any corosync user. In the future totemsrp.c should be made into a real, well-testable library without network protocol handling, but right now there is no replacement for libtotem.so.

- Removal of all environment variables and the corosync arguments tied to them:
  - The -p, -P, -R and -r options - replaced by the system.sched_rr, system.priority and system.move_to_root_cgroup options in the config file.
  - env COROSYNC_MAIN_CONFIG_FILE - replaced by the "-c" option. This also affects the uidgid.d location.
  - env COROSYNC_TOTEM_AUTHKEY_FILE - replaced by the (already existing) totem.keyfile config option, which is now documented.
  - env COROSYNC_RUN_DIR - replaced by system.run_dir and documented.

- Removal of the libcgroup usage - deprecated in most new distributions, replaced by a short piece of code with equivalent functionality.

- NSS dependency removal - not needed anymore because crypto is now handled by knet, so no replacement is needed. This change affects only cpgverify, where the packet format has changed.

- The corosync config file parser has been updated so it is now stricter and (finally) displays the line with the error. Affects only broken config files.

- With a new enough LibQB, it's possible to send a command to corosync asking it to reopen its log files. This is used by logrotate in favor of the old copytruncate method. Copytruncate still exists and is compiled/installed by default with an old LibQB.

- Timestamps are now enabled by default. With a new enough LibQB, hires timestamps (including milliseconds) are used by default.
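To make the new config-file option names above concrete, here is a purely illustrative corosync.conf fragment; the values are examples only, not recommendations, and the exact accepted values should be checked against the corosync 3.x man pages:

# illustrative fragment only -- values are examples, check corosync.conf(5)
system {
    sched_rr: yes
    priority: max
    move_to_root_cgroup: yes
    run_dir: /var/run/corosync
}

totem {
    keyfile: /etc/corosync/authkey
}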
Complete changelog for Alpha 5 (compared to Alpha 4):

Chris Walker (3):
      Add option for quiet operation to corosync-cmapctl
      Add token_warning configuration option
      Add option to force cluster into GATHER state

Christine Caulfield (2):
      config: Fix crash in reload if new interfaces are added
      config: Allow generated nodeis for UDP & UDPU

Ferenc Wágner (3):
      man: fix cmap key name runtime.config.totem.token
      man: Fix typo connnections -> connections
      man: Fix typo conains -> contains

Jan Friesse (40):
      spec: Add explicit gcc build requirement
      totemknet: Free instance on failure exit
      util: Fix strncpy in setcs_name_t function
      cmap: Fix strncpy warning in cmap_iter_next
      ipc_glue: Fix strncpy in pid_to_name function
      totemconfig: Enlarge error_string_response
      totemsrp: Add assert into memb_lowest_in_config
      corosync-notifyd: Rename global local_nodeid
      Remove libcgroup
      build: Support for git archive stored tags
      git-version-gen: Fail on UNKNOWN version
      notifyd: Propagate error to exit code
      coroparse: Return error if config line is too long
      coroparse: Check icmap_set results
      coroparse: Fix remove_whitespace end condition
      coroparse: Be more strict in what is parsed
      coroparse: Add file name and line to error message
      coroparse: Use key_name for error message
      coroparse: Fix newly introduced warning
      man: Fix crypto_hash and crypto_cipher defaults
      cts: Remove CTS
      build: Remove NSS dependencies
      build: Do not compile totempg as a shared library
      build: Remove totempg shared library leftovers
      man: Fix default knet_pmtud_interval to match code
      totemconfig: Replace strcpy by strncpy
      log: Implement support for reopening log files
      config example: Migrate to newer syntax
      totemconfig: Fix logging of freed string
      logsys: Support hires timestamp
      logsys: Make hires timestamp default
      configure: move to AC_COMPILE_IFELSE
      main: Move sched paramaters to config file
      main: Replace COROSYNC_MAIN_CONFIG_FILE
      main: Remove COROSYNC_TOTEM_AUTHKEY_FILE
      man: Describe nodelist.node.name properly
      main: Remove COROSYNC_RUN_DIR
      init: Fix init script to work with containers
      stats: Fix delete of track
      notifyd: Delete registered tracking keys

Jan
Re: [ClusterLabs] Antw: VirtualDomain & parallel shutdown
Hi Ulrich,
The stop timeout needs to be quite big for an obvious reason called "the system is updating, do not turn it off". The main question here is why this slow shutdown prevents the other VMs from being shut down at all.

Regards,

On 20.11.2018 11:54:58 Ulrich Windl wrote:
> >>> Klechomir wrote on 20.11.2018 at 11:40 in message <12860117.ByXx81i3mo@bobo>:
> > Hi list,
> > Bumped onto the following issue lately:
> >
> > When multiple VMs are told to shut down one after another and the shutdown
> > of the first VM takes long, the others aren't being shut down at all until
> > the first one stops.
>
> I don't quite understand: When the stop timeout for a VM expires, the
> cluster takes measures, or in the Xen PV case the VM is terminated the hard way.
>
> > "batch-limit" doesn't seem to affect this.
> > Any suggestions why this could happen?
>
> I know of a market-leading software that needs more than five minutes to shut
> down even when it's doing nothing before and during the shutdown (no I/O, no CPU usage)... ;-)
> Meaning: software bugs?
>
> Regards,
> Ulrich
>
> > Best regards,
> > Klecho
[ClusterLabs] Antw: VirtualDomain & parallel shutdown
>>> Klechomir wrote on 20.11.2018 at 11:40 in message <12860117.ByXx81i3mo@bobo>:
> Hi list,
> Bumped onto the following issue lately:
>
> When multiple VMs are told to shut down one after another and the shutdown of
> the first VM takes long, the others aren't being shut down at all until the
> first one stops.

I don't quite understand: When the stop timeout for a VM expires, the cluster takes measures, or in the Xen PV case the VM is terminated the hard way.

> "batch-limit" doesn't seem to affect this.
> Any suggestions why this could happen?

I know of a market-leading software that needs more than five minutes to shut down even when it's doing nothing before and during the shutdown (no I/O, no CPU usage)... ;-)
Meaning: software bugs?

Regards,
Ulrich
[ClusterLabs] VirtualDomain & parallel shutdown
Hi list,
Bumped onto the following issue lately:

When multiple VMs are told to shut down one after another and the shutdown of the first VM takes long, the others aren't being shut down at all until the first one stops.

"batch-limit" doesn't seem to affect this.
Any suggestions why this could happen?

Best regards,
Klecho
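For reference, these are the two knobs touched on in this thread: the per-resource stop timeout and the cluster-wide batch-limit. A purely illustrative sketch (the resource name is a placeholder, and nothing here claims to resolve the serialization described above):

# illustrative only -- resource name is a placeholder
pcs resource update my_vm op stop timeout=600s     # allow a long, clean guest shutdown
pcs property set batch-limit=10                    # max actions the cluster runs in parallel
pcs property show batch-limit                      # check the current value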
Re: [ClusterLabs] Announcing Anvil! m2 v2.0.7
On Tue, 2018-11-20 at 02:25 -0500, Digimer wrote:
> * https://github.com/ClusterLabs/striker/releases/tag/v2.0.7
>
> This is the first release since March, 2018. No critical issues are
> known or were fixed. Users are advised to upgrade.

Congratulations!

Cheers,
Kristoffer

> Main bugs fixed;
>
> * Fixed install issues for Windows 10 and 2016 clients.
> * Improved duplicate record detection and cleanup in scan-clustat and scan-storcli.
> * Disabled the detection and recovery of 'paused' state servers (it caused more trouble than it solved).
>
> Notable new features;
> * Improved the server boot logic to choose the node with the most running servers, all else being equal.
> * Updated UPS power transfer reason alerts from "warning" to "notice" level alerts.
> * Added support for EL 6.10.
>
> Users can upgrade using 'striker-update' from their Striker dashboards.
>
> /sbin/striker/striker-update --local
> /sbin/striker/striker-update --anvil all
>
> Please feel free to report any issues in the Striker github repository.
Re: [ClusterLabs] pcs constraint order set syntax
Hello Chris,

On 19. 11. 2018 at 23:32, Chris Miller wrote:
> Hello,
> I am attempting to add a resource to an existing ordering constraint set.
> The system in question came pre-configured with PCS (FreePBX HA), and I need
> to add a resource group (queuemetrics) to the ordering constraint set.
> Before modification, the existing set is as follows (output from pcs config --full):
>
> set mysql httpd asterisk sequential=true (id:mysql-httpd-asterisk)
> setoptions kind=Optional (id:freepbx-start-order)
>
> I'm having issues with the "constraint order set" command syntax, specifically
> with setting options and IDs. Per the man page and help info, the syntax
> appears as if it should be this:
>
> pcs constraint order set mysql httpd asterisk queuemetrics sequential=true
> id=mysql-httpd-asterisk setoptions kind=Optional id=freepbx-start-order

According to the pcs man page:
* sequential=true id=mysql-httpd-asterisk are options
* kind=Optional id=freepbx-start-order are constraint_options

Allowed options are: action, require-all, role, sequential. So id=mysql-httpd-asterisk is not valid. For constraint_options it is possible to use id.

However, "pcs constraint order set" only creates a constraint set. Unfortunately it is not possible to update a constraint. As a workaround you can delete the constraint and create the new one in one step. Something like this:

$ pcs cluster cib temp-cib.xml
$ pcs constraint delete freepbx-start-order -f temp-cib.xml
$ pcs constraint order set mysql httpd asterisk queuemetrics sequential=true setoptions kind=Optional id=freepbx-start-order -f temp-cib.xml
$ pcs cluster cib-push temp-cib.xml

> However, when running this command I receive the following error:
>
> Call cib_replace failed (-203): Update does not conform to the configured schema
>
> I have also tried variations of this syntax, and the ID option specifically is
> ignored and a dynamically generated name is used instead.
>
> I'm not having any luck finding guidance on this specific issue online.
> Thanks in advance for your guidance.
>
> Chris

Ivan
Re: [ClusterLabs] Antw: Re: Antw: Placing resource based on least load on a node
On 2018-11-20 09:08, Michael Schwartzkopff wrote:
> On 20.11.18 at 08:57, Ulrich Windl wrote:
>> Michael Schwartzkopff wrote on 20.11.2018 at 08:41:
>>> On 20.11.18 at 08:35, Bernd wrote:
>>>> On 2018-11-20 08:06, Ulrich Windl wrote:
>>>>> Bernd wrote on 20.11.2018 at 07:21:
>>>>>> Hi,
>>>>>> I'd like to run a certain bunch of cronjobs from time to time on the cluster node (four-node cluster) that has the lowest load of all four nodes.
>>>>>> The parameters wanted for this system yet to be built are:
>>>>>> * automatic placement on one of the four nodes (i.e., the one with the lowest load)
>>>>>> * in case a node fails, it is automatically removed from the cluster
>>>>>> * only a single instance of the cronjob entity may be running
>>>>>> so this really screams for pacemaker being used as the foundation.
>>>>>> However, I'm not sure how to implement the "put onto node with least load" part. I was thinking of using node attributes for that, but I didn't find any solution "out of the box" for this. Furthermore, as load is a highly volatile value, how can one make sure that all cronjobs run to the end without being moved to a node that possibly got a lower load in the meantime than the one executing the jobs?
>>>>> Hi!
>>>>> Actually I think the last one is the easiest (assuming the cron jobs do not need any resources that are moved): once a cron job is started, it will run until it ends, whether its crontab has been moved or not. Despite that, I think cluster software is not ideal when you actually need load-balancing software.
>>>>> Regards,
>>>>> Ulrich
>>>> The only resource(s) existing would be the cron "runner".
>>>> The point about load balancing is true, yes... so, any idea what to use instead? Is there already a tool or framework for solving a problem like this available, or do I have to start from scratch? Not that I'd be too lazy, but what's the use of reinventing the wheel repeatedly...? ;)
>>>> Regards,
>>>> Bernd
>>> Hi,
>>> I solved this problem years ago. I used the utilization attribute, but you can use any attribute. You have to write an agent that measures the CPU load every X minutes and updates the attribute. Now you just have to add a location constraint that starts the resource on the node with the "best" attribute value. The "best" could be lowest CPU usage, most free RAM, or whatever you want.
>>> The disadvantage of this solution is that the cluster (i.e. pacemaker) has to recalculate the scores every time you update your attribute. That causes additional load. If you have many resources that interdepend, that additional load may not be negligible.
>> Hi!
>> Question on this: Is the cluster clever enough to check only updates of attributes that some rule actually uses, or does it re-evaluate everything when any attribute changes?
> Every time. That is what causes the load.

My thought was to update a variable stored per node which contains the value of the system load average over the last 15 minutes, which is extremely easy to gather. Based on that, crmd could do its job. Every ten minutes would be more than sufficient, as it's not a real cluster that's needed here. (Well, this seems to be an extremely rare use case, though.)

Bernd
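A minimal sketch of the attribute-based approach Michael describes, with illustrative names (the attribute name, resource name and threshold are placeholders, not a tested configuration):

# run periodically (cron, or an agent's monitor action) on every node:
# publish the 1-minute load average, scaled by 100, as a transient node attribute
attrd_updater -n cpu_load -U "$(awk '{ print int($1 * 100) }' /proc/loadavg)"

# keep the cron-runner resource off nodes whose published load is too high
pcs constraint location cron-runner rule score=-INFINITY cpu_load gt integer 200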
Re: [ClusterLabs] Antw: Re: Antw: Placing resource based on least load on a node
On 20.11.18 at 08:57, Ulrich Windl wrote:
> Michael Schwartzkopff wrote on 20.11.2018 at 08:41:
>> On 20.11.18 at 08:35, Bernd wrote:
>>> On 2018-11-20 08:06, Ulrich Windl wrote:
>>>> Bernd wrote on 20.11.2018 at 07:21:
>>>>> Hi,
>>>>> I'd like to run a certain bunch of cronjobs from time to time on the cluster node (four-node cluster) that has the lowest load of all four nodes.
>>>>> The parameters wanted for this system yet to be built are:
>>>>> * automatic placement on one of the four nodes (i.e., the one with the lowest load)
>>>>> * in case a node fails, it is automatically removed from the cluster
>>>>> * only a single instance of the cronjob entity may be running
>>>>> so this really screams for pacemaker being used as the foundation.
>>>>> However, I'm not sure how to implement the "put onto node with least load" part. I was thinking of using node attributes for that, but I didn't find any solution "out of the box" for this. Furthermore, as load is a highly volatile value, how can one make sure that all cronjobs run to the end without being moved to a node that possibly got a lower load in the meantime than the one executing the jobs?
>>>> Hi!
>>>> Actually I think the last one is the easiest (assuming the cron jobs do not need any resources that are moved): once a cron job is started, it will run until it ends, whether its crontab has been moved or not. Despite that, I think cluster software is not ideal when you actually need load-balancing software.
>>>> Regards,
>>>> Ulrich
>>> The only resource(s) existing would be the cron "runner".
>>> The point about load balancing is true, yes... so, any idea what to use instead? Is there already a tool or framework for solving a problem like this available, or do I have to start from scratch? Not that I'd be too lazy, but what's the use of reinventing the wheel repeatedly...? ;)
>>> Regards,
>>> Bernd
>> Hi,
>> I solved this problem years ago. I used the utilization attribute, but you can use any attribute. You have to write an agent that measures the CPU load every X minutes and updates the attribute. Now you just have to add a location constraint that starts the resource on the node with the "best" attribute value. The "best" could be lowest CPU usage, most free RAM, or whatever you want.
>> The disadvantage of this solution is that the cluster (i.e. pacemaker) has to recalculate the scores every time you update your attribute. That causes additional load. If you have many resources that interdepend, that additional load may not be negligible.
> Hi!
> Question on this: Is the cluster clever enough to check only updates of attributes that some rule actually uses, or does it re-evaluate everything when any attribute changes?

Every time. That is what causes the load.