Re: [elasticluster] resizing SLURM clusters (was: ssh mycluster problem)

2017-01-23 Thread Riccardo Murri
you need to keep the indentation! (For the full story, read: https://github.com/gc3-uzh-ch/elasticluster/issues/304 ) Ciao, R -- Riccardo Murri, Schwerzenbacherstrasse 2, CH-8606 Nänikon, Switzerland -- You received this message because you are subscribed to the Google Groups "elasticlus

Re: [elasticluster] resizing SLURM clusters (was: ssh mycluster problem)

2017-01-23 Thread Riccardo Murri
Dear Ana: the `image_userdata` parameter is only used when *starting new VMs*. Did you destroy (stop) the cluster and make a new one afresh? Ciao, R

[elasticluster] using a Ubuntu 16.04 "xenial" VM (was: resizing SLURM clusters)

2017-01-24 Thread Riccardo Murri
d-Upgrade "0"; __EOF__ * save the running VM as a snapshot * use that snapshot in the ElastiCluster config, instead of the official image Hope this helps, R

Re: [elasticluster] resizing SLURM clusters (was: ssh mycluster problem)

2017-01-24 Thread Riccardo Murri
Dear Ana: > Yes, I did stop the cluster and started new one. Hmmm... then going back to the error message you posted last, I see two distinct issues: 1. SSH keys mismatch (not a fatal error):: fatal: [compute001]: FAILED! => {... "module_stderr": " ... WARNING: REMOTE HOST

[elasticluster] resizing SLURM clusters (was: ssh mycluster problem)

2017-01-19 Thread Riccardo Murri
size -r` removes the nodes immediately, starting with the highest-numbered ones, regardless of whether they are running any jobs.) Ciao, R
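The removal order described above can be illustrated with a small sketch. This is not ElastiCluster's own code: the function name `nodes_to_remove` is hypothetical, and the node names are assumed to follow the `compute001` pattern seen elsewhere on this list. It only shows what "highest-numbered first, regardless of running jobs" means in practice.

```python
import re


def nodes_to_remove(node_names, count):
    """Return the `count` highest-numbered node names, highest first.

    Hypothetical helper for illustration only; note that it never
    looks at whether a node is currently running a job.
    """
    def numeric_suffix(name):
        # Extract the trailing digits, e.g. "compute012" -> 12.
        match = re.search(r"(\d+)$", name)
        return int(match.group(1)) if match else -1

    ordered = sorted(node_names, key=numeric_suffix, reverse=True)
    return ordered[:count]


# Shrinking a 4-node cluster by 2 picks the two highest-numbered nodes.
print(nodes_to_remove(["compute001", "compute002", "compute003", "compute004"], 2))
# prints ['compute004', 'compute003']
```

If jobs must be allowed to finish first, draining the affected nodes in the batch system before shrinking the cluster is the safer route.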

Re: [elasticluster] ssh mycluster problem

2017-01-16 Thread Riccardo Murri
lation should have the EC2 compatibility layer installed; you should see the `ec2-api-metadata` service running and listening to port 8788 or 8789 if it does.) Are you able to (1) start a VM with the same keypair and VM image used by ElastiCluster and (2) ssh into it from the command-line? Ci

[elasticluster] Re: Ansible 2.2.1 now required for security reasons

2017-01-16 Thread Riccardo Murri
VM images, users with access to the same UNIX account used by Ansible, etc) then you can download the release candidate Ansible code from: http://releases.ansible.com/ansible/ Ciao, R

[elasticluster] Re: Ansible 2.2.1 now required for security reasons

2017-01-16 Thread Riccardo Murri
Ok, *now* the Ansible package with fixed code has landed on PyPI, so ElastiCluster follows suit and requires Ansible>=2.2.1.0 Sorry for the confusion! Ciao, R

[elasticluster] Re: Drop support for Python 2.6?

2016-09-16 Thread Riccardo Murri
Dear ElastiCluster users, On 5 September 2016 at 14:21, Riccardo Murri <riccardo.mu...@gmail.com> wrote: > > I'm still aiming at supporting Py 2.6 until (and including) the 1.4 > release of ElastiCluster (scheduled for Nov. 2016), but I would like to > drop it beginning of 20

Re: [elasticluster] Update: Microsoft Azure support in Elasticluster

2016-09-20 Thread Riccardo Murri
iCluster version 1.4, planned for end of November. Thanks, Riccardo

[elasticluster] "hpc-common" role - looking for suggestions

2017-08-03 Thread Riccardo Murri
Hello, for the past couple of days, all compute clusters installed with ElastiCluster have been running an "hpc-common" Ansible role, the purpose of which is to install software and configuration that people would normally expect on an HPC-oriented cluster. At the moment, the role is quite bare-bones and only

Re: [elasticluster] error "the field 'port' has an invalid value"

2017-08-16 Thread Riccardo Murri
Hi Pablo, > thanks for your help. The error appeared in the "Gathering Facts" task in > ansible. What version of Ansible are you using? Ciao, R -- Riccardo Murri / Email: riccardo.mu...@gmail.com / Tel.: +41 77 458 98 32

Re: [elasticluster] error "the field 'port' has an invalid value"

2017-08-16 Thread Riccardo Murri
nor Ansible are to blame -- ElastiCluster's code is :-( Could you please report this issue on GitHub? Ciao, R

Re: [elasticluster] "hpc-common" role - looking for suggestions

2017-08-15 Thread Riccardo Murri
ter.readthedocs.io/en/latest/playbooks.html Good idea, I'll try to split that page into "fundamental" playbooks and "add-on" ones. Ciao, R

[elasticluster] how to configure OpenStack authentication?

2017-08-15 Thread Riccardo Murri
ing anyone? Thanks for any comments! [1]: https://specs.openstack.org/openstack/openstack-specs/specs/clouds-yaml-support.html Ciao, R

Re: [elasticluster] Run startup script

2017-07-14 Thread Riccardo Murri
you create your own VM image/snapshot, which includes the startup script and calls it from `/etc/rc.local` or a similar boot-level script. Ciao, R

Re: [elasticluster] Run startup script

2017-07-20 Thread Riccardo Murri
'module' object has no attribute 'get_configurator' > > I'm guessing `get_configurator()` is from an older version > elasticluster? Yes, the configuration parser was refactored and that function removed in commit 94859c4 on 2016-10-12. I'll check what the `cluster_monitor.sh` code is d

[elasticluster] survey: ElastiCluster uses?

2017-04-24 Thread Riccardo Murri
email me or the list! (e.g.: "using ElastiCluster for creating short-lived Hadoop clusters (size 8 to 20 nodes) for running one-off analytics tasks", or "using ElastiCluster to create clusters for teaching", etc.) Thanks, R [*]: https://indico.cern.ch/event/595396/contributions/25

Re: [elasticluster] No image found with ID

2017-08-09 Thread Riccardo Murri
For all those interested: this issue is being discussed on GitHub: https://github.com/gc3-uzh-ch/elasticluster/issues/473 Please follow up there. Ciao, R

Re: [elasticluster] No image found with ID

2017-08-04 Thread Riccardo Murri
Hi Pablo, It is probably an issue with the code: I renamed an attribute 'nova_api_version' to 'compute_api_version' but possibly missed an occurrence. Or maybe it's the saved state of the cluster that should be converted. Anyway, going back in the sources to a couple of days ago should fix it. I'll

Re: [elasticluster] "hpc-common" role - looking for suggestions

2017-08-18 Thread Riccardo Murri
Hi all, many thanks for the feedback! So it looks like the consensus is for making more features opt-in. Good to know :-) I'll work towards that (as usual, it may take a while). If you already have any patches for this, please feel free to submit a PR on GitHub! Ciao, R

[elasticluster] disable SELinux?

2017-08-18 Thread Riccardo Murri
m/gc3-uzh-ch/elasticluster/issues/480 Ciao, R

[elasticluster] ElastiCluster can now deploy Kubernetes

2017-05-17 Thread Riccardo Murri
-- Riccardo Murri http://www.s3it.uzh.ch/about/team/#Riccardo.Murri S3IT: Services and Support for Science IT University of Zurich Winterthurerstrasse 190, CH-8057 Zürich (Switzerland) Tel: +41 44 635 4208 Fax: +41 44 635 6888

Re: [elasticluster] Submitting Jobs in Google Cloud Grid Engine Cluster

2017-06-22 Thread Riccardo Murri
Engine users mailing list is a good place to ask questions on Grid Engine: https://gridengine.org/mailman/listinfo/users Hope this helps, Riccardo

Re: [elasticluster] elasticluster fails to create ubuntu/grid engine cluster.

2017-05-25 Thread Riccardo Murri
he GE variant provided by the base OS. Perhaps something has changed upstream (e.g., location of `act_qmaster`) which broke the playbooks. I'm on vacation now, but I will be back in my office on Monday and can have a look if you have not solved the issue by then. Ciao, R

Re: [elasticluster] libvirt provider errors

2017-06-19 Thread Riccardo Murri
Hello John-Paul, is this still an issue? I've copied it to ElastiCluster's GitHub issue tracking page: https://github.com/gc3-uzh-ch/elasticluster/issues/452 Sorry for the late reply! Ciao, R

Re: [elasticluster] automatic install of slurm

2017-09-21 Thread Riccardo Murri
Hello Alexander, welcome to ElastiCluster :-) Yes, ElastiCluster does automatically provision and configure a cluster with SLURM (or other systems, see list at [1]): after you've done the initial configuration, you should be able to get a fully-functional SLURM cluster on EC2/GCE/OpenStack/etc.

Re: [elasticluster] automatic install of slurm

2017-09-21 Thread Riccardo Murri
2017-09-21 22:36 GMT+02:00 Alex : > so as long as i follow the ElastiCluster > setup instructions i won't have to do anything else? (ie i won't have to > actually install slurm separately and configure it) Yes, exactly. Cheers, Riccardo

Re: [elasticluster] global name 'region' is not defined

2017-10-16 Thread Riccardo Murri
Hello, Traceback (most recent call last): File ".../elasticluster/cluster.py", line 499, in _start_node node.start() File ".../elasticluster/cluster.py", line 1141, in start **self.extra) File ".../elasticluster/providers/ec2_boto.py", line 197, in start_instance connection =

[elasticluster] anyone still needing Python 2.6?

2017-11-17 Thread Riccardo Murri
Hello, supporting Python 2.6 is becoming next to impossible; even Paramiko and Pip have dropped support for it. Is there anyone using ElastiCluster that still needs Python 2.6 compatibility? (and cannot use other options, like a non-system Python 2.7 or Docker?) Ciao, R

Re: [elasticluster] AWS error of " host not reachable within 5 seconds"

2018-05-07 Thread Riccardo Murri
Hello Samy, any success in using the "official" OS images with ElastiCluster? Ciao, R

Re: [elasticluster] AWS error of " host not reachable within 5 seconds"

2018-04-27 Thread Riccardo Murri
Hello, > I tried again and I'm getting different errors. Please see the attached log. > > Do you recommend another OS version? I'm almost giving up. The AMI you're using seems to lack the package `lua-devel`, which however *is* part of a standard CentOS7 (see e.g.

Re: [elasticluster] AWS error of " host not reachable within 5 seconds"

2018-04-28 Thread Riccardo Murri
Hello, > So yes, I would suggest you switch to a "standard" AMI; Important: the AMI ID is recorded in the cluster upon creation -- so changing the AMI in the configuration file is not enough. You need to destroy the cluster (`elasticluster stop -y myaws`) and then re-create it. Ciao, R
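The destroy-and-recreate sequence can be sketched as a small wrapper. The `stop -y` command is quoted from the message; `elasticluster start` as the re-creation step and the helper names `recreate_commands` and `recreate` are assumptions made for this illustration. By default the sketch only prints the commands (dry run), because stopping a cluster destroys its VMs.

```python
import shlex
import subprocess


def recreate_commands(cluster_name):
    """Commands to destroy a cluster and create it again from the config.

    Re-creating picks up the AMI currently set in the configuration file,
    since the old AMI ID is stored with the existing cluster.
    """
    return [
        ["elasticluster", "stop", "-y", cluster_name],
        ["elasticluster", "start", cluster_name],
    ]


def recreate(cluster_name, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for argv in recreate_commands(cluster_name):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


recreate("myaws")  # dry run: prints the two commands without executing them
```

Calling `recreate("myaws", dry_run=False)` would actually run the commands and stops at the first failure because of `check=True`.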

Re: [elasticluster] elasticluster setup mycluster

2018-05-08 Thread Riccardo Murri
Hi Champak, You likely forgot to source the `openrc` file before running Elasticluster. Ciao, R On Tue, 8 May 2018, 19:35 Champak Reddy wrote: > Hi Riccardo, > > I was trying to update my cluster with the "setup" command and I get the > errors below. > *$

Re: [elasticluster] Shorten configuration time

2018-05-22 Thread Riccardo Murri
Hi Orxan, all, > Elasticluster spent nearly two hours for configuration of a cluster with 37 > nodes. Yes, this is definitely a pain point with ElastiCluster/Ansible ATM. I'll try to summarize the issue and give some suggestions here. My rule of thumb for time it takes to set up a basic SLURM

Re: [elasticluster] AWS error of " host not reachable within 5 seconds"

2018-04-25 Thread Riccardo Murri
Hello, it looks like the system gets into trouble when installing package `moreutils`:: failed: [compute004] (item=moreutils) => {"changed": true, "failed": true, "item": "moreutils", "msg": "Error: Package: moreutils-0.49-2.el7.x86_64 (epel)\n Requires: perl(IPC::Run)\n", "rc": 1,

Re: [elasticluster] Mounting storage volume - wrong fs type, bad option, bad superblock on ..

2018-01-08 Thread Riccardo Murri
As far as I can see, OpenStack provisions "raw" devices as volumes. The Ansible "filesystem" module may help you with this: http://docs.ansible.com/ansible/latest/filesystem_module.html Ciao, R

Re: [elasticluster] Connection timeout on mounting /home mount.nfs

2018-01-08 Thread Riccardo Murri
ou please run the following commands on the compute nodes to check:: rpcinfo -p localhost rpcinfo -p frontend001 showmount -e frontend001 Thanks, R

[elasticluster] anyone willing to test ElastiCluster in Docker?

2018-01-13 Thread Riccardo Murri
Dear fellow ElastiClusters, Would you have time to test the new [elasticluster.sh][1] installation-free script? Just download it anywhere on your path, and use it like the regular ElastiCluster:: wget -O elasticluster.sh

Re: [elasticluster] SLURM is not installed after cluster setup

2018-02-04 Thread Riccardo Murri
Ah, no, wait! You should *not* run `elasticluster.sh` through `sudo`! Otherwise you'll be running as root in your home directory, which screws all permissions up... Can you please run the following commands? # fix permissions chown -R $USER $HOME/.ansible $HOME/.ssh $HOME/.elasticluster
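Before running the `chown` fix, it can help to see which files were actually taken over by root. The following is a minimal sketch, assuming a Unix system; the helper name `paths_not_owned_by_me` is made up for this example, and the three directories are the ones named in the message above.

```python
import os


def paths_not_owned_by_me(*roots):
    """List paths under the given roots that the current user does not own."""
    my_uid = os.getuid()
    offenders = []
    for root in roots:
        root = os.path.expanduser(root)
        if not os.path.lexists(root):
            continue  # nothing to check if the directory does not exist
        if os.lstat(root).st_uid != my_uid:
            offenders.append(root)
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                path = os.path.join(dirpath, name)
                # lstat: inspect symlinks themselves, not their targets
                if os.lstat(path).st_uid != my_uid:
                    offenders.append(path)
    return offenders


for path in paths_not_owned_by_me("~/.ansible", "~/.ssh", "~/.elasticluster"):
    print(path)
```

An empty output means ownership is already correct; any listed path is a candidate for the `chown -R $USER ...` command quoted above.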

Re: [elasticluster] SLURM is not installed after cluster setup

2018-02-04 Thread Riccardo Murri
Dear Orxan, there seems to be an error with the Docker image; according to this log line, the Ansible configuration system did not run at all: ansible.errors.AnsibleError: Unable to create local directories(/home/.ansible/tmp): [Errno 13] Permission denied: '/home/.ansible' This is

Re: [elasticluster] SLURM is not installed after cluster setup

2018-02-04 Thread Riccardo Murri
2018-02-04 12:15 GMT+01:00 Orxan Shibliyev : > The second command gave: > > orhan@orhan-MS-7850:~$ ./elasticluster.sh -vvv start slurm-on-gce > docker: Got permission denied while trying to connect to the Docker daemon > socket at unix:///var/run/docker.sock: Post >

Re: [elasticluster] Another fail

2018-08-07 Thread Riccardo Murri
Hello Orhan, > Sorry for taking your time with these fail messages but I got yet another > one. Output is attached. The issue is that two nodes were interrupted during `dpkg`'s configure step (due to Ubuntu's obnoxious "upgrade on boot" setting), so they refused to install any new software.

[elasticluster] Updated Ansible version: Ansible 2.5+ now required

2018-08-21 Thread Riccardo Murri
Dear all, I have updated ElastiCluster's Ansible playbooks to use the new syntax introduced in Ansible 2.4 and 2.5. This gets rid of the many deprecation warnings that were littering ElastiCluster's output, but also introduces some incompatibility with Ansible 2.3. So: * Ansible 2.5 or 2.6 is

[elasticluster] R Studio Server support in ElastiCluster

2018-08-30 Thread Riccardo Murri
Hello! I've just committed additions to the ElastiCluster playbooks to install "R Studio Server" -- it can be combined with any cluster software, or used stand-alone. See an example config to deploy it on Google Cloud Engine:

[elasticluster] Turning off automatic upgrades

2018-03-08 Thread Riccardo Murri
# ... everything as usual here, plus: global_var_upgrade_packages=no Thanks! Ciao, R
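As a rough sketch of where that setting goes, the snippet below adds it to an INI-style configuration with Python's standard `configparser`. It assumes the option belongs in a `[setup/...]` section; the beginning of the message is cut off here, so the section is not confirmed by the excerpt. The section name `setup/slurm`, the `provider=ansible` line, and the helper name are placeholders for illustration. Note that `configparser` drops comments when rewriting a file, so editing a real config by hand is usually preferable.

```python
import configparser
import io


def disable_package_upgrades(config_text):
    """Return config text with automatic upgrades off in every setup/ section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(config_text)
    for section in parser.sections():
        if section.startswith("setup/"):  # assumed location of the option
            parser[section]["global_var_upgrade_packages"] = "no"
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Placeholder config fragment, not taken from the original message.
example = "[setup/slurm]\nprovider=ansible\n"
print(disable_package_upgrades(example))
```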

Re: [elasticluster] Not able to Install development code from GitHub

2018-03-13 Thread Riccardo Murri
nstead of using the Docker image? (see: http://elasticluster.readthedocs.io/en/latest/install.html#quickstart ) Ciao, R

[elasticluster] Deploying cluster on "local" Ubuntu machines?

2018-04-09 Thread Riccardo Murri
Hello Shane, I have seen the issue you filed on GitHub; I'm replying here but either forum is equally good. What is not clear to me is: did you install these 20 machines with ElastiCluster? (I'd say no, since you seem not to have an ".elasticluster/config" file...) Or rather did you install all

Re: [elasticluster] Load balancing

2018-04-15 Thread Riccardo Murri
Hello! > First of all thank you very much for such a wonderful software. Glad you like it! And thanks for getting back with questions -- this is how a community is built and grows :-) > 1. Is there any way already implemented for load balancing in Elasticluster? I'm not sure what you mean by

Re: [elasticluster] ERR_CONNECTION_REFUSED

2018-04-16 Thread Riccardo Murri
Hello Orxan, > it. I think this is related to google but after some research I am still > clueless. Basically, I get following. > > Your browser has been opened to visit: > > > > If your browser is on a different machine then exit and re-run this > application with the command-line parameter

Re: [elasticluster] SLURM sbatch error

2018-04-19 Thread Riccardo Murri
Hello Orxan, I cannot reproduce this error; with a freshly-started Ubuntu 16.04 cluster, I get:: ubuntu@frontend001:~$ cat test.sh #! /bin/sh echo hello ubuntu@frontend001:~$ sbatch test.sh Submitted batch job 2 ubuntu@frontend001:~$ cat slurm-2.out hello One caveat: right after building the

Re: [elasticluster] SLURM multi-node/architecture cluster config question

2018-04-20 Thread Riccardo Murri
Hello Champak, > I have successfully setup a multiuser cluster using Elasticluster with > multiple nodes with differing architectures but I haven't been able to run > more than a single job per node. [...] > champost@frontend001:~/sbatch$ for i in `seq 20`; do sbatch -J test_$i -e > test_$i.out

Re: [elasticluster] AWS error of " host not reachable within 5 seconds"

2018-04-23 Thread Riccardo Murri
Hello Samy, I haven't tried to build a cluster with the config you posted but my first guess is that you cannot connect because of firewall rules in EC2: i.e., your default security group does not allow SSH connections from the host you're running ElastiCluster on. (See, e.g., this comment for
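One quick way to test the firewall hypothesis is to check, from the machine running ElastiCluster, whether the node's SSH port accepts TCP connections at all. The sketch below is an independent check written for this illustration, not the code ElastiCluster itself uses; the function name is hypothetical, and the 5-second timeout simply mirrors the error message in the subject line.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example (replace with the public IP or DNS name of a cluster node):
# print(ssh_port_reachable("203.0.113.10"))
```

A `False` result for a VM that is known to be running points at the security group or another firewall rather than at ElastiCluster.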

Re: [elasticluster] Load balancing

2018-04-16 Thread Riccardo Murri
Hello! 2018-04-15 17:21 GMT+02:00 'Ravi Arya' via elasticluster : > 1. Load balancing: You submit the number of jobs to the cluster and > depending upon the number of jobs, cluster adds or removes the nodes. This > is automatic and is accomplished by running load

Re: [elasticluster] SLURM sbatch error

2018-04-19 Thread Riccardo Murri
Hi Orxan, I tried starting a cluster with your config, but I still cannot reproduce the error: * the only differences are: - I only started 1 compute node - I used the public debian-9-stretch-v20180401 as a base OS since I cannot use the image-23 snapshot from your project - I use my own

Re: [elasticluster] Error: Ensure the APT package cache is updated

2018-04-02 Thread Riccardo Murri
Probably a network glitch. Can you please try again and open an issue on GitHub if the problem persists? Ciao, Riccardo

Re: [elasticluster] Re: Status of Azure support in Elasticluster

2018-03-20 Thread Riccardo Murri
Dear all, I have just merged the new Azure provider into the "master" branch; this means the Docker image will get working Azure support in a few minutes. The configuration information has changed; please see

Re: [elasticluster] Not able to Install development code from GitHub

2018-03-22 Thread Riccardo Murri
Hi Champak, > It worked like a charm. I am able to activate/deactivate the virtualenv and > import elasticluster! Excellent! Ciao, R

[elasticluster] FAQs or how-to's?

2018-03-22 Thread Riccardo Murri
Hello all, I was thinking of adding a section to the ElastiCluster documentation, to cover topics that are possibly interesting to users and/or needed in day-to-day usage of ElastiCluster but do not fit well into the current sections installation/configuration/usage/playbooks. Is there anything

Re: [elasticluster] Different flavors for compute nodes

2018-03-22 Thread Riccardo Murri
Hi Champak, > Is it possible to configure a cluster consisting of several compute nodes > but with different flavours (e.g. made up of 8cpu-64ram-hpc and > 4cpu-16ram-hpc and 1cpu-4ram-hpc flavors). Yes, totally possible! I've just added an example config here:

Re: [elasticluster] AWS error of " host not reachable within 5 seconds"

2018-04-25 Thread Riccardo Murri
Hello, > TASK [common : Upgrade all installed packages to latest version] > fatal: [frontend001]: FAILED! => {"changed": true, "failed": true, "msg": > "https://yum.repos.intel.com/2018/intel-daal-runtime-64bit-2018.2-199.x86_64.rpm: > [Errno 14] curl#18 - \"transfer closed with 138280031 bytes

Re: [elasticluster] SLURM sbatch error

2018-04-25 Thread Riccardo Murri
Hi all, for the record, I'm summarizing here the outcome of the debugging session which went on through private emails. The issue stems from the user name on GCP vs user name on the VM: Google uses the local part of the email address (e.g., in my case `riccardo.murri`) but then the user name on

Re: [elasticluster] AWS error of " host not reachable within 5 seconds"

2018-04-25 Thread Riccardo Murri
> But the cluster didn't get setup. I ran `elasticluster setup mycluster' > various times and it fails. I'm using the same config file. Then there is some other error before which is the root issue. Can you please post the *entire* output of `elasticluster -v setup mycluster` ? Ciao, R

Re: [elasticluster] AWS error of " host not reachable within 5 seconds"

2018-04-25 Thread Riccardo Murri
> And this is another error message when the script finishes: Yes, this is just a consequence of the error that you hit before. ElastiCluster is just summarizing the outcome of the setup process; since there were fatal errors before, it warns you that the cluster has not been fully configured.

Re: [elasticluster] AWS error of " host not reachable within 5 seconds"

2018-04-26 Thread Riccardo Murri
Hmmm... you're right, installation of `moreutils` on CentOS 7 was handled separately. Should be fixed now -- can you pls update and check again? Ciao, R

Re: [elasticluster] Cluster creation on Azure fails

2019-01-02 Thread Riccardo Murri
Dear all, 2019 starts with a long-awaited bug fix for Azure users: ElastiCluster's "master" branch now has fixes for all the issues Manuele reported! Please test and let me know if it works (it does for me...) Ciao & happy new year, Riccardo

Re: [elasticluster] Cluster creation on Azure fails

2019-01-14 Thread Riccardo Murri
Hello Manuele, all, > Thanks for the update. After a (long) round of tests I can confirm that > ElastiCluster works with Azure very well now. I'm able to set up and manage > my clusters! Excellent! Thank you very much for taking the time to test! Ciao, R

Re: [elasticluster] Cluster creation on Azure fails

2018-12-12 Thread Riccardo Murri
Hello Manuele, I think the root cause of the problem is the exception thrown by the Azure code -- the malformed YAML is just a consequence of that. I'll take a look in the coming days; right now, my Azure account is non-functional. Thanks for reporting this issue! Riccardo

Re: [elasticluster] Re: SLURM: Unable to contact slurm controller

2018-12-21 Thread Riccardo Murri
> Please disregard my previous post. I didn't even construct cluster but > just one instance. Sorry for taking time. > All is well that ends well ;-) Ciao, R

Re: [elasticluster] Elasticluster copies files before job submission

2018-12-21 Thread Riccardo Murri
Hello Orhan, > When I run a job by setting number of nodes to 1 and number of tasks to 1 as > well, naturally, only compute001 runs the job. Then I run 8-node job and I > see that output of first job which ran on compute001 are also available on > other nodes. Does elasticluster copy files

Re: [elasticluster] Elasticluster copies files before job submission

2018-12-21 Thread Riccardo Murri
Hello Orhan, > So when a node produce a file, the file will be copied to all other nodes, > right? Almost. All files in the home directories actually reside on the front-end node; whenever you try to read or write to a file in the home directory, this is transparently routed to the front-end

[elasticluster] ElastiCluster supports Apache BigTop 1.3.0

2018-11-22 Thread Riccardo Murri
Dear ElastiCluster users, I'm pleased to announce that ElastiCluster can now install Hadoop 2.8.4 and Spark 2.2.1, through Apache BigTop 1.3.0 (released yesterday). If you want to install the newer Hadoop/Spark, be sure to pull the latest ElastiCluster from the "master" branch. Ciao, R

[elasticluster] PBSPro support

2019-01-13 Thread Riccardo Murri
Hello! ElastiCluster now has support for installing PBSPro [1], albeit only on CentOS7 (since no other distribution is packaged upstream); I'll be glad for any testing or feedback. [1]: http://www.pbspro.org/ Ciao, R

Re: [elasticluster] How to use --limit in retry?

2019-04-02 Thread Riccardo Murri
> So to sum up: it is misleading that the output from the ansible task suggests > you to rerun with the --limit option, as it can not be used with > elasticluster? Yes, exactly. (But there is no way to prevent Ansible from printing that message, AFAIK.) > If a task fails and the cluster is

Re: [elasticluster] How to use --limit in retry?

2019-04-02 Thread Riccardo Murri
Hello Maiken, > How is this exactly supposed to be applied? It isn't. Ansible's `--limit` option will limit also the information gathering to the listed hosts, thus breaking ElastiCluster's playbooks which rely on a "global view" of the cluster (i.e., much of the info is taken dynamically from

Re: [elasticluster] Resize - fail on task lmod : Download sources - recommendations for resizing

2019-04-02 Thread Riccardo Murri
Hello Maiken, > fatal: [frontend001]: FAILED! => {"changed": false, "failed": true, "msg": > "Failed to validate the SSL certificate for github.com:443. Make sure your > managed systems have a valid CA certificate installed. You can use > validate_certs=False if you do not need to confirm the

[elasticluster] Python3 support now in "master"

2019-04-04 Thread Riccardo Murri
Dear all, it's a pleasure to announce that ElastiCluster can now officially run on Python3! Many thanks to Yaroslav Halchenko for reviewing the code, suggesting and implementing improvements. The Docker-based installation stays based on Python 2.7 until Python 3 has received significant testing

[elasticluster] Python3 code for ElastiCluster ready for testing

2019-03-29 Thread Riccardo Murri
Hello all! I'm pleased to announce that Py3-compatible code of ElastiCluster is now available in pull request https://github.com/gc3-uzh-ch/elasticluster/pull/623 The code passes all tests on Travis CI for Python 3.5, 3.6, and 3.7; and I could successfully start clusters on OpenStack, GCP and

Re: [elasticluster] Error in cluster resize

2019-02-01 Thread Riccardo Murri
Dear Narcis, can you please update your ElastiCluster and try with the latest "master" version? The package signing key for Ubuntu had indeed changed, this is corrected in the latest commit I pushed. Ciao, R

Re: [elasticluster] elasticluster Setup with CENTOS

2019-04-09 Thread Riccardo Murri
Dear Alex, > TASK [lmod : Install Lmod from the OS repository (Debian-compatible)] > ** > task path:

Re: [elasticluster] Error when setting up clusterfs

2019-04-09 Thread Riccardo Murri
Dear Alex, > TASK [glusterfs-server : include_tasks] > > task path: >

Re: [elasticluster] version number inconsistent with state=latest: lmod=7.0*

2019-04-08 Thread Riccardo Murri
Hello Alex, what version of ElastiCluster are you running? Installed from the Python sources or using the elasticluster.sh script / Docker container? Ciao, R

[elasticluster] SLURM 18.08 now default on RHEL/CentOS 7 clusters

2019-04-15 Thread Riccardo Murri
Dear all, Thanks to @verdurin and his work on providing COPR repositories with SLURM packages, SLURM 18.08 is now the default version of SLURM installed by ElastiCluster on RHEL/CentOS 7. Older versions are selectable with the `slurm_version` global variable. Clusters based on Debian or Ubuntu

Re: [elasticluster] elasticluster resize -r - become_user is not a valid attribute for TaskInclude

2019-05-27 Thread Riccardo Murri
Hello Maiken, all, > ERROR! 'become_user' is not a valid attribute for a TaskInclude > > The error appears to be in > '/home/centos/elasticluster_20190520/src/elasticluster/share/playbooks/roles/ceph/tasks/mgr.yml': > line 31, column 3, but may > be elsewhere in the file depending on the exact

Re: [elasticluster] Use of OpenStack Availability zone

2019-05-29 Thread Riccardo Murri
Hello Alex, > How can I specify in the config file, that Elasticluster should use a > specific availability zone for the rollout of the cluster? Which cloud provider/backend? Kind regards, Riccardo

Re: [elasticluster] Use of OpenStack Availability zone

2019-05-29 Thread Riccardo Murri
Hi Alex, ElastiCluster currently has no code to place nodes in a specific availability zone on OpenStack clouds. It would not be difficult to add it, if there is demand for it. I'm curious, however, as to what the use case for availability zones would be, when using ElastiCluster. In my

Re: [elasticluster] Use of OpenStack Availability zone

2019-05-29 Thread Riccardo Murri
Hello Alex, many thanks for the detailed explanation! Can you please submit a bug/feature request at https://github.com/gc3-uzh-ch/elasticluster/issues/new ? FWIW, we use dedicated HPC flavors (as opposed to AZs) to accomplish the same goals on our internal OpenStack cloud. Ciao, R

Re: [elasticluster] Resize - or setup - elasticluster.sh - /usr/bin/eatmydata: No such file or directory\n",

2019-06-14 Thread Riccardo Murri
Hello Maiken, sorry for the late reply - I am on vacation until end of June. Can you please open a bug at https://github.com/gc3-uzh-ch/elasticluster/issues -- I will take a look at it when I'm back. Meanwhile, the workaround is to add: safe_but_slower=yes to your cluster's

Re: [elasticluster] Accessing my organization's AWS AMI's

2019-06-14 Thread Riccardo Murri
Ah perfect I see that Manuele has solved the issue in another email thread :-)

Re: [elasticluster] Accessing my organization's AWS AMI's

2019-06-14 Thread Riccardo Murri
Hello Gabe, looks like a bug -- can you please open an issue report at https://github.com/gc3-uzh-ch/elasticluster/issues ? I am on vacation until end of June but I can take a look at it when I'm back. Anyway: please make sure that the user or IAM profile that you are using with ElastiCluster

Re: [elasticluster] Resize - or setup - elasticluster.sh - /usr/bin/eatmydata: No such file or directory\n",

2019-06-24 Thread Riccardo Murri
Hello Maiken, all, creation of CentOS 7.x and 6.x clusters with the default settings (`slow_but_safer=no`, i.e., use "eatmydata" to speed up operations) should now be fixed in the current "master" branch -- can you please give it a try? Ciao, R

[elasticluster] Future of ElastiCluster: a usage survey

2019-05-16 Thread Riccardo Murri
Dear all, Nearing release 1.3.0, I would like to start discussing which directions ElastiCluster should take in the future, what to prioritize, and in general what the community thinks. A survey on current usage seems to me like a good starting point to have some data to ground discussions on.

Re: [elasticluster] elasticluster + openstack + ipv6 only

2019-05-21 Thread Riccardo Murri
Hello Maiken, > I need to recreate my cluster using ipv6 only for the public address. Not possible at the moment -- currently, ElastiCluster playbooks assume there is always an IPv4 address available. Ciao, R

Re: [elasticluster] elasticluster resize -r - become_user is not a valid attribute for TaskInclude

2019-05-20 Thread Riccardo Murri
Hello Maiken, looks like a syntax error in the playbooks, but I won't be able to check until end of the week. Can you please try with Ansible 2.7 and see if the error goes away? Ciao, R

Re: [elasticluster] elasticluster resize - with new image_id

2019-05-20 Thread Riccardo Murri
Hello Maiken, > I hacked my way through it changing the > .elasticluster/storage/$clustername.yml file with the correct image_id for > the nodes, but I am sure there is a better way! Unfortunately no :-( The way to do it without hacking the `.elasticluster/storage/*.yml` files would be to:

[elasticluster] Publications using ElastiCluster?

2019-05-16 Thread Riccardo Murri
Hello, I am co-authoring a paper on different models for running "massively parallel" experiments on the cloud; we would like to provide additional references and examples of the use of ElastiCluster in actual research usage (in academia or outside). So I'm asking for help from the ElastiCluster

Re: [elasticluster] Re: Resize -

2019-05-07 Thread Riccardo Murri
Hello Maiken, this error seems to come from deep down the Paramiko dependency stack -- can you please try with the latest ElastiCluster Docker image? It was built yesterday night so it should have all the newest packages. Instructions here:

Re: [elasticluster] Re: Resize -

2019-05-07 Thread Riccardo Murri
Hello Maiken, > 1.3.dev13 does not work with paramiko 2.1.1 (or 2.1.2) To me, the error seems to come from PyOpenSSL rather than Paramiko itself? Paramiko ends up using PyOpenSSL through the `cryptography` Python package, so probably the version of all three packages should be compared. I have
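
To compare the versions of the three packages mentioned above, something along these lines might help. It is only a sketch, not part of ElastiCluster: the helper name `installed_versions` is made up for illustration, and it assumes Python 3.8 or later (for `importlib.metadata`) and that it is run with the same interpreter or virtualenv that runs ElastiCluster.

```python
from importlib import metadata

# Distribution names as published on PyPI; the snippet above names these
# three as the ones whose versions should be compared.
PACKAGES = ("paramiko", "pyOpenSSL", "cryptography")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it in both the working and the failing environment and comparing the two outputs should show which of the three packages differs.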

Re: [elasticluster] Re: SLURM 18.08 now default on RHEL/CentOS 7 clusters

2019-05-07 Thread Riccardo Murri
Hello Maiken, > This is because I am getting the paramiko problems with the newer version of > elasticluster. But I should give it another try :) I doubt that error is due to the newer ElastiCluster; sounds like a library conflict due to (possibly) old system libraries that cannot be updated.

Re: [elasticluster] Re: SLURM 18.08 now default on RHEL/CentOS 7 clusters

2019-05-07 Thread Riccardo Murri
Hello Maiken, > However, I have updated my cluster to slurm 18.08. > Can I copy over the changes wrt slurm from 1.3.dev13 to my 1.3.dev0 version? In principle, you would only need to apply the patch in this commit: https://github.com/gc3-uzh-ch/elasticluster/commit/20eca47 However, it's
