[elasticluster] Shorten configuration time

2018-05-19 Thread Orxan Shibliyev
Hi, Elasticluster spent nearly two hours configuring a cluster with 37 nodes. Considering that I am going to use a 1000-node cluster, this means a lot of time, and hence money, spent just on configuration. Is there a way to speed up the configuration? Or is it possible to skip some installations to
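
The replies below point at pre-baked images as the main lever: the less Ansible has to install at setup time, the shorter the configuration phase. A minimal sketch of the relevant configuration lines, assuming a hypothetical snapshot named my-prebaked-image that already has the cluster software installed:

    [cluster/slurm-on-gce]
    # other keys (cloud, login, setup provider) omitted for brevity
    image_id=my-prebaked-image    # hypothetical: snapshot with SLURM etc. pre-installed
    frontend_nodes=1
    compute_nodes=37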

Re: [elasticluster] Shorten configuration time

2018-05-22 Thread Orxan Shibliyev
> Start your large cluster from node snapshots

I already use a custom image, but I don't differentiate between frontend and compute nodes; they both use the same custom image (or snapshot, assuming those are basically the same thing).

> Use larger nodes

Unfortunately, multi-core nodes
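
If the frontend and compute nodes ever do need different snapshots, the per-node-kind sections shown for flavor later in this digest should serve; a sketch with hypothetical image names, assuming image_id can be overridden per node kind the same way flavor can:

    [cluster/slurm-on-gce/frontend]
    image_id=frontend-snapshot    # hypothetical name

    [cluster/slurm-on-gce/compute]
    image_id=compute-snapshot     # hypothetical name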

Re: [elasticluster] SLURM is not installed after cluster setup

2018-02-04 Thread Orxan Shibliyev
Initially the permissions were like this:

drwxrwxr-x 2 orhan orhan 4096 Feb  3 21:24 /home/orhan/.ansible
drwxrwxr-x 3 orhan orhan 4096 Feb  4 16:15 /home/orhan/.elasticluster
drwx------ 2 orhan orhan 4096 Jan 29 19:57 /home/orhan/.ssh

After the commands they became:

drwxrwxrwx 2 orhan orhan 4096 Feb
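
For reference, the conventional permissions can be restored with a few chmod calls; a sketch of the usual safe defaults:

    chmod 755 ~/.ansible ~/.elasticluster   # owner-writable, world-readable is fine here
    chmod 700 ~/.ssh                        # SSH expects this directory to be private
    chmod 600 ~/.ssh/*                      # private keys must be readable by the owner only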

Re: [elasticluster] SLURM is not installed after cluster setup

2018-02-04 Thread Orxan Shibliyev
The `sudo` issue is solved but [Errno 13] is still there. Output is attached.

Orhan

On Sun, Feb 4, 2018 at 2:31 PM, Riccardo Murri <riccardo.mu...@gmail.com> wrote:
> 2018-02-04 12:15 GMT+01:00 Orxan Shibliyev <orxan.shi...@gmail.com>:
> > The second command gave:
> &

[elasticluster] sinfo gives wrong number of nodes after resize

2018-02-04 Thread Orxan Shibliyev
Hi, Initially I created one frontend and two compute nodes. On the frontend, `sinfo` reported the number of nodes as two. Then I added five more compute nodes with `./elasticluster.sh resize -a 5:compute slurm-on-gce`. As expected, I got the compute nodes; however, on the frontend, `sinfo` gives the same
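
A likely reason is that slurm.conf on the frontend still lists only the original two nodes; re-running the setup phase regenerates it. A sketch of the usual remedy, assuming the stock SLURM playbooks:

    ./elasticluster.sh setup slurm-on-gce   # re-run Ansible so the new nodes land in slurm.conf
    # then, on the frontend:
    sudo scontrol reconfigure               # make slurmctld re-read its configuration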

Re: [elasticluster] SLURM sbatch error

2018-04-19 Thread Orxan Shibliyev
Your test does not work for me. Restarting SLURM does not help. The base OS is Debian GNU/Linux 9.4 (stretch). I get errors related to lmod:

TASK [lmod : Is installation directory writable?]

[elasticluster] SLURM sbatch error

2018-04-19 Thread Orxan Shibliyev
Hi, The very same `sbatch` script gave an error after `sbatch submit.sh`:

Error message:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

submit.sh:
#!/bin/bash
# SBATCH --nodes=3-3
#SBATCH --ntasks=3
#SBATCH -t 10:00:0
#SBATCH
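
Two things are worth checking here. SLURM ignores directives written as "# SBATCH" (with a space), so in the script above only the lines starting exactly with #SBATCH take effect; and the "Invalid account or account/partition combination" message usually means the job names an account or partition the controller does not know. A minimal sketch of a submit script that states the partition explicitly; the partition name "main" is an assumption, substitute whatever `sinfo` lists:

    #!/bin/bash
    #SBATCH --nodes=3              # exactly 3 nodes
    #SBATCH --ntasks=3             # 3 tasks in total
    #SBATCH --time=10:00:00        # wall-clock limit
    #SBATCH --partition=main       # assumption: replace with a partition shown by `sinfo`
    srun hostname                  # trivial payload to verify the allocation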

[elasticluster] ERR_CONNECTION_REFUSED

2018-04-16 Thread Orxan Shibliyev
I have been using elasticluster with great pleasure. After a dist-upgrade I wanted to install elasticluster again. I have an issue that I somehow solved during the first installation, but this time I don't remember how I did it. I think this is related to Google, but after some research I am still

[elasticluster] Error: Ensure the APT package cache is updated

2018-04-02 Thread Orxan Shibliyev
Until now elasticluster was working perfectly. I have changed nothing, but today I got the following error from "./elasticluster.sh start slurm-on-gce". What is the problem?

...
TASK [common : Ensure the APT package cache is updated]
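
The failing task just refreshes apt's package cache, so the quickest way to see the real error is to repeat it by hand on a node; a sketch:

    ./elasticluster.sh ssh slurm-on-gce   # log in to the frontend
    sudo apt-get update                   # the underlying command; its output names the broken repository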

[elasticluster] Different flavors for frontend and compute nodes

2018-12-18 Thread Orxan Shibliyev
Is it possible to use different flavors for frontend and compute nodes? I want a high-memory machine for the frontend and lower-memory machines for the computes. I use GCE machines.

Re: [elasticluster] Different flavors for frontend and compute nodes

2018-12-18 Thread Orxan Shibliyev
> [cluster/gridengine]
> ...
> frontend_nodes=1
> compute_nodes=3
>
> # Compute node section
> [cluster/gridengine/compute]
> flavor=n1-highcpu-2
> ...
>
> # Frontend node section
> [cluster/gridengine/frontend]
> flavor=n1-standard-64
> ...
>
> On Tue,
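
Written out contiguously, the same override pattern looks like the sketch below; the "..." lines stand for whatever cloud/login/setup keys the cluster already uses, and the flavor names follow the reply above:

    [cluster/gridengine]
    ...
    frontend_nodes=1
    compute_nodes=3

    [cluster/gridengine/frontend]
    flavor=n1-standard-64   # large frontend, as requested

    [cluster/gridengine/compute]
    flavor=n1-highcpu-2     # smaller compute nodes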

[elasticluster] Re: SLURM: Unable to contact slurm controller

2018-12-20 Thread Orxan Shibliyev
Please disregard my previous post. I didn't even construct a cluster, just a single instance. Sorry for taking your time.

On Thu, Dec 20, 2018 at 1:44 PM Orxan Shibliyev wrote:
> For some reason I get "Unable to contact slurm controller (connect
> failure)" for any SLURM command. I con

[elasticluster] SLURM: Unable to contact slurm controller

2018-12-20 Thread Orxan Shibliyev
For some reason I get "Unable to contact slurm controller (connect failure)" for any SLURM command. I constructed the cluster as usual, but this time it gives the mentioned error. What could be the reason?
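
When every SLURM command fails this way, the first things to look at are whether the controller daemon is up on the frontend and which host the configuration points to; a sketch (the config lives under /etc/slurm-llnl/ on Debian-based images, /etc/slurm/ elsewhere):

    sudo systemctl status slurmctld                     # is the controller running on the frontend?
    grep -i controlmachine /etc/slurm-llnl/slurm.conf   # does it name the frontend host?
    sudo systemctl restart slurmctld                    # restart it if it died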

[elasticluster] Elasticluster copies files before job submission

2018-12-21 Thread Orxan Shibliyev
When I run a job setting the number of nodes to 1 and the number of tasks to 1 as well, naturally only compute001 runs the job. Then I run an 8-node job and I see that the output of the first job, which ran on compute001, is also available on the other nodes. Does elasticluster copy files among compute nodes before

Re: [elasticluster] Elasticluster copies files before job submission

2018-12-21 Thread Orxan Shibliyev
So when a node produces a file, the file will be copied to all other nodes, right? What if nodes produce files with the same name but different content? Which file will be read by a node?

On Fri, Dec 21, 2018 at 3:57 PM Riccardo Murri wrote:
> Hello Orhan,
>
> > When I run a job by setting number
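
The behaviour described above is consistent with shared storage rather than copying: in a default elasticluster SLURM cluster the home directories are exported from the frontend over NFS, so every node sees the same single file, and two jobs writing the same path would simply overwrite one another. A quick check, assuming that default setup:

    # on any compute node:
    mount | grep ' /home '   # an entry like "frontend001:/home on /home type nfs ..." means shared storage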