Hi
Elasticluster spent nearly two hours configuring a cluster with 37
nodes. Considering that I am going to use a 1000-node cluster, this means a
lot of time, and hence money, spent just on configuration. Is there a way to
speed up the configuration time? Or is it possible to skip some installations to
>
> Start your large cluster from node snapshots
>
I already use a custom image, but I don't differentiate between frontend and
compute nodes; they both use the same custom image (or snapshot,
assuming they are basically the same thing).
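In case it matters, this is roughly how the image is set in my configuration
(a sketch from memory, so the key and image name may not be exact):

[cluster/slurm-on-gce]
...
# hypothetical image name; the same image is used for frontend and compute nodes
image_id=my-custom-image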
> Use larger nodes
Unfortunately, multi-core nodes
Initially permissions were like this:
drwxrwxr-x 2 orhan orhan 4096 Feb 3 21:24 /home/orhan/.ansible
drwxrwxr-x 3 orhan orhan 4096 Feb 4 16:15 /home/orhan/.elasticluster
drwx------ 2 orhan orhan 4096 Jan 29 19:57 /home/orhan/.ssh
After the commands it became:
drwxrwxrwx 2 orhan orhan 4096 Feb
The `sudo` issue is solved, but the [Errno 13] error is still there. Output is
attached.
Orhan
On Sun, Feb 4, 2018 at 2:31 PM, Riccardo Murri <riccardo.mu...@gmail.com>
wrote:
> 2018-02-04 12:15 GMT+01:00 Orxan Shibliyev <orxan.shi...@gmail.com>:
> > The second command gave:
>
Hi
Initially, I made one frontend and two compute nodes. On the frontend,
`sinfo` reported the number of nodes as two. Then I added five more compute
nodes with `./elasticluster.sh resize -a 5:compute slurm-on-gce`. As
expected, I got the compute nodes; however, on the frontend, `sinfo` gives the
same
Your test does not work for me. Restarting SLURM does not help. The base OS is
Debian GNU/Linux 9.4 (stretch). I get errors related to lmod:
TASK [lmod : Is installation directory writable?]
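Regarding the restart: this is roughly what I tried on the frontend (a sketch;
the service names assume the stock Debian SLURM packaging):

sudo systemctl restart slurmctld   # controller daemon on the frontend
sudo systemctl restart slurmd      # worker daemon, on each compute node
sudo scontrol reconfigure          # ask the controller to re-read slurm.conf
sinfo -N                           # still reports only the original compute nodes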
Hi
The very same `sbatch` script gave an error after `sbatch submit.sh`:
*Error message:*
sbatch: error: Batch job submission failed: Invalid account or
account/partition combination specified
*submit.sh:*
#!/bin/bash
#SBATCH --nodes=3-3
#SBATCH --ntasks=3
#SBATCH -t 10:00:0
#SBATCH
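For reference, these are the checks I plan to run on the frontend to see which
partitions and accounts the controller actually knows about (standard SLURM
commands, nothing elasticluster-specific):

sinfo                     # list the partitions the controller knows about
scontrol show partition   # default partition and its AllowGroups/AllowAccounts settings
sacctmgr show assoc       # accounts allowed to submit, if accounting is enabled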
I have been using elasticluster with great pleasure. After a dist upgrade I
wanted to install elasticluster again. I have an issue which I somehow
solved during the first installation, but this time I don't remember how I did
it. I think this is related to Google, but after some research I am still
Until now elasticluster was working perfectly. I have changed nothing, but
today I got the following error for "./elasticluster.sh start
slurm-on-gce". What is the problem?
...
TASK [common : Ensure the APT package cache is updated]
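To see the underlying apt error I can also log in and update the package cache
by hand (a sketch; `ssh` is the elasticluster subcommand that connects to the
frontend, assuming my install provides it):

./elasticluster.sh ssh slurm-on-gce   # log in to the frontend node
sudo apt-get update                   # re-run, by hand, what the Ansible task wraps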
Is it possible to use different flavors for the frontend and compute nodes? I
want a high-memory machine for the frontend and lower memory for the compute
nodes. I use GCE machines.
> *[cluster/gridengine]*
> ...
> *frontend_nodes=1*
> *compute_nodes=3*
>
> # Compute node section
> *[cluster/gridengine/compute]*
> *flavor=n1-highcpu-2*
> ...
>
> # Frontend node section
> *[cluster/gridengine/frontend]*
> *flavor=n1-standard-64*
> ...
>
> On Tue,
Please disregard my previous post. I didn't even construct a cluster, just a
single instance. Sorry for taking your time.
On Thu, Dec 20, 2018 at 1:44 PM Orxan Shibliyev
wrote:
> For some reason I get "Unable to contact slurm controller (connect
> failure)" for any SLURM command. I con
For some reason I get "Unable to contact slurm controller (connect failure)"
for any SLURM command. I constructed the cluster as usual, but this time it
gives the mentioned error. What could be the reason?
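For reference, these are the checks I run on the frontend (the service name and
log path assume the stock Debian packaging, so they may differ on other setups):

sudo systemctl status slurmctld                    # is the controller daemon running at all?
scontrol ping                                      # can clients reach the controller?
sudo tail -n 50 /var/log/slurm-llnl/slurmctld.log  # controller log; path may vary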
When I run a job with the number of nodes set to 1 and the number of tasks set
to 1 as well, naturally, only compute001 runs the job. Then I run an 8-node job
and I see that the output of the first job, which ran on compute001, is also
available on the other nodes. Does elasticluster copy files among compute nodes before
So when a node produces a file, the file will be copied to all other nodes,
right? What if nodes produce a file with the same name but different content?
Which file will be read by a node?
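In the meantime, one thing I can check is whether the files are actually copied
or whether the home directory is simply shared between the nodes (generic
commands, nothing elasticluster-specific):

df -h ~             # which filesystem is the home directory on?
mount | grep nfs    # is it an NFS mount, i.e. shared rather than copied?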
On Fri, Dec 21, 2018 at 3:57 PM Riccardo Murri
wrote:
> Hello Orhan,
>
> > When I run a job by setting number