[slurm-dev] Mslurm

2013-11-05 Thread Pancorbo, Juan
Hello all, As we mentioned at the SLURM User Group Meeting (http://slurm.schedmd.com/SUG13/Mslurm.pdf), here at Leibniz-Rechenzentrum (LRZ) we work with several small to medium-sized clusters managed by slurm. In our case we have all the slurm control daemons and the slurm database daemons

[slurm-dev] Problem with reservations

2013-11-05 Thread Marcin Stolarek
Hi Guys, I'm currently experiencing a problem with a reservation. The job has been submitted with the appropriate --reservation parameter, the reservation is active, and all nodes in the reservation are in the idle state. Despite these conditions, the job remains in the pending state. You can find output from

[slurm-dev] RE: Fwd: Failed to contact primary controller : No route to host

2013-11-05 Thread Ludovic Prevost
Hi, Could you try populating your /etc/hosts like this: 130.1.2.205 qdr1 130.1.2.206 qdr2 and then try again: $ nc -v qdr1 6818 Best Regards, PREVOST Ludovic NEC HPC Europe From: Arjun J Rao [mailto:rectangle.k...@gmail.com] Sent: Tuesday, 5 November 2013 08:54 To: slurm-dev

[slurm-dev] RE: Fwd: Failed to contact primary controller : No route to host

2013-11-05 Thread Arjun J Rao
My /etc/hosts already has those entries. And as I mentioned, I can ping from qdr2 to qdr1. But nc -v qdr1 6818 shows that there is no route. On Tue, Nov 5, 2013 at 9:58 AM, Ludovic Prevost ludovic.prev...@emea.nec.com wrote: Hi, Could you try to populate your /etc/hosts like :

[slurm-dev] Re: Problem with reservations

2013-11-05 Thread Moe Jette
See the function schedule() in src/slurmctld/job_scheduler.c (main scheduling logic) and _attempt_backfill() in src/plugins/sched/backfill/backfill.c (backfill scheduler). Look in both places for calls to the function job_test_resv(). Quoting Marcin Stolarek stolarek.mar...@gmail.com:

[slurm-dev] Oversubscription of GPU resources

2013-11-05 Thread Ulf Markwardt
Dear list, how can I oversubscribe a few of our GPU cards (generic resource) so that a certain number of users can share the node AND the card for development purposes? Thanks, Ulf -- Dr. Ulf Markwardt Dresden

[slurm-dev] Re: Slurm not working with gcc HARDEND -- was Re: Starring slurmd on Gentoo Linux

2013-11-05 Thread Daniel M. Weeks
On 10/31/2013 06:33 PM, Olaf Leidinger wrote: Hi Dan, problems you're seeing are hardened-specific. Yes, that's what I suspected. At your third link, one can read: I found your Gentoo bug [2]. Could you please attach the logs from the failed build (with hardened gcc) to the ticket so

[slurm-dev] Re: Oversubscription of GPU resources

2013-11-05 Thread Moe Jette
You would need to configure the GPU(s) multiple times in slurm.conf and gres.conf, but duplicate the name in the gres.conf File option like this:
# Configure GPU zero to be allocated twice
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia0
Quoting Ulf Markwardt
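The duplication described in this reply can be generated rather than typed by hand. The following is only a minimal sketch: the helper names `gres_lines` and `gres_count` are hypothetical, and the assumption that the matching slurm.conf node entry carries a `Gres=gpu:N` count equal to the number of duplicated lines is mine, not something the message spells out.

```python
from typing import List


def gres_lines(device: str, shares: int, name: str = "gpu") -> List[str]:
    """Return gres.conf lines that list the same device file `shares` times.

    Hypothetical helper: it only reproduces the pattern quoted in the reply
    (the same File= repeated once per allowed concurrent allocation).
    """
    if shares < 1:
        raise ValueError("shares must be at least 1")
    return [f"Name={name} File={device}" for _ in range(shares)]


def gres_count(shares: int, name: str = "gpu") -> str:
    """Return the Gres= fragment for the node's slurm.conf entry.

    Assumption: the count advertised in slurm.conf has to match the number
    of lines written to gres.conf.
    """
    if shares < 1:
        raise ValueError("shares must be at least 1")
    return f"Gres={name}:{shares}"


if __name__ == "__main__":
    # Configure GPU zero to be allocated twice, as in the reply.
    print("\n".join(gres_lines("/dev/nvidia0", 2)))
    print(gres_count(2))
```

Whether a given Slurm release accepts repeated File entries should be checked against its gres.conf documentation before relying on this.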

[slurm-dev] Re: Failed to contact primary controller : No route to host

2013-11-05 Thread Christopher Samuel
On 05/11/13 18:52, Arjun J Rao wrote: Now I have tried netstat -lnt on qdr1 (130.1.2.205) and it shows this: Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 0.0.0.0:6817

[slurm-dev] Re: Failed to contact primary controller : No route to host

2013-11-05 Thread Arjun J Rao
Yes, it was the firewall rules on my Scientific Linux installation. I flushed the iptables rules with iptables -F, and now the slurm daemons talk to the slurm controller just fine. On Tue, Nov 5, 2013 at 11:21 PM, Christopher Samuel sam...@unimelb.edu.au wrote:
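The nc -v qdr1 6818 check used earlier in this thread can also be scripted, which helps when several nodes and ports need testing. This is a rough sketch using only the Python standard library; the function name `probe_tcp` and the outcome labels are made up for illustration. It separates "no route to host" (which a rejecting firewall rule can produce even when ping works) from a plain refused connection.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the outcome as a short label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout (e.g. packets silently dropped).
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same symptom as the "No route to host" error in this thread.
            return "no route to host"
        return f"error: {exc}"


if __name__ == "__main__":
    # Host name and ports are the ones mentioned in the thread.
    for port in (6817, 6818):
        print("qdr1", port, probe_tcp("qdr1", port))
```

A result of "no route to host" for a host that answers ping points at filtering rather than routing, which matches the resolution reported above. Opening only the Slurm ports is a narrower alternative to flushing every rule with iptables -F.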