Hi folks,
We're seeing what looks like incorrect step accounting in the MySQL
database when we're using preemption and PreemptMode=REQUEUE. When a job
is requeued, a new job_table row is created with a new job_db_inx. Next,
a step_table record is created and uses the new job_db_inx, instead of
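For anyone who wants to see the effect in their own accounting database, a
query along these lines shows how many step rows hang off each job row; a
requeued job shows up as multiple job_db_inx values for the same id_job.
(The table and column names here follow the 2.6-era schema and may be
cluster-prefixed on your install.)

    SELECT j.job_db_inx, j.id_job, COUNT(s.job_db_inx) AS step_rows
    FROM job_table j
    LEFT JOIN step_table s ON s.job_db_inx = j.job_db_inx
    GROUP BY j.job_db_inx, j.id_job
    ORDER BY j.id_job, j.job_db_inx;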
On 08/18/2013 11:51 PM, Christopher Samuel wrote:
On 19/08/13 12:02, Christopher Samuel wrote:
Chasing through the code, it looks like getgrnam_r() fails in
get_group_members() in src/slurmctld/groups.c on our Slurm 2.6 boxes
(on RHEL 6.4).
Scratch that, restarting slurmctld doesn't provoke the
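For background, getgrnam_r() legitimately returns ERANGE when the supplied
buffer is too small to hold a large group, so the caller is expected to grow
the buffer and retry. A generic sketch of that libc idiom (this is not the
actual get_group_members() code):

    #include <errno.h>
    #include <grp.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Look up a group by name, doubling the buffer on ERANGE. */
    static struct group *lookup_group(const char *name, struct group *grp,
                                      char **buf, size_t *buflen)
    {
        struct group *result;
        int rc;

        while ((rc = getgrnam_r(name, grp, *buf, *buflen, &result)) == ERANGE) {
            char *nbuf = realloc(*buf, *buflen * 2);
            if (nbuf == NULL)
                return NULL;        /* out of memory */
            *buf = nbuf;
            *buflen *= 2;
        }
        if (rc != 0 || result == NULL)
            return NULL;            /* lookup error, or no such group */
        return result;
    }

    int main(void)
    {
        size_t buflen = 1024;
        char *buf = malloc(buflen);
        struct group grp;

        if (buf != NULL && lookup_group("wheel", &grp, &buf, &buflen) != NULL)
            printf("gid of 'wheel' is %u\n", (unsigned) grp.gr_gid);
        free(buf);
        return 0;
    }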
On 08/05/2013 06:52 PM, Kevin Abbey wrote:
I started using cgroups to control memory usage last week. One user
reported his application takes 4 times longer to complete. I read
elsewhere that cgroup memory control can reduce performance. Is this
slowdown realistic?
Is there a more efficient method?
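For reference, the memory-related knobs live in cgroup.conf; a minimal
example of the settings usually involved (values are illustrative only,
not a recommendation):

    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes
    AllowedRAMSpace=100
    AllowedSwapSpace=10

A slowdown that large is more often the job exceeding its cap and being
pushed into swap than overhead from the cgroup accounting itself, so that
is worth checking first.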
p) and 6-16] (which is actually no node name at all but a wrong
parsing after the failure)

Thanks
Eva

On Wed, 10 Jul 2013, John Thiltges wrote:

> On 07/10/2013 06:16 PM, Eva Hocks wrote:
>> The ent
On 07/10/2013 06:16 PM, Eva Hocks wrote:
> The entry in partition.conf:
> PartitionName=CLUSTER Default=yes State=UP
> nodes=gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9]
>
> causes slurmctld to crash:
>
> [2013-07-10T16:03:22.923] error: find_node_record: lookup failure for
> gpu-[2]-[4]
> [2013-0
On 06/19/2013 10:36 AM, Paul Edmon wrote:
> I have a group here that wants to submit a ton of jobs to the queue, but
> wants to restrict how many they have running at any given time so that
> they don't torch their fileserver.
The licenses feature might work OK for this. Create a license for the
group with a count equal to the desired job limit, and have each of
their jobs request one.
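In slurm.conf that could look something like this (the license name
"iolimit" and the count of 10 are made up for the example):

    Licenses=iolimit:10

Each of their jobs then requests one:

    sbatch -L iolimit:1 job.sh

Once 10 licenses are in use, further jobs requesting one sit pending until
a running job finishes and frees a license.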
On 06/18/2013 05:53 PM, Eva Hocks wrote:
> any sacct command returns:
>
> slurmctld.log:[2013-06-18T14:45:57-07:00] error: Association database
> appears down, reading from state file.
In slurm.conf, are you using
"AccountingStorageType=accounting_storage/slurmdbd"?
If so, check that slurmdbd is running and that slurmctld can reach it.
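Two quick checks from the head node (assuming sacctmgr and scontrol are in
your PATH):

    # does slurmdbd answer at all?
    sacctmgr show cluster

    # what is slurmctld actually configured to use?
    scontrol show config | grep -i AccountingStorage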
I think we've seen this problem as well. When using ThreadsPerCore along
with DefMemPerCpu or --mem-per-cpu, the worker node doesn't set the
memory limit correctly. Setting --mem instead should work OK.
I have a lightly tested patch that appears to address the issue, but
I've only tried it on
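In the meantime, requesting memory per node instead of per CPU sidesteps
the bad per-CPU calculation. For example, for four tasks that need 2000 MB
each (sizes are only illustrative, and this assumes the tasks share one
node):

    # instead of:
    sbatch -n 4 --mem-per-cpu=2000 job.sh
    # request the node total:
    sbatch -n 4 --mem=8000 job.sh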
On 02/11/2013 12:20 AM, Christopher Samuel wrote:
> One of the nice things in Torque that a number of our users use is
> interactive jobs with X11 forwarding so they can run various codes with
> graphical interfaces on the compute nodes (such as VMD, or MATLAB, etc).
Hi Chris,
To solve this when