[slurm-dev] Re: Removing Job from Slurm Database

2014-04-22 Thread Paul Edmon

Well more like the naive ones namely:

sacctmgr delete job JobID

How do you set the endtime?  Do you do that via scontrol?

-Paul Edmon-


On 04/21/2014 10:14 PM, Danny Auble wrote:

What are the obvious ones?

I would expect setting the end time to the start time and state to 4 
(I think that is a completed state) should do it.










[slurm-dev] Re: Removing Job from Slurm Database

2014-04-22 Thread Danny Auble

Paul I think this was covered in this thread
https://groups.google.com/forum/#!searchin/slurm-devel/time_start/slurm-devel/nf7JxV91F40/KUsS1AmyWRYJ


The gist of it is that you have to go into the database and manually update 
the record.


If you know the jobid or the db_inx, you can do something like this:

update $CLUSTER_job_table set state=3, time_end=time_start where 
time_end=0 and id_job=$JOBID;


That should make it go away from the check.
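
For anyone who wants to rehearse that statement before touching the live accounting database, here is a minimal sketch. It assumes Python's built-in sqlite3 module as an in-memory stand-in (a production Slurm accounting database normally runs on MySQL/MariaDB), and the cluster name `mycluster`, the job id 12345, the timestamps, and the helper `close_orphaned_job` are all hypothetical. The stand-in table has only the four columns the statement touches.

```python
import sqlite3

# Hypothetical cluster name and job id, used for illustration only.
CLUSTER = "mycluster"
JOB_ID = 12345
TABLE = f"{CLUSTER}_job_table"


def close_orphaned_job(conn, table, job_id):
    """Set state=3 and time_end=time_start for a job with no end time.

    Returns the number of rows changed.
    """
    # Table names cannot be bound as parameters, so the name is interpolated;
    # only do this with a trusted, hard-coded cluster name.
    cur = conn.execute(
        f"UPDATE {table} SET state = 3, time_end = time_start "
        "WHERE time_end = 0 AND id_job = ?",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount


# Stand-in table with only the columns the statement touches; the real
# accounting table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {TABLE} (id_job INTEGER, state INTEGER, "
    "time_start INTEGER, time_end INTEGER)"
)
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?)",
    [
        (12345, 1, 1398100000, 0),           # orphaned: no end time recorded
        (12346, 3, 1398100000, 1398103600),  # already has an end time
    ],
)

changed = close_orphaned_job(conn, TABLE, JOB_ID)
rows = conn.execute(
    f"SELECT id_job, state, time_start, time_end FROM {TABLE} ORDER BY id_job"
).fetchall()
print(changed, rows)
```

Checking the returned row count is the useful part: the `time_end = 0` condition means only the stuck record should change, and a second run should change nothing.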

It would be very good to know why the job didn't finish in the database, 
though, as this shouldn't happen.


Danny

On 04/22/14 06:48, Paul Edmon wrote:

Well more like the naive ones namely:

sacctmgr delete job JobID

How do you set the endtime?  Do you do that via scontrol?

-Paul Edmon-









[slurm-dev] Re: Removing Job from Slurm Database

2014-04-22 Thread Paul Edmon

Thanks.  Sorry, I forgot about that thread.

I'm wagering that the jobs got orphaned due to timing out. Essentially 
they actually launched but they didn't successfully update the database 
because it was busy.


-Paul Edmon-

On 04/22/2014 12:15 PM, Danny Auble wrote:
Paul I think this was covered in this thread 
https://groups.google.com/forum/#!searchin/slurm-devel/time_start/slurm-devel/nf7JxV91F40/KUsS1AmyWRYJ


The gist of it is that you have to go into the database and manually update 
the record.


If you know the jobid or the db_inx, you can do something like this:

update $CLUSTER_job_table set state=3, time_end=time_start where 
time_end=0 and id_job=$JOBID;


That should make it go away from the check.

It would be very good to know why the job didn't finish in the database, 
though, as this shouldn't happen.


Danny










[slurm-dev] Re: Removing Job from Slurm Database

2014-04-22 Thread Danny Auble


On 04/22/14 09:59, Paul Edmon wrote:

Thanks.  Sorry forgot about that thread.

No problem.


I'm wagering that the jobs got orphaned due to timing out. Essentially 
they actually launched but they didn't successfully update the database 
because it was busy.

The slurmctld should be keeping a record of all jobs ending unless the 
list got too full and the slurmctld started throwing away messages to the 
DBD. This would be the only way I would expect orphan jobs like 
this to still be around.  You should see lots of messages about this in 
the slurmctld log file if this is the case.  Otherwise a busy 
DBD/database should be handled.


Danny


-Paul Edmon-












[slurm-dev] Re: Removing Job from Slurm Database

2014-04-21 Thread Danny Auble


Paul, you should be able to remove the job with no issue.  The real 
question is why is it still running in the database instead of 
completed.  If you happen to have any logs on the job and the 
information from the database it would be nice to look at since what you 
are describing shouldn't be possible.  I know others have seen this 
before but no one has found a reproducer yet or any evidence on how the 
state was achieved.


Let me know if you have anything like this.

Thanks,
Danny

On 04/21/14 13:05, Paul Edmon wrote:


Is there a way to delete a JobID and its relevant data from the slurm 
database?  I have a user that I want to remove but there is a job 
which slurm thinks is not complete that is preventing me.  I want 
slurm to just remove that job data as it shouldn't impact anything.


-Paul Edmon-


[slurm-dev] Re: Removing Job from Slurm Database

2014-04-21 Thread Paul Edmon


Sure, I can hunt that info down.  So what would be the command to remove 
the job from the DB?  I tried the obvious ones I could think of but to 
no effect.


-Paul Edmon-

On 4/21/2014 4:31 PM, Danny Auble wrote:


Paul, you should be able to remove the job with no issue.  The real 
question is why is it still running in the database instead of 
completed.  If you happen to have any logs on the job and the 
information from the database it would be nice to look at since what 
you are describing shouldn't be possible.  I know others have seen 
this before but no one has found a reproducer yet or any evidence on 
how the state was achieved.


Let me know if you have anything like this.

Thanks,
Danny



[slurm-dev] Re: Removing Job from Slurm Database

2014-04-21 Thread Danny Auble

What are the obvious ones?

I would expect setting the end time to the start time and state to 4 (I think 
that is a completed state) should do it. 



On April 21, 2014 6:54:22 PM PDT, Paul Edmon ped...@cfa.harvard.edu wrote:

Sure, I can hunt that info down.  So what would be the command to remove
the job from the DB?  I tried the obvious ones I could think of but to
no effect.

-Paul Edmon-
