Re: [galaxy-dev] possible to resume failed workflow?

2012-05-25 Thread Kelkar, Hemant
Jason,

Is it possible that the jobs are failing to submit because no valid input 
is detected (perhaps because of a write cache delay)?

Sorry, I do not know PBS. We use LSF, and we ended up increasing the 
“retry_job_output_collection” entry to 30 to catch these falsely “failed” jobs.

--Hemant

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] possible to resume failed workflow?

2012-05-25 Thread J. Greenbaum
Hi Hemant,

Thanks for the suggestion. I've tried setting that parameter to 5, but it has 
not helped. I've noticed in the galaxy server output that I'm getting the 
following error:

galaxy.jobs.runners.pbs DEBUG 2012-05-25 08:13:17,957 (96) pbs_submit failed, 
PBS error 15031: Protocol (ASN.1) error

I believe this is why certain jobs are failing. I've googled a bit for this 
error, and found many threads where similar problems were reported. Here are a 
couple:

http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-February/004336.html
http://osdir.com/ml/galaxy-development-source-control/2011-02/msg00148.html

Is anyone aware of a solution to this issue?

Thanks,

J

--
Jason Greenbaum, Ph.D.
Manager, Bioinformatics Core | jgb...@liai.org
La Jolla Institute for Allergy and Immunology


Re: [galaxy-dev] possible to resume failed workflow?

2012-05-25 Thread Kelkar, Hemant
Jason,

Are the affected workflow steps actually failing, or are they falsely being 
reported as “failed” (have you checked whether correct output exists for the 
affected step)? Once a step/job is marked “failed”, you can’t use its output 
(even if it exists) for any subsequent step.

If you are using a cluster for your local Galaxy install with NFS disk mounts, 
this may happen because of write cache delays. If that is the case, 
increasing the value of the “retry_job_output_collection” parameter in 
universe_wsgi.ini should help you get around the problem. It fixed the 
problem in our local Galaxy, where some jobs were being reported as “failed” 
even though the correct output was there.
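
To make the idea concrete, here is a minimal Python sketch of what retrying 
output collection amounts to. It is not Galaxy's actual code: the function 
name collect_output, its parameters, and the one-second default delay are 
made up for illustration. The point is only that a file written on a compute 
node may take a moment to become visible over NFS, so looking again a few 
times before declaring the job failed can avoid false failures.

```python
import os
import time


def collect_output(path, retries=0, delay=1.0):
    """Return the contents of `path`, or None if it never becomes visible.

    Hypothetical helper (not part of Galaxy): checks once, then up to
    `retries` more times, sleeping `delay` seconds between attempts.
    """
    for attempt in range(retries + 1):
        if os.path.exists(path):
            with open(path) as handle:
                return handle.read()
        # Only wait if another attempt is still to come.
        if attempt < retries:
            time.sleep(delay)
    return None
```

With retries=0 a single miss is final, which is presumably why a higher 
retry count helps when the NFS cache is slow to catch up.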

--Hemant


[galaxy-dev] possible to resume failed workflow?

2012-05-24 Thread J. Greenbaum

Hi, 

I've created a few workflows and have been having issues with some steps 
randomly failing. This would not be an issue if I could simply resume the 
workflow from the failed step, but it seems that this is not possible. Instead, 
I'm forced to restart the workflow from the beginning. Is this true or am I 
missing something? 

Thanks,

Jason

--
Jason Greenbaum, Ph.D.
Manager, Bioinformatics Core | jgb...@liai.org
La Jolla Institute for Allergy and Immunology



