Hi Aaron,

As of now, the task has not been added to a work plan and is still open.
For your immediate needs, however, you can work around the problem by
making sure you have an active allocation for your experiment submission,
which lets the job proceed without the new feature.

Let me know if you need further assistance or clarification.

Thanks,
Lahiru

On Thu, Oct 17, 2024 at 8:16 AM Aaron Householder <[email protected]>
wrote:

> Hi Lahiru,
>
>
>
> Thank you for getting back to me and writing up the improvement. I’m
> trying to submit an academic paper before December. Given your initial
> review, do you think this feature will be available in time for me to meet
> my deadline?
>
>
>
> Regards,
>
> Aaron
>
>
>
> *From: *Lahiru Jayathilake <[email protected]>
> *Date: *Thursday, October 17, 2024 at 12:41 AM
> *To: *[email protected] <[email protected]>
> *Cc: *[email protected] <[email protected]>
> *Subject: *Re: Failed but metascheduler did not resubmit job
>
> Hi Aaron,
>
> Thank you for your patience, and I apologize for the delay in getting back
> to you regarding the issue.
>
> After further investigation, I noticed that the current version of the
> Airavata Metascheduler does not support automatically resubmitting jobs to
> different clusters when a job fails after successful submission (e.g., due
> to resource allocation issues). I've now created a task [1] to add this
> feature, which will enable the expected functionality. This enhancement
> will take some time to implement, but we’ll keep you updated on the
> progress.
>
> Please feel free to reach out if you have any further questions or need
> additional information.
>
> [1] - https://issues.apache.org/jira/browse/AIRAVATA-3893
>
>
>
> Thanks,
> Lahiru
>
>
>
> On Wed, Oct 16, 2024 at 7:36 PM Aaron Householder <[email protected]>
> wrote:
>
> Hi,
>
>
>
> Is there any update?
>
>
>
> Regards,
>
> Aaron
>
>
>
> *From: *Aaron Householder <[email protected]>
> *Date: *Saturday, October 12, 2024 at 8:12 PM
> *To: *[email protected] <[email protected]>
> *Subject: *Re: Failed but metascheduler did not resubmit job
>
> Hi Lahiru,
>
>
>
> Any update on this issue? This is an impediment to getting this rolled out
> to UltraScan users. My understanding is that if the job fails during the
> verification checks, the metascheduler should try another resource.
>
>
>
> Is there anything I can do to help?
>
>
>
> Regards,
>
> Aaron
>
>
>
> *From: *Lahiru Jayathilake <[email protected]>
> *Date: *Thursday, September 12, 2024 at 1:42 PM
> *To: *[email protected] <[email protected]>
> *Subject: *Re: Failed but metascheduler did not resubmit job
>
> Hi Aaron,
>
> Thanks for contacting us. We will look into this issue and get back to you.
>
> Best,
> Lahiru
>
> On 2024/09/11 18:45:33 Aaron Householder wrote:
> > Hi Airavata,
> >
> > I’m working on connecting Ultrascan3 to Airavata. As the message below
> > shows, if the job fails the metascheduler might not retry the job. Is there
> > a resource available to take a look at this issue?
> >
> > Regards,
> > Aaron
> >
> > From: Aaron Householder <[email protected]>
> > Date: Tuesday, September 3, 2024 at 4:42 PM
> > To: Airavata Users <[email protected]>
> > Subject: Failed but metascheduler did not resubmit job
> >
> > Hi Airavata Users,
> >
> > I had an UltraScan job that seemed to fail without the metascheduler
> > resubmitting the job for completion by another cluster. I received the
> > following in an email:
> >
> >    Your UltraScan job is complete:
> >
> >    Submission Time : 2024-08-26 00:50:05
> >    Job End Time    :
> >    Mail Time       : 2024-08-25 19:54:41
> >    LIMS Host       :
> >    Analysis ID     : US3-AIRA_ea2b4a32-27a8-4df4-827c-5fd9367c5e1c
> >    Request ID      : 182  ( uslims3_Demo )
> >    RunID           : demo1_veloc1
> >    EditID          : 21030600161
> >    Data Type       : RA
> >    Cell/Channel/Wl : 2 / A / 259
> >    Status          : failed
> >    Cluster         : metascheduler
> >    Job Type        : 2DSA-MC
> >    GFAC Status     : FAILED
> >    GFAC Message    : org.apache.airavata.helix.impl.task.TaskOnFailException: Error Code : 23857cb5-5431-43e7-a927-fedd7a929e34, Task TASK_5b1ea99b-750c-49f6-a05b-df7175f141ed failed due to Couldn't find job id in both submitted and verified steps. expId:US3-AIRA_ea2b4a32-27a8-4df4-827c-5fd9367c5e1c Couldn't find remote jobId for JobName:A1394806797, both submit and verify steps doesn't return a valid JobId. Hence changing experiment state to Failed
> >         at org.apache.airavata.helix.impl.task.AiravataTask.onFail(AiravataTask.java:146)
> >         at org.apache.airavata.helix.impl.task.submission.DefaultJobSubmissionTask.onRun(DefaultJobSubmissionTask.java:192)
> >         at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:437)
> >         at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:102)
> >         at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
> >         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> >         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> >         at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> >         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> >         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> >         at java.base/java.lang.Thread.run(Thread.java:829)
> >
> >
> > No reservation for this job
> > --> Verifying valid submit host (login2)...OK
> > --> Verifying valid jobname...OK
> > --> Verifying valid ssh keys...OK
> > --> Verifying access to desired queue (normal)...OK
> > --> Checking available allocation FAILED
> >    Airavata stderr : ERROR: You have no project in the projectuser.map file (in accounting_check_prod.pl).
> >
>
>
