Hi Airavata,

I’m working on connecting UltraScan3 to Airavata. As the forwarded message 
below shows, a job failed and the metascheduler did not resubmit it to another 
cluster. Is anyone available to take a look at this issue?

Regards,
Aaron

From: Aaron Householder <[email protected]>
Date: Tuesday, September 3, 2024 at 4:42 PM
To: Airavata Users <[email protected]>
Subject: Failed but metascheduler did not resubmit job

Hi Airavata Users,

I had an UltraScan job that seemed to fail without the metascheduler 
resubmitting the job for completion by another cluster. I received the 
following in an email:

   Your UltraScan job is complete:

   Submission Time : 2024-08-26 00:50:05
   Job End Time    :
   Mail Time       : 2024-08-25 19:54:41
   LIMS Host       :
   Analysis ID     : US3-AIRA_ea2b4a32-27a8-4df4-827c-5fd9367c5e1c
   Request ID      : 182  ( uslims3_Demo )
   RunID           : demo1_veloc1
   EditID          : 21030600161
   Data Type       : RA
   Cell/Channel/Wl : 2 / A / 259
   Status          : failed
   Cluster         : metascheduler
   Job Type        : 2DSA-MC
   GFAC Status     : FAILED
   GFAC Message    : org.apache.airavata.helix.impl.task.TaskOnFailException: Error Code : 23857cb5-5431-43e7-a927-fedd7a929e34, Task TASK_5b1ea99b-750c-49f6-a05b-df7175f141ed failed due to Couldn't find job id in both submitted and verified steps. expId:US3-AIRA_ea2b4a32-27a8-4df4-827c-5fd9367c5e1c Couldn't find remote jobId for JobName:A1394806797, both submit and verify steps doesn't return a valid JobId. Hence changing experiment state to Failed
        at org.apache.airavata.helix.impl.task.AiravataTask.onFail(AiravataTask.java:146)
        at org.apache.airavata.helix.impl.task.submission.DefaultJobSubmissionTask.onRun(DefaultJobSubmissionTask.java:192)
        at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:437)
        at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:102)
        at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)


No reservation for this job
--> Verifying valid submit host (login2)...OK
--> Verifying valid jobname...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (normal)...OK
--> Checking available allocation FAILED
   Airavata stderr : ERROR: You have no project in the projectuser.map file (in accounting_check_prod.pl).
