Hi Airavata,

I’m working on connecting Ultrascan3 to Airavata. As the forwarded message below shows, when a job fails, the metascheduler may not resubmit it to another cluster. Is someone available to look into this issue?
Regards,
Aaron

From: Aaron Householder
Date: Tuesday, September 3, 2024 at 4:42 PM
To: Airavata Users
Subject: Failed but metascheduler did not resubmit job

Hi Airavata Users,

I had an UltraScan job that appears to have failed, and the metascheduler did not resubmit it to another cluster for completion. I received the following email:

Your UltraScan job is complete:

Submission Time : 2024-08-26 00:50:05
Job End Time :
Mail Time : 2024-08-25 19:54:41
LIMS Host :
Analysis ID : US3-AIRA_ea2b4a32-27a8-4df4-827c-5fd9367c5e1c
Request ID : 182 ( uslims3_Demo )
RunID : demo1_veloc1
EditID : 21030600161
Data Type : RA
Cell/Channel/Wl : 2 / A / 259
Status : failed
Cluster : metascheduler
Job Type : 2DSA-MC
GFAC Status : FAILED
GFAC Message : org.apache.airavata.helix.impl.task.TaskOnFailException: Error Code : 23857cb5-5431-43e7-a927-fedd7a929e34, Task TASK_5b1ea99b-750c-49f6-a05b-df7175f141ed failed due to Couldn't find job id in both submitted and verified steps. expId:US3-AIRA_ea2b4a32-27a8-4df4-827c-5fd9367c5e1c Couldn't find remote jobId for JobName:A1394806797, both submit and verify steps doesn't return a valid JobId. Hence changing experiment state to Failed
	at org.apache.airavata.helix.impl.task.AiravataTask.onFail(AiravataTask.java:146)
	at org.apache.airavata.helix.impl.task.submission.DefaultJobSubmissionTask.onRun(DefaultJobSubmissionTask.java:192)
	at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:437)
	at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:102)
	at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
No reservation for this job
--> Verifying valid submit host (login2)...OK
--> Verifying valid jobname...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (normal)...OK
--> Checking available allocation FAILED
Airavata stderr : ERROR: You have no project in the projectuser.map file (in accounting_check_prod.pl).
