As I said, now I have one job in LAUNCHED and another in EXECUTING. Slurm job 
IDs are getting created but there appears to be no “callback” when they’re 
finished.

One odd thing that’s happening can be seen in the log.

2017-06-26 12:50:20,901 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Error 
parsing email message =====================================>
org.apache.airavata.common.exception.AiravataException: [EJM]: Couldn't 
identify Resource job manager type from address Google 
<[email protected]>
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.getJobMonitorType(EmailBasedMonitor.java:181)
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.parse(EmailBasedMonitor.java:165)
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.processMessages(EmailBasedMonitor.java:265)
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.run(EmailBasedMonitor.java:234)
        at java.lang.Thread.run(Thread.java:745)
2017-06-26 12:50:20,903 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - FROM: Google 
<[email protected]>
2017-06-26 12:50:20,903 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - TO: 
[email protected]
2017-06-26 12:50:20,903 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - SUBJECT: Someone 
has your password
2017-06-26 12:50:36,459 [Thread-21] INFO  
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: 1 job/s in 
job monitor map
2017-06-26 12:50:36,756 [Thread-21] INFO  
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Retrieving 
unseen emails
2017-06-26 12:50:37,383 [Thread-21] INFO  
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: 104 new 
email/s received
2017-06-26 12:51:06,350 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Error 
parsing email message =====================================>
org.apache.airavata.common.exception.AiravataException: [EJM]: Couldn't 
identify Resource job manager type from address Google 
<[email protected]>
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.getJobMonitorType(EmailBasedMonitor.java:181)
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.parse(EmailBasedMonitor.java:165)
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.processMessages(EmailBasedMonitor.java:265)
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.run(EmailBasedMonitor.java:234)
        at java.lang.Thread.run(Thread.java:745)
2017-06-26 12:51:06,350 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - FROM: Google 
<[email protected]>
2017-06-26 12:51:06,350 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - TO: 
[email protected]
2017-06-26 12:51:06,350 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - SUBJECT: Someone 
has your password
2017-06-26 12:51:22,007 [Thread-21] INFO  
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: 1 job/s in 
job monitor map
2017-06-26 12:51:22,445 [Thread-21] INFO  
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Retrieving 
unseen emails
2017-06-26 12:51:23,071 [Thread-21] INFO  
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: 104 new 
email/s received
2017-06-26 12:51:51,038 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Error 
parsing email message =====================================>
org.apache.airavata.common.exception.AiravataException: [EJM]: Couldn't 
identify Resource job manager type from address Google 
<[email protected]>
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.getJobMonitorType(EmailBasedMonitor.java:181)
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.parse(EmailBasedMonitor.java:165)
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.processMessages(EmailBasedMonitor.java:265)
        at 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.run(EmailBasedMonitor.java:234)
        at java.lang.Thread.run(Thread.java:745)
2017-06-26 12:51:51,039 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - FROM: Google 
<[email protected]>
2017-06-26 12:51:51,039 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - TO: 
[email protected]
2017-06-26 12:51:51,039 [Thread-21] ERROR 
org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - SUBJECT: Someone 
has your password

This email from Google, with the subject “Someone has your password”, is not 
found anywhere in the account, so I don’t know why the monitor keeps “reading” 
it over and over again. The monitor does seem to be able to access the email 
account without any difficulty.
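
As a rough illustration of what the error text implies (this is not Airavata's 
actual code), the monitor appears to look up each message's sender in a set of 
known resource addresses and to fail when there is no match. The Python sketch 
below shows that kind of lookup, but skips unknown senders instead of raising. 
The addresses, the KNOWN_SENDERS table, and the function names are all 
hypothetical.

```python
from email.utils import parseaddr

# Hypothetical sender table; a real deployment would load this from its
# monitor configuration rather than hard-coding it.
KNOWN_SENDERS = {
    "slurm@cluster.example.org": "SLURM",
    "pbs@cluster.example.org": "PBS",
}


def job_manager_type(from_header):
    """Return the job manager type for a From header, or None if unknown."""
    _, address = parseaddr(from_header)
    return KNOWN_SENDERS.get(address.lower())


def triage(from_headers):
    """Split From headers into recognised (type, header) pairs and skipped ones."""
    recognised, skipped = [], []
    for header in from_headers:
        kind = job_manager_type(header)
        if kind is None:
            # Unrelated mail (for example a security alert) is set aside
            # rather than treated as a parsing failure.
            skipped.append(header)
        else:
            recognised.append((kind, header))
    return recognised, skipped
```

With a table like this, a header such as "Google <alerts@mail.example.com>" 
ends up in the skipped list, while mail from the scheduler address is matched 
to its job manager type.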

J

> On Jun 26, 2017, at 9:35 AM, Eroma Abeysinghe <[email protected]> 
> wrote:
> 
> Hi Jarett,
> 
> Did you do a recent upgrade of Airavata and PGA? If not, please do so with the 
> latest production. From the information you have provided, it could be an 
> issue with the gfac server reading from the RabbitMQ queue. But you said that 
> although the experiment is in LAUNCHED, the job is in SUBMITTED. So does your 
> email account contain unread emails for this job? When was the last time an 
> experiment completed, and were any changes made to the server machines, etc., 
> between then and now?
> 
> Hi Jeff,
> Yours is slightly different since it's in EXECUTING. With the information you 
> have provided, I think your issue could be with email monitoring. Do you have 
> unread emails for the jobs in EXECUTING in your mailbox? If you do, then 
> you need to check your gfac-config.yaml in the airavata bin folder and make 
> sure it processes emails from Comet.
> 
> Hope this info helps with further investigation.
> 
> Thanks,
> Eroma
> 
> On Fri, Jun 23, 2017 at 4:56 PM, Sale, Jeff <[email protected]> wrote:
> I have a similar issue. I have been working with the Airavata support folks, 
> Eroma, Supun, and Marcus for the past few weeks trying to get Gaussian jobs 
> to run on Comet. They have been super helpful, and it appears I am now able 
> to run jobs to completion according to the Gaussian.log file in the scratch 
> directory on Comet, but when I browse to the Experiment on the PGA the stdout 
> and stderr files never appear as a link in Outputs and the job status is 
> perpetually in "EXECUTING".
> 
> I seem to recall Supun saying this was something they were aware of and are 
> working to resolve, but I could be wrong about this.
> 
> Jeff
> 
> ________________________________________
> From: Jarett DeAngelis [[email protected]]
> Sent: Friday, June 23, 2017 1:28 PM
> To: [email protected]
> Subject: Job stuck in "launched," "submitted" status
> 
> Hi gang,
> 
> Working on our Airavata deployment (still build 16) again and have 
> encountered an issue where after submitting a job to Slurm, it gets stuck in 
> the “LAUNCHED” state, appearing to have sent the job to Slurm because it says 
> “SUBMITTED” underneath, but it just stays that way forever. If you look at 
> RabbitMQ there is a message sitting in the queue. Our first thought was that 
> it was the email account we’re using for job tracking, but that is 
> functioning fine. Where should I be looking for answers?
> 
> Thanks,
> Jarett
> 
> 
> 
> -- 
> Thank You,
> Best Regards,
> Eroma
