[jira] [Commented] (AIRAVATA-2621) SSH port provided in compute resource registration is not considered for cluster SSH communication
[ https://issues.apache.org/jira/browse/AIRAVATA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332515#comment-16332515 ]

ASF subversion and git services commented on AIRAVATA-2621:
-----------------------------------------------------------

Commit f13c17fe41bfdd1a4c8e35a88ab16f534190b523 in airavata's branch refs/heads/AIRAVATA-2620 from dimuthu.upeks...@gmail.com
[ https://gitbox.apache.org/repos/asf?p=airavata.git;h=f13c17f ]

Fixing AIRAVATA-2621

> SSH port provided in compute resource registration is not considered for cluster SSH communication
> ---------------------------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2621
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2621
>             Project: Airavata
>          Issue Type: Bug
>          Components: GFac
>    Affects Versions: 0.18
>         Environment: https://hpcgateway.gsu.edu/
> https://scigap.org/
>            Reporter: Eroma
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>             Fix For: 0.18
>
>
> 1. Added a specific port for job submissions (15022)
> 2. But when submitting jobs, for environment creation, the gfac is using the default 22 port, not the specified one in scigap.org for hpclogin.gsu.edu.
> 3. log messages in airavata log
> 2017-12-19 11:13:18,996 [pool-7-thread-2] INFO o.a.airavata.gfac.impl.Factory process_id=PROCESS_3b471b3b-5b4e-4b6d-a66e-554652a390d2, token_id=35da840b-63d5-4cbf-b9ce-3005cd94d961, experiment_id=NWChem2_a38ac303-666f-4dea-9b4c-7bffe0f97dd7, gateway_id=georgiastate - Initialize a new SSH session for :airavata_hpclogin.gsu.edu_22_35da840b-63d5-4cbf-b9ce-3005cd94d961
> 2017-12-19 11:15:26,272 [pool-7-thread-2] ERROR o.a.a.gfac.core.GFacException process_id=PROCESS_3b471b3b-5b4e-4b6d-a66e-554652a390d2, token_id=35da840b-63d5-4cbf-b9ce-3005cd94d961, experiment_id=NWChem2_a38ac303-666f-4dea-9b4c-7bffe0f97dd7, gateway_id=georgiastate - JSch initialization error
> com.jcraft.jsch.JSchException: java.net.ConnectException: Connection timed out (Connection timed out)
>     at com.jcraft.jsch.Util.createSocket(Util.java:349)
>     at com.jcraft.jsch.Session.connect(Session.java:215)
>     at com.jcraft.jsch.Session.connect(Session.java:183)
>     at org.apache.airavata.gfac.impl.Factory.getSSHSession(Factory.java:542)
>     at org.apache.airavata.gfac.impl.HPCRemoteCluster.getSshSession(HPCRemoteCluster.java:138)
>     at org.apache.airavata.gfac.impl.HPCRemoteCluster.getSession(HPCRemoteCluster.java:315)
>     at org.apache.airavata.gfac.impl.HPCRemoteCluster.makeDirectory(HPCRemoteCluster.java:242)
>     at org.apache.airavata.gfac.impl.task.EnvironmentSetupTask.execute(EnvironmentSetupTask.java:51)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTask(GFacEngineImpl.java:814)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.configureWorkspace(GFacEngineImpl.java:553)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTaskListFrom(GFacEngineImpl.java:324)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.executeProcess(GFacEngineImpl.java:286)
>     at org.apache.airavata.gfac.impl.GFacWorker.executeProcess(GFacWorker.java:227)
>     at org.apache.airavata.gfac.impl.GFacWorker.run(GFacWorker.java:86)
>     at org.apache.airavata.common.logging.MDCUtil.lambda$wrapWithMDC$0(MDCUtil.java:40)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     at java.net.Socket.connect(Socket.java:589)
>     at java.net.Socket.connect(Socket.java:538)
>     at java.net.Socket.<init>(Socket.java:434)
>     at java.net.Socket.<init>(Socket.java:211)
>     at com.jcraft.jsch.Util.createSocket(Util.java:343)
>     ... 17 common frames omitted
> NWChem2_a38ac303-666f-4dea-9b4c-7bffe0f97dd7

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
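The session key in the log above contains `_22_`, which is consistent with the default port being used instead of the registered 15022. The behaviour the report asks for can be sketched as follows. This is a minimal sketch, not Airavata code: the class `SshPortResolver`, its methods, the rule that a non-positive value means "no port registered", and the `user_host_port_token` key layout (inferred from the logged key) are all assumptions made for illustration.

```java
/** Hypothetical helper, not part of Airavata: chooses the SSH port to connect to. */
public final class SshPortResolver {
    /** Standard SSH port, used only when no valid port was registered. */
    public static final int DEFAULT_SSH_PORT = 22;

    private SshPortResolver() {}

    /** Returns the registered port when it lies in 1..65535, otherwise the default. */
    public static int resolvePort(int registeredPort) {
        return (registeredPort > 0 && registeredPort <= 65535)
                ? registeredPort
                : DEFAULT_SSH_PORT;
    }

    /** Builds a session cache key shaped like the logged one: user_host_port_token. */
    public static String sessionKey(String user, String host, int registeredPort, String tokenId) {
        return user + "_" + host + "_" + resolvePort(registeredPort) + "_" + tokenId;
    }

    public static void main(String[] args) {
        // With the port from the report, the key carries 15022 rather than 22.
        System.out.println(sessionKey("airavata", "hpclogin.gsu.edu", 15022,
                "35da840b-63d5-4cbf-b9ce-3005cd94d961"));
    }
}
```

With JSch, the resolved value would be the third argument of `JSch.getSession(String username, String host, int port)`. Putting the resolved port into the cache key as well keeps sessions to the same host on different ports from being confused with each other.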
[jira] [Commented] (AIRAVATA-2620) Force post processing functionality
[ https://issues.apache.org/jira/browse/AIRAVATA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332517#comment-16332517 ]

ASF subversion and git services commented on AIRAVATA-2620:
-----------------------------------------------------------

Commit e3de5a05731c27b6eb640a7da4b97b177aad48fc in airavata's branch refs/heads/AIRAVATA-2620 from [~smarru]
[ https://gitbox.apache.org/repos/asf?p=airavata.git;h=e3de5a0 ]

Merge branch 'master' into AIRAVATA-2620

> Force post processing functionality
> -----------------------------------
>
>                 Key: AIRAVATA-2620
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2620
>             Project: Airavata
>          Issue Type: Improvement
>    Affects Versions: 0.16
>            Reporter: Suresh Marru
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>             Fix For: 0.17
>
>
> Due to current limitations of only relying on email for job monitoring, the post-processing sometimes has inherent delays. Ultrascan science gateway would like to have a capability in airavata to request forcing of post processing. This will be used when clients have out of band knowledge about job completion (for example through code instrumented UDP messages) and would like Airavata to force staging of output files.
> This improvement has to be carefully added so existing life cycle of an experiment is not hampered.
[jira] [Commented] (AIRAVATA-2624) Stampede2 cluster SSH connectivity issue
[ https://issues.apache.org/jira/browse/AIRAVATA-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332516#comment-16332516 ]

ASF subversion and git services commented on AIRAVATA-2624:
-----------------------------------------------------------

Commit 02379098159a81612ed8cf4f73a5cef1e3eae9f9 in airavata's branch refs/heads/AIRAVATA-2620 from dimuthu.upeks...@gmail.com
[ https://gitbox.apache.org/repos/asf?p=airavata.git;h=0237909 ]

Fixing AIRAVATA-2624 Sampede2 cluster SSH connectivity issue

> Stampede2 cluster SSH connectivity issue
> ----------------------------------------
>
>                 Key: AIRAVATA-2624
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2624
>             Project: Airavata
>          Issue Type: Bug
>          Components: Airavata System, GFac
>    Affects Versions: 0.18
>         Environment: https://seagrid.org
>            Reporter: Eroma
>            Assignee: Marcus Christie
>            Priority: Major
>             Fix For: 0.18
>
>
> Job submission fails at env creation due to JSch initialization error.
> Error messages
> 2018-01-09 09:46:10,786 [pool-7-thread-15] ERROR o.a.a.gfac.core.GFacException process_id=PROCESS_650014f6-fcb6-4680-90ea-898bee373f37, token_id=3d65bf6d-2c9f-4166-a51b-e76e0022bd3b, experiment_id=Clone_of_st2molcastest_e2942a34-c9c7-4f04-8ccb-af6fe27e0990, gateway_id=seagrid - JSch initialization error
> com.jcraft.jsch.JSchException: Auth fail
>     at com.jcraft.jsch.Session.connect(Session.java:512)
>     at com.jcraft.jsch.Session.connect(Session.java:183)
>     at org.apache.airavata.gfac.impl.Factory.getSSHSession(Factory.java:542)
>     at org.apache.airavata.gfac.impl.HPCRemoteCluster.getSshSession(HPCRemoteCluster.java:138)
>     at org.apache.airavata.gfac.impl.HPCRemoteCluster.getSession(HPCRemoteCluster.java:315)
>     at org.apache.airavata.gfac.impl.HPCRemoteCluster.makeDirectory(HPCRemoteCluster.java:242)
>     at org.apache.airavata.gfac.impl.task.EnvironmentSetupTask.execute(EnvironmentSetupTask.java:51)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTask(GFacEngineImpl.java:814)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.configureWorkspace(GFacEngineImpl.java:553)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTaskListFrom(GFacEngineImpl.java:324)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.executeProcess(GFacEngineImpl.java:286)
>     at org.apache.airavata.gfac.impl.GFacWorker.executeProcess(GFacWorker.java:227)
>     at org.apache.airavata.gfac.impl.GFacWorker.run(GFacWorker.java:86)
>     at org.apache.airavata.common.logging.MDCUtil.lambda$wrapWithMDC$0(MDCUtil.java:40)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-01-09 09:46:10,786 [pool-7-thread-15] ERROR o.a.a.g.i.t.EnvironmentSetupTask process_id=PROCESS_650014f6-fcb6-4680-90ea-898bee373f37, token_id=3d65bf6d-2c9f-4166-a51b-e76e0022bd3b, experiment_id=Clone_of_st2molcastest_e2942a34-c9c7-4f04-8ccb-af6fe27e0990, gateway_id=seagrid - Error while environment setup
> org.apache.airavata.gfac.core.GFacException: JSch initialization error
>     at org.apache.airavata.gfac.impl.Factory.getSSHSession(Factory.java:545)
>     at org.apache.airavata.gfac.impl.HPCRemoteCluster.getSshSession(HPCRemoteCluster.java:138)
>     at org.apache.airavata.gfac.impl.HPCRemoteCluster.getSession(HPCRemoteCluster.java:315)
>     at org.apache.airavata.gfac.impl.HPCRemoteCluster.makeDirectory(HPCRemoteCluster.java:242)
>     at org.apache.airavata.gfac.impl.task.EnvironmentSetupTask.execute(EnvironmentSetupTask.java:51)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTask(GFacEngineImpl.java:814)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.configureWorkspace(GFacEngineImpl.java:553)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTaskListFrom(GFacEngineImpl.java:324)
>     at org.apache.airavata.gfac.impl.GFacEngineImpl.executeProcess(GFacEngineImpl.java:286)
>     at org.apache.airavata.gfac.impl.GFacWorker.executeProcess(GFacWorker.java:227)
>     at org.apache.airavata.gfac.impl.GFacWorker.run(GFacWorker.java:86)
>     at org.apache.airavata.common.logging.MDCUtil.lambda$wrapWithMDC$0(MDCUtil.java:40)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: com.jcraft.jsch.JSchException: Auth fail
>     at com.jcraft.jsch.Session.connect(Session.java:512)
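Each log record quoted above carries `process_id`, `token_id`, `experiment_id` and `gateway_id` as comma-separated `key=value` pairs, which makes it possible to group failures by experiment or gateway. A minimal sketch of pulling those fields out of a single line follows. The class name `MdcFields` is hypothetical, and the pattern assumes values contain neither commas nor whitespace, which holds for the lines shown here but is not a documented guarantee of the log format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Airavata: extracts context fields from a log line. */
public final class MdcFields {
    // Matches the four context keys seen in the quoted log lines.
    private static final Pattern FIELD =
            Pattern.compile("(process_id|token_id|experiment_id|gateway_id)=([^,\\s]+)");

    private MdcFields() {}

    /** Returns the context fields found in the line, in order of appearance. */
    public static Map<String, String> parse(String logLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(logLine);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "2018-01-09 09:46:10,786 [pool-7-thread-15] ERROR o.a.a.gfac.core.GFacException "
                + "process_id=PROCESS_650014f6-fcb6-4680-90ea-898bee373f37, "
                + "token_id=3d65bf6d-2c9f-4166-a51b-e76e0022bd3b, "
                + "experiment_id=Clone_of_st2molcastest_e2942a34-c9c7-4f04-8ccb-af6fe27e0990, "
                + "gateway_id=seagrid - JSch initialization error";
        System.out.println(parse(line));
    }
}
```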
[jira] [Created] (AIRAVATA-2646) Upgrade to Thrift 0.11.0
Suresh Marru created AIRAVATA-2646:
--------------------------------------

             Summary: Upgrade to Thrift 0.11.0
                 Key: AIRAVATA-2646
                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2646
             Project: Airavata
          Issue Type: Improvement
            Reporter: Suresh Marru


The Thrift 0.11.0 release has some changes related to C++ stubs - [https://github.com/apache/thrift/blob/0.11.0/CHANGES]

To mitigate an unrelated issue, I upgraded to 0.11.0 and it's working OK. I think we should move the develop branch to the latest Thrift release.
[jira] [Commented] (AIRAVATA-2622) Search airavata.log and locate job IDs returned at subsequent job ID search after submitting the job
[ https://issues.apache.org/jira/browse/AIRAVATA-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332853#comment-16332853 ]

Eroma commented on AIRAVATA-2622:
---------------------------------

This was tested using the SEAGrid js-169-144.jetstream-cloud.org jetstream slurm cluster. The subsequent steps are functioning.

In order to test this:
# We stopped job ID return at job submission
# Then the job ID was returned in the 1st retry or the second retry of squeue command.

> Search airavata.log and locate job IDs returned at subsequent job ID search after submitting the job
> ------------------------------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2622
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2622
>             Project: Airavata
>          Issue Type: Bug
>          Components: Airavata System, GFac
>    Affects Versions: 0.18
>            Reporter: Eroma
>            Assignee: Eroma
>            Priority: Major
>             Fix For: 0.18
>
>
> Currently at job submission the job ID is returned to airavata GFAC. At some job submissions when the job ID is not returned gfac will try to retrieve it two more times. We need to confirm that these subsequent steps are working.
> One way is to find and locate instances where the job ID is returned in a subsequent step.
> This is to confirm that these steps are functioning as they are expected.
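The issue describes a fallback: when submission does not return a job ID, gfac tries to retrieve it two more times. That control flow can be sketched as below. This is an illustration only, not GFac's implementation: the class `JobIdRetry`, the `lookup` supplier (standing in for something like a squeue query) and the `maxRetries` parameter are hypothetical names, and any delay between attempts is left out for brevity.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch, not GFac code: falls back to a bounded number of job ID lookups. */
public final class JobIdRetry {
    private JobIdRetry() {}

    /**
     * Returns the ID from submission when present; otherwise calls the lookup
     * up to maxRetries times and returns the first ID it yields.
     */
    public static Optional<String> resolveJobId(Optional<String> idFromSubmission,
                                                Supplier<Optional<String>> lookup,
                                                int maxRetries) {
        if (idFromSubmission.isPresent()) {
            return idFromSubmission;
        }
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            Optional<String> found = lookup.get();
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated lookup that only succeeds on the second retry.
        Supplier<Optional<String>> lookup =
                () -> ++calls[0] < 2 ? Optional.<String>empty() : Optional.of("12345");
        System.out.println(resolveJobId(Optional.empty(), lookup, 2).orElse("not found"));
    }
}
```

Passing the lookup in as a `Supplier` mirrors the test described in the comment: the submission result can be forced to be empty and the number of lookups needed before an ID appears can be observed directly.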
[jira] [Closed] (AIRAVATA-2622) Search airavata.log and locate job IDs returned at subsequent job ID search after submitting the job
[ https://issues.apache.org/jira/browse/AIRAVATA-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eroma closed AIRAVATA-2622.
---------------------------
    Resolution: Fixed

> Search airavata.log and locate job IDs returned at subsequent job ID search after submitting the job
> ------------------------------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2622
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2622
>             Project: Airavata
>          Issue Type: Bug
>          Components: Airavata System, GFac
>    Affects Versions: 0.18
>            Reporter: Eroma
>            Assignee: Eroma
>            Priority: Major
>             Fix For: 0.18
>
>
> Currently at job submission the job ID is returned to airavata GFAC. At some job submissions when the job ID is not returned gfac will try to retrieve it two more times. We need to confirm that these subsequent steps are working.
> One way is to find and locate instances where the job ID is returned in a subsequent step.
> This is to confirm that these steps are functioning as they are expected.
[jira] [Comment Edited] (AIRAVATA-2590) Update UGE_groovy.template to apply different parallel environment (-pe) values
[ https://issues.apache.org/jira/browse/AIRAVATA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331178#comment-16331178 ]

Eroma edited comment on AIRAVATA-2590 at 1/19/18 8:02 PM:
----------------------------------------------------------

Testing the fix for printing the processes per node and the job submission commands.

The -pe related change was not applied to production as the requirement is no longer valid.

Tested using [https://seagrid.org|https://seagrid.org/], [https://sciencegateway.siu.edu|https://sciencegateway.siu.edu/] and [https://sciencegateway.usd.edu|https://sciencegateway.usd.edu/]

Test Cases.
# Submit a test job to little dog cluster. - Cluster is currently non responsive
# Submit a test job to big dog cluster. Test job completed successfully. - PASS
# Cancel a test job in little dog - Cluster is currently non responsive
# Cancel a test job in big dog. - PASS
# Submit a test job to USD HPC. Test job completed successfully. - PASS
# Cancel a job in USD HPC. - PASS
# Submit test job to SLURM cluster in SEAGrid. Tested with Comet and Bridges - PASS
# Submit test job to PBS cluster in SEAGrid. Tested with bigred2 - PASS
# Cancel SLURM job. Tested with Stampede2 - PASS
# Cancel a PBS job. Tested with bigred2 - PASS


was (Author: eroma_a):
Testing the fix for printing the processes per node and the job submission commands.

The -pe related change was not applied to production as the requirement is no longer valid.

Tested using [https://seagrid.org|https://seagrid.org/], [https://sciencegateway.siu.edu|https://sciencegateway.siu.edu/] and [https://sciencegateway.usd.edu|https://sciencegateway.usd.edu/]

Test Cases.
# Submit a test job to little dog cluster.
# Submit a test job to big dog cluster. Test job completed successfully. - PASS
# Cancel a test job in little dog
# Cancel a test job in big dog. - PASS
# Submit a test job to USD HPC. Test job completed successfully. - PASS
# Cancel a job in USD HPC.
# Submit test job to SLURM cluster in SEAGrid. Tested with Comet and Bridges - PASS
# Submit test job to PBS cluster in SEAGrid. Tested with bigred2 - PASS
# Cancel SLURM job.
# Cancel a PBS job

> Update UGE_groovy.template to apply different parallel environment (-pe) values
> ---------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2590
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2590
>             Project: Airavata
>          Issue Type: Bug
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>            Priority: Major
>
[jira] [Resolved] (AIRAVATA-2590) Update UGE_groovy.template to apply different parallel environment (-pe) values
[ https://issues.apache.org/jira/browse/AIRAVATA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eroma resolved AIRAVATA-2590.
-----------------------------
    Resolution: Fixed

> Update UGE_groovy.template to apply different parallel environment (-pe) values
> ---------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2590
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2590
>             Project: Airavata
>          Issue Type: Bug
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>            Priority: Major
>