[jira] [Updated] (AMBARI-21530) Service Checks During Upgrades Should Use Desired Stack

2017-07-20 Thread Jonathan Hurley (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMBARI-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hurley updated AMBARI-21530:
-
Status: Patch Available  (was: Open)

> Service Checks During Upgrades Should Use Desired Stack
> ---
>
> Key: AMBARI-21530
> URL: https://issues.apache.org/jira/browse/AMBARI-21530
> Project: Ambari
>  Issue Type: Bug
>  Components: ambari-server
>Affects Versions: 2.5.2
>Reporter: Jonathan Hurley
>Assignee: Jonathan Hurley
>Priority: Blocker
> Fix For: 2.5.2
>
> Attachments: AMBARI-21530.patch
>
>
> During an upgrade from BI 4.2 to HDP 2.6, some service checks were failing. 
> This is because the service checks' hooks/service folders were being 
> overwritten by part of the scheduler framework. At the time of orchestration, 
> the cluster's desired stack ID was still BI, but the effective stack ID used 
> for the upgrade was HDP, and that effective ID was being clobbered.
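> As a minimal, hypothetical sketch (not the Ambari source) of why the stack ID 
> matters here: the agent resolves a service check's scripts from the stack 
> named in the command, so a command carrying the desired (BI) ID instead of 
> the effective (HDP) ID runs the BigInsights scripts, which still reference 
> pre-upgrade paths such as /usr/lib/hadoop-yarn. The folder layout and field 
> names below are illustrative assumptions, not Ambari's actual command format.
> {code}
> import os
> 
> CACHE_ROOT = "/var/lib/ambari-agent/cache/stacks"
> 
> def script_dir_for(command):
>     # The stack name/version carried by the command decides which cached
>     # stack definition (and therefore which service_check.py) is executed.
>     return os.path.join(CACHE_ROOT, command["stack_name"],
>                         command["stack_version"], "services",
>                         command["service"], "package", "scripts")
> 
> desired   = {"stack_name": "BigInsights", "stack_version": "4.2", "service": "YARN"}
> effective = {"stack_name": "HDP", "stack_version": "2.6", "service": "YARN"}
> 
> # If the effective ID is clobbered back to the desired one, the BigInsights
> # service check runs against a cluster whose bits have already moved to HDP.
> print(script_dir_for(desired))    # .../BigInsights/4.2/services/YARN/package/scripts
> print(script_dir_for(effective))  # .../HDP/2.6/services/YARN/package/scripts
> {code}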
> Exception on running YARN service check:
> {code}
> Traceback (most recent call last):
>   File 
> "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py",
>  line 91, in <module>
> ServiceCheck().execute()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
>  line 329, in execute
> method(env)
>   File 
> "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py",
>  line 54, in service_check
> user=params.smokeuser,
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
> line 72, in inner
> result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
> line 102, in checked_call
> tries=tries, try_sleep=try_sleep, 
> timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
> line 150, in _call_wrapper
> result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
> line 303, in _call
> raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of 'yarn 
> org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls 
> -num_containers 1 -jar 
> /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar' returned 
> 1. 17/07/19 19:34:40 INFO distributedshell.Client: Initializing Client
> 17/07/19 19:34:40 INFO distributedshell.Client: Running Client
> 17/07/19 19:34:40 INFO client.RMProxy: Connecting to ResourceManager at 
> sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:8050
> 17/07/19 19:34:40 INFO client.AHSProxy: Connecting to Application History 
> server at sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:10200
> 17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster metric info from 
> ASM, numNodeManagers=1
> 17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster node info from ASM
> 17/07/19 19:34:40 INFO distributedshell.Client: Got node report from ASM for, 
> nodeId=sid-bigi-3.c.pramod-thangali.internal:45454, 
> nodeAddresssid-bigi-3.c.pramod-thangali.internal:8042, 
> nodeRackName/default-rack, nodeNumContainers0
> 17/07/19 19:34:40 INFO distributedshell.Client: Queue info, 
> queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, 
> queueApplicationCount=0, queueChildQueueCount=0
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, 
> queueName=root, userAcl=SUBMIT_APPLICATIONS
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, 
> queueName=root, userAcl=ADMINISTER_QUEUE
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, 
> queueName=default, userAcl=SUBMIT_APPLICATIONS
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, 
> queueName=default, userAcl=ADMINISTER_QUEUE
> 17/07/19 19:34:40 INFO distributedshell.Client: Max mem capability of 
> resources in this cluster 10240
> 17/07/19 19:34:40 INFO distributedshell.Client: Max virtual cores capabililty 
> of resources in this cluster 3
> 17/07/19 19:34:40 INFO distributedshell.Client: Copy App Master jar from 
> local filesystem and add to local environment
> 17/07/19 19:34:41 FATAL distributedshell.Client: Error running Client
> java.io.FileNotFoundException: File 
> /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar does not 
> exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
>   at 
> 

[jira] [Updated] (AMBARI-21530) Service Checks During Upgrades Should Use Desired Stack

2017-07-20 Thread Jonathan Hurley (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMBARI-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hurley updated AMBARI-21530:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)


[jira] [Updated] (AMBARI-21530) Service Checks During Upgrades Should Use Desired Stack

2017-07-20 Thread Jonathan Hurley (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMBARI-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hurley updated AMBARI-21530:
-
Attachment: AMBARI-21530.patch
