[jira] [Created] (YARN-4244) BlockPlacementPolicy related logs should contain the details about the filename and blockid
J.Andreina created YARN-4244:
--------------------------------

Summary: BlockPlacementPolicy related logs should contain the details about the filename and blockid
Key: YARN-4244
URL: https://issues.apache.org/jira/browse/YARN-4244
Project: Hadoop YARN
Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina

Currently, when a huge client write operation is going on, the user does not get details about which file/block the BlockPlacementPolicy was unable to find a replica node for. For example, consider the failure messages below, which carry no file/block details and will therefore be difficult to track later.
{noformat}
final String message = "Failed to place enough replicas, still in need of "
    + (totalReplicasExpected - results.size()) + " to reach "
    + totalReplicasExpected
    + " (unavailableStorages=" + unavailableStorages
    + ", storagePolicy=" + storagePolicy
    + ", newBlock=" + newBlock + ")";

String msg = "All required storage types are unavailable: "
    + " unavailableStorages=" + unavailableStorages
    + ", storagePolicy=" + storagePolicy.getName();
{noformat}
It is better to provide the file/block information in these logs for better debuggability.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
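A hedged sketch of the kind of change being requested (not a committed patch; it assumes the source file path {{src}} and the block being placed {{blk}} are in scope where the message is built):
{noformat}
// Illustrative only: prefix the existing message with the file and block so
// the failing write can be identified later. `src` and `blk` are assumed to
// be available in the placement-policy code.
final String message = "Failed to place enough replicas for file " + src
    + " block " + blk + ", still in need of "
    + (totalReplicasExpected - results.size()) + " to reach "
    + totalReplicasExpected
    + " (unavailableStorages=" + unavailableStorages
    + ", storagePolicy=" + storagePolicy
    + ", newBlock=" + newBlock + ")";
{noformat}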
[jira] [Commented] (YARN-3617) Fix WindowsResourceCalculatorPlugin.getCpuFrequency() always returning -1
[ https://issues.apache.org/jira/browse/YARN-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589512#comment-14589512 ]

J.Andreina commented on YARN-3617:
----------------------------------

Thanks [~devaraj.k] for the commit.

Fix WindowsResourceCalculatorPlugin.getCpuFrequency() always returning -1
-------------------------------------------------------------------------
Key: YARN-3617
URL: https://issues.apache.org/jira/browse/YARN-3617
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Assignee: J.Andreina
Priority: Minor
Fix For: 2.8.0
Attachments: YARN-3617.1.patch
Original Estimate: 1h
Remaining Estimate: 1h

In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency.
{noformat}
/** {@inheritDoc} */
@Override
public long getCpuFrequency() {
  refreshIfNeeded();
  return -1;
}
{noformat}
Please change '-1' to use 'cpuFrequencyKhz'.

org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
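The requested fix is a one-liner; a minimal sketch (assuming, as the description implies, that {{refreshIfNeeded()}} already populates a {{cpuFrequencyKhz}} field):
{noformat}
/** {@inheritDoc} */
@Override
public long getCpuFrequency() {
  refreshIfNeeded();
  // Return the refreshed value instead of the hard-coded -1 placeholder.
  return cpuFrequencyKhz;
}
{noformat}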
[jira] [Updated] (YARN-3617) Fix unused variable to get CPU frequency on Windows systems
[ https://issues.apache.org/jira/browse/YARN-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated YARN-3617:
-----------------------------
Attachment: YARN-3617.1.patch

Attached an initial patch. Please review.

Fix unused variable to get CPU frequency on Windows systems
-----------------------------------------------------------
Key: YARN-3617
URL: https://issues.apache.org/jira/browse/YARN-3617
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.7.0
Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Assignee: J.Andreina
Priority: Minor
Attachments: YARN-3617.1.patch
Original Estimate: 1h
Remaining Estimate: 1h

In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency.
{noformat}
/** {@inheritDoc} */
@Override
public long getCpuFrequency() {
  refreshIfNeeded();
  return -1;
}
{noformat}
Please change '-1' to use 'cpuFrequencyKhz'.

org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3617) Fix unused variable to get CPU frequency on Windows systems
[ https://issues.apache.org/jira/browse/YARN-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539018#comment-14539018 ]

J.Andreina commented on YARN-3617:
----------------------------------

Thanks [~xafero] for reporting this issue. If you have already started working on this, please reassign it to yourself.

Fix unused variable to get CPU frequency on Windows systems
-----------------------------------------------------------
Key: YARN-3617
URL: https://issues.apache.org/jira/browse/YARN-3617
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.7.0
Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Assignee: J.Andreina
Priority: Minor
Original Estimate: 1h
Remaining Estimate: 1h

In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency.
{noformat}
/** {@inheritDoc} */
@Override
public long getCpuFrequency() {
  refreshIfNeeded();
  return -1;
}
{noformat}
Please change '-1' to use 'cpuFrequencyKhz'.

org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (YARN-3617) Fix unused variable to get CPU frequency on Windows systems
[ https://issues.apache.org/jira/browse/YARN-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina reassigned YARN-3617:
--------------------------------
Assignee: J.Andreina

Fix unused variable to get CPU frequency on Windows systems
-----------------------------------------------------------
Key: YARN-3617
URL: https://issues.apache.org/jira/browse/YARN-3617
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.7.0
Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Assignee: J.Andreina
Priority: Minor
Original Estimate: 1h
Remaining Estimate: 1h

In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency.
{noformat}
/** {@inheritDoc} */
@Override
public long getCpuFrequency() {
  refreshIfNeeded();
  return -1;
}
{noformat}
Please change '-1' to use 'cpuFrequencyKhz'.

org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3559) Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public
[ https://issues.apache.org/jira/browse/YARN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519137#comment-14519137 ]

J.Andreina commented on YARN-3559:
----------------------------------

[~ste...@apache.org], I would like to work on this issue. If you have not already started on it, shall I take it over?

Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public
------------------------------------------------------------------------
Key: YARN-3559
URL: https://issues.apache.org/jira/browse/YARN-3559
Project: Hadoop YARN
Issue Type: Improvement
Components: security
Affects Versions: 2.6.0
Reporter: Steve Loughran

{{org.apache.hadoop.security.token.Token}} is tagged {{@InterfaceAudience.LimitedPrivate}} for HDFS and MapReduce. However, it is used throughout YARN apps, where both the clients and the AM need to work with tokens. This class and its related classes all need to be declared public.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
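The proposed retagging amounts to swapping the audience annotation. A sketch on a hypothetical stand-in class (the real class is org.apache.hadoop.security.token.Token; the LimitedPrivate values are taken from the description above):
{noformat}
import org.apache.hadoop.classification.InterfaceAudience;

// Before: visible only to the named projects.
//   @InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
// After: part of the public API, usable by YARN apps and their clients.
@InterfaceAudience.Public
public class TokenSketch {
}
{noformat}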
[jira] [Created] (YARN-3507) Illegal Option is displayed while executing clrSpaceQuota with -storageType option
J.Andreina created YARN-3507:
--------------------------------

Summary: Illegal Option is displayed while executing clrSpaceQuota with -storageType option
Key: YARN-3507
URL: https://issues.apache.org/jira/browse/YARN-3507
Project: Hadoop YARN
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina

Providing -storageType while executing clrSpaceQuota, as described in the usage message, displays "Illegal option":
{noformat}
X:~/opensrc/hadoop-3.0.0-SNAPSHOT/bin> ./hdfs dfsadmin -setSpaceQuota 1024m -storageType DISK /Folder1
X:~/opensrc/hadoop-3.0.0-SNAPSHOT/bin> ./hdfs dfsadmin -clrSpaceQuota -storageType DISK /Folder1
clrSpaceQuota: Illegal option -storageType
Usage: hdfs dfsadmin [-clrSpaceQuota [-storageType storagetype] dirname...dirname]
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3397) Failover should be filtered out from HAAdmin to be in sync with doc.
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated YARN-3397:
-----------------------------
Attachment: YARN-3397.1.patch

Attached an initial patch. Please review.

Failover should be filtered out from HAAdmin to be in sync with doc.
--------------------------------------------------------------------
Key: YARN-3397
URL: https://issues.apache.org/jira/browse/YARN-3397
Project: Hadoop YARN
Issue Type: Bug
Reporter: J.Andreina
Attachments: YARN-3397.1.patch

Failover should be filtered out from HAAdmin to be in sync with the doc. Since -failover is not a supported operation, it is not mentioned in the doc; the CLI usage is misleading and should be brought in sync with the doc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (YARN-3397) Failover should be filtered out from HAAdmin to be in sync with doc.
J.Andreina created YARN-3397:
--------------------------------

Summary: Failover should be filtered out from HAAdmin to be in sync with doc.
Key: YARN-3397
URL: https://issues.apache.org/jira/browse/YARN-3397
Project: Hadoop YARN
Issue Type: Bug
Reporter: J.Andreina

Failover should be filtered out from HAAdmin to be in sync with the doc. Since -failover is not a supported operation, it is not mentioned in the doc; the CLI usage is misleading and should be brought in sync with the doc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3397) Failover should be filtered out from HAAdmin to be in sync with doc.
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379741#comment-14379741 ]

J.Andreina commented on YARN-3397:
----------------------------------

Observation:
{noformat}
CLI:
Rex@XXX:~/Hadoop_March18/hadoop-3.0.0-SNAPSHOT/bin> ./yarn rmadmin
Usage: yarn rmadmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -addToClusterNodeLabels [label1,label2,label3] (label splitted by ,)
   -removeFromClusterNodeLabels [label1,label2,label3] (label splitted by ,)
   -replaceLabelsOnNode [node1[:port]=label1,label2 node2[:port]=label1,label2]
   -directlyAccessNodeLabelStore
   -transitionToActive [--forceactive] serviceId
   -transitionToStandby serviceId
   -failover [--forcefence] [--forceactive] serviceId serviceId
   -getServiceState serviceId
   -checkHealth serviceId
   -help [cmd]
{noformat}
{noformat}
Doc:
Runs ResourceManager admin client
Usage: yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshUserToGroupsMapping] [-refreshSuperUserGroupsConfiguration] [-refreshAdminAcls] [-refreshServiceAcl] [-getGroups [username]] [-help [cmd]] [-transitionToActive serviceId] [-transitionToStandby serviceId] [-getServiceState serviceId] [-checkHealth serviceId]
{noformat}
I would like to work on this issue. If this issue holds good, can you please assign it to me?

Failover should be filtered out from HAAdmin to be in sync with doc.
--------------------------------------------------------------------
Key: YARN-3397
URL: https://issues.apache.org/jira/browse/YARN-3397
Project: Hadoop YARN
Issue Type: Bug
Reporter: J.Andreina

Failover should be filtered out from HAAdmin to be in sync with the doc. Since -failover is not a supported operation, it is not mentioned in the doc; the CLI usage is misleading and should be brought in sync with the doc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
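One possible shape of the fix, as a standalone sketch rather than the attached patch (the map layout and names here are illustrative only, not RMAdminCLI's actual structures): build the printed usage from the supported commands and drop the -failover entry before printing.
{noformat}
import java.util.LinkedHashMap;
import java.util.Map;

public class RmAdminUsageSketch {
  public static void main(String[] args) {
    // Illustrative usage table; the real rmadmin builds this differently.
    Map<String, String> usage = new LinkedHashMap<>();
    usage.put("-transitionToActive", "[--forceactive] serviceId");
    usage.put("-transitionToStandby", "serviceId");
    usage.put("-failover", "[--forcefence] [--forceactive] serviceId serviceId");
    usage.put("-getServiceState", "serviceId");
    usage.put("-checkHealth", "serviceId");

    // The filtering step: -failover is not supported by rmadmin, so remove
    // it before printing, keeping the CLI in sync with the doc.
    usage.remove("-failover");

    System.out.println("Usage: yarn rmadmin");
    usage.forEach((cmd, argsHelp) ->
        System.out.println("   " + cmd + " " + argsHelp));
  }
}
{noformat}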
[jira] [Created] (YARN-2908) While checking acl access for acl_administer_queue, should avoid using contains for user/group name.
J.Andreina created YARN-2908:
--------------------------------

Summary: While checking acl access for acl_administer_queue, should avoid using contains for user/group name.
Key: YARN-2908
URL: https://issues.apache.org/jira/browse/YARN-2908
Project: Hadoop YARN
Issue Type: Bug
Reporter: J.Andreina
Priority: Minor

If user1 is given acl_administer_queue permission on QueueA, then all administer operations on QueueA can also be performed from user2. While checking the username, the contains check should be removed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
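For illustration, here is how a substring-style contains check of the kind described would admit the wrong user (hypothetical user names; per the follow-up comment below, the scheduler's check turned out to be List.contains, which does not have this problem):
{noformat}
import java.util.List;

public class AclContainsSketch {
  public static void main(String[] args) {
    String aclEntry = "user12";  // the user actually granted acl_administer_queue

    // A substring check wrongly matches a different user, "user1":
    System.out.println(aclEntry.contains("user1"));           // true

    // Exact membership in a list behaves correctly:
    System.out.println(List.of("user12").contains("user1"));  // false
  }
}
{noformat}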
[jira] [Updated] (YARN-2908) While checking acl access for acl_administer_queue, should avoid using contains for user/group name.
[ https://issues.apache.org/jira/browse/YARN-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated YARN-2908:
-----------------------------
Description:
If user12 is given acl_administer_queue permission on QueueA, then all administer operations on QueueA can also be performed from user1. While checking the username, the contains check should be removed.

was:
If user1 is given acl_administer_queue permission on QueueA, then all administer operations on QueueA can also be performed from user2. While checking the username, the contains check should be removed.

While checking acl access for acl_administer_queue, should avoid using contains for user/group name.
-----------------------------------------------------------------------------------------------------
Key: YARN-2908
URL: https://issues.apache.org/jira/browse/YARN-2908
Project: Hadoop YARN
Issue Type: Bug
Reporter: J.Andreina
Assignee: Rohith
Priority: Minor

If user12 is given acl_administer_queue permission on QueueA, then all administer operations on QueueA can also be performed from user1. While checking the username, the contains check should be removed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2908) While checking acl access for acl_administer_queue, should avoid using contains for user/group name.
[ https://issues.apache.org/jira/browse/YARN-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226100#comment-14226100 ]

J.Andreina commented on YARN-2908:
----------------------------------

I misunderstood: I thought the contains check was on a string, but it is on a list. Please close the issue.

While checking acl access for acl_administer_queue, should avoid using contains for user/group name.
-----------------------------------------------------------------------------------------------------
Key: YARN-2908
URL: https://issues.apache.org/jira/browse/YARN-2908
Project: Hadoop YARN
Issue Type: Bug
Reporter: J.Andreina
Assignee: Rohith
Priority: Minor

If user12 is given acl_administer_queue permission on QueueA, then all administer operations on QueueA can also be performed from user1. While checking the username, the contains check should be removed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (YARN-2305) When a container is in reserved state, the total cluster memory is displayed wrongly.
J.Andreina created YARN-2305:
--------------------------------

Summary: When a container is in reserved state, the total cluster memory is displayed wrongly.
Key: YARN-2305
URL: https://issues.apache.org/jira/browse/YARN-2305
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.1
Reporter: J.Andreina

ENV Details:
============
3 queues: a(50%), b(25%), c(25%); all max utilization is set to 100.
2-node cluster with total memory of 16GB.

TestSteps:
==========
Execute the following 3 jobs with different memory configurations for the Map, Reducer and AM tasks:
{noformat}
./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023)

./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025)

./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62
{noformat}

Issue:
======
When 2GB of memory is in reserved state, the total memory is shown as 15GB and used as 15GB (while the actual total memory is 16GB).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-2305) When a container is in reserved state, the total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated YARN-2305:
-----------------------------
Attachment: Capture.jpg

When a container is in reserved state, the total cluster memory is displayed wrongly.
--------------------------------------------------------------------------------------
Key: YARN-2305
URL: https://issues.apache.org/jira/browse/YARN-2305
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.1
Reporter: J.Andreina
Assignee: Hong Zhiguo
Attachments: Capture.jpg

ENV Details:
============
3 queues: a(50%), b(25%), c(25%); all max utilization is set to 100.
2-node cluster with total memory of 16GB.

TestSteps:
==========
Execute the following 3 jobs with different memory configurations for the Map, Reducer and AM tasks:
{noformat}
./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023)

./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025)

./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62
{noformat}

Issue:
======
When 2GB of memory is in reserved state, the total memory is shown as 15GB and used as 15GB (while the actual total memory is 16GB).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-2305) When a container is in reserved state, the total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064735#comment-14064735 ]

J.Andreina commented on YARN-2305:
----------------------------------

I am using the Capacity Scheduler.

When a container is in reserved state, the total cluster memory is displayed wrongly.
--------------------------------------------------------------------------------------
Key: YARN-2305
URL: https://issues.apache.org/jira/browse/YARN-2305
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.1
Reporter: J.Andreina
Assignee: Hong Zhiguo
Attachments: Capture.jpg

ENV Details:
============
3 queues: a(50%), b(25%), c(25%); all max utilization is set to 100.
2-node cluster with total memory of 16GB.

TestSteps:
==========
Execute the following 3 jobs with different memory configurations for the Map, Reducer and AM tasks:
{noformat}
./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023)

./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025)

./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62
{noformat}

Issue:
======
When 2GB of memory is in reserved state, the total memory is shown as 15GB and used as 15GB (while the actual total memory is 16GB).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (YARN-1184) ClassCastException is thrown during preemption when a huge job is submitted to a queue B whose resources are used by a job in queue A
J.Andreina created YARN-1184:
--------------------------------

Summary: ClassCastException is thrown during preemption when a huge job is submitted to a queue B whose resources are used by a job in queue A
Key: YARN-1184
URL: https://issues.apache.org/jira/browse/YARN-1184
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: J.Andreina

Preemption is enabled.
Queues = a, b
a capacity = 30%
b capacity = 70%

Step 1: Assign a big job to queue a (so that job_a will utilize some resources from queue b).
Step 2: Assign a big job to queue b.

The following exception is thrown at the Resource Manager:
{noformat}
2013-09-12 10:42:32,535 ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception.
java.lang.ClassCastException: java.util.Collections$UnmodifiableSet cannot be cast to java.util.NavigableSet
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getContainersToPreempt(ProportionalCapacityPreemptionPolicy.java:403)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:202)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:173)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
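The root cause is reproducible in isolation: {{Collections.unmodifiableSet}} wraps a {{NavigableSet}} in a plain {{Set}} view, so casting the wrapper back to {{NavigableSet}} fails at runtime. A minimal standalone demo (not the scheduler code):
{noformat}
import java.util.Collections;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class UnmodifiableNavigableSetDemo {
  public static void main(String[] args) {
    NavigableSet<Integer> live = new TreeSet<>();
    live.add(1);

    // The unmodifiable wrapper implements Set, not NavigableSet...
    Set<Integer> wrapped = Collections.unmodifiableSet(live);

    try {
      // ...so this cast throws the same ClassCastException seen in the RM log.
      NavigableSet<Integer> back = (NavigableSet<Integer>) wrapped;
      System.out.println(back.first());
    } catch (ClassCastException e) {
      System.out.println(e); // ...$UnmodifiableSet cannot be cast to ...NavigableSet
    }
  }
}
{noformat}
The usual remedies are to copy the wrapped set into a new TreeSet before treating it as navigable or, on Java 8+, to use Collections.unmodifiableNavigableSet, which preserves the interface.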
[jira] [Commented] (YARN-842) Resource Manager and Node Manager UIs don't work with IE
[ https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757599#comment-13757599 ]

J.Andreina commented on YARN-842:
---------------------------------

Unable to view the jobs in IE9 at both the RM and JHS UIs. Currently using the hadoop-2.1.0-beta version.
IE version: 9.0.8112.16421
Attached screenshots of the RM and JHS UIs.

Resource Manager and Node Manager UIs don't work with IE
---------------------------------------------------------
Key: YARN-842
URL: https://issues.apache.org/jira/browse/YARN-842
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Devaraj K
Assignee: Devaraj K

{code:xml}
Webpage error details

User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Timestamp: Mon, 17 Jun 2013 12:06:03 UTC

Message: 'JSON' is undefined
Line: 41
Char: 218
Code: 0
URI: http://10.18.40.24:8088/cluster/apps
{code}

The RM and NM UIs are not working with IE, showing the above error for every link on the UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-933) After AppAttempt_1 fails [container release and removal are done, AppAttempt_2 is scheduled], relaunching AppAttempt_1 again throws an exception at the RM, and the cl
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated YARN-933:
----------------------------
Description:
Hostname enabled. AM max retries configured as 3 on both the client and RM side.

Step 1: Install a cluster with NMs on 2 machines.
Step 2: Ensure that a ping from the RM machine to the NM1 machine succeeds using the IP but fails using the hostname.
Step 3: Execute a job.
Step 4: After the AM [AppAttempt_1] is allocated to the NM1 machine, a connection loss happens.

Observation:
============
After AppAttempt_1 has moved to the failed state, the release of the container for AppAttempt_1 and the application removal are successful. A new AppAttempt_2 is spawned.
1. Then a retry for AppAttempt_1 happens again.
2. On the RM side it again tries to launch AppAttempt_1, and hence fails with InvalidStateTransitonException.
3. The client exited after AppAttempt_1 finished [but the job is actually still running], while the configured number of app attempts is 3 and the remaining attempts are all spawned and running.

RMLogs:
=======
{noformat}
2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException
2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

Client Logs:
============
{noformat}
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation:
{noformat}
[jira] [Updated] (YARN-933) After AppAttempt_1 fails [container release and removal are done, AppAttempt_2 is scheduled], relaunching AppAttempt_1 again throws an exception at the RM, and the cl
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated YARN-933:
----------------------------
Description:
AM max retries configured as 3 on both the client and RM side.

Step 1: Install a cluster with NMs on 2 machines.
Step 2: Ensure that a ping from the RM machine to the NM1 machine succeeds using the IP but fails using the hostname.
Step 3: Execute a job.
Step 4: After the AM [AppAttempt_1] is allocated to the NM1 machine, a connection loss happens.

Observation:
============
After AppAttempt_1 has moved to the failed state, the release of the container for AppAttempt_1 and the application removal are successful. A new AppAttempt_2 is spawned.
1. Then a retry for AppAttempt_1 happens again.
2. On the RM side it again tries to launch AppAttempt_1, and hence fails with InvalidStateTransitonException.
3. The client exited after AppAttempt_1 finished [but the job is actually still running], while the configured number of app attempts is 3 and the remaining attempts are all spawned and running.

RMLogs:
=======
{noformat}
2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException
2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

Client Logs:
============
{noformat}
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
{noformat}
[jira] [Created] (YARN-933) After AppAttempt_1 fails [container release and removal are done, AppAttempt_2 is scheduled], relaunching AppAttempt_1 again throws an exception at the RM, and the cl
J.Andreina created YARN-933:
---------------------------------

Summary: After AppAttempt_1 fails [container release and removal are done, AppAttempt_2 is scheduled], relaunching AppAttempt_1 again throws an exception at the RM, and the client exits before the app attempt retries are over
Key: YARN-933
URL: https://issues.apache.org/jira/browse/YARN-933
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina

Hostname enabled. AM max retries configured as 3 on both the client and RM side.

Step 1: Install a cluster in HA mode with NMs on 2 machines.
Step 2: Ensure that a ping from the RM machine to the NM1 machine succeeds using the IP but fails using the hostname.
Step 3: Execute a job.
Step 4: After the AM [AppAttempt_1] is allocated to the NM1 machine, a connection loss happens.

Observation:
============
After AppAttempt_1 has moved to the failed state, the release of the container for AppAttempt_1 and the application removal are successful. A new AppAttempt_2 is spawned.
1. Then a retry for AppAttempt_1 happens again.
2. On the RM side it again tries to launch AppAttempt_1, and hence fails with InvalidStateTransitonException.
3. The client exited after AppAttempt_1 finished [but the job is actually still running], while the configured number of app attempts is 3 and the remaining attempts are all spawned and running.

RMLogs:
=======
{noformat}
2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException
2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

Client Logs:
============
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2
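The InvalidStateTransitonException above is the generic failure mode of YARN's event-driven state machines: an event is dispatched for which the current state has no registered transition arc. A self-contained toy (illustrative only, not RMAppAttemptImpl's actual transition table) reproduces the "LAUNCH_FAILED at FAILED" shape and shows the usual remedy of registering an explicit no-op arc for late or duplicate events:
{noformat}
import java.util.HashMap;
import java.util.Map;

public class StateMachineSketch {
  enum State { SCHEDULED, ALLOCATED, FAILED }
  enum Event { LAUNCH, LAUNCH_FAILED }

  // Transition table: (current state, event) -> next state.
  private final Map<State, Map<Event, State>> table = new HashMap<>();
  private State current = State.ALLOCATED;

  void addTransition(State pre, Event ev, State post) {
    table.computeIfAbsent(pre, s -> new HashMap<>()).put(ev, post);
  }

  void handle(Event ev) {
    Map<Event, State> arcs = table.get(current);
    if (arcs == null || !arcs.containsKey(ev)) {
      throw new IllegalStateException("Invalid event: " + ev + " at " + current);
    }
    current = arcs.get(ev);
  }

  public static void main(String[] args) {
    StateMachineSketch sm = new StateMachineSketch();
    sm.addTransition(State.ALLOCATED, Event.LAUNCH_FAILED, State.FAILED);

    sm.handle(Event.LAUNCH_FAILED);   // ALLOCATED -> FAILED, fine
    try {
      // A second LAUNCH_FAILED (the stale retry for AppAttempt_1) has no
      // arc registered at FAILED, so it blows up just like the RM log.
      sm.handle(Event.LAUNCH_FAILED);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Invalid event: LAUNCH_FAILED at FAILED
    }

    // Typical remedy: register a self-loop so late events are ignored.
    sm.addTransition(State.FAILED, Event.LAUNCH_FAILED, State.FAILED);
    sm.handle(Event.LAUNCH_FAILED);   // now a harmless no-op
  }
}
{noformat}
The same pattern underlies YARN-928 below, where a JOB_TASK_COMPLETED event reaches a job already in the SUCCEEDED state.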
[jira] [Moved] (YARN-928) While killing an attempt for a task that has succeeded, the task transitions from SUCCEEDED to SCHEDULED and InvalidStateTransitonException is thrown
[ https://issues.apache.org/jira/browse/YARN-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina moved MAPREDUCE-5389 to YARN-928:
--------------------------------------------
Component/s: (was: task)
             applications
Affects Version/s: (was: 2.0.5-alpha)
                   2.0.5-alpha
Key: YARN-928 (was: MAPREDUCE-5389)
Project: Hadoop YARN (was: Hadoop Map/Reduce)

While killing an attempt for a task that has succeeded, the task transitions from SUCCEEDED to SCHEDULED and InvalidStateTransitonException is thrown
------------------------------------------------------------------------------------------------------------------------------------------------------
Key: YARN-928
URL: https://issues.apache.org/jira/browse/YARN-928
Project: Hadoop YARN
Issue Type: Bug
Components: applications
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Priority: Minor

Step 1: Install a cluster with HDFS and MR.
Step 2: Execute a job.
Step 3: Issue a kill for a task attempt whose task has already completed.
{noformat}
Rex@HOST-10-18-91-55:~/NodeAgentTmpDir/installations/hadoop-2.0.5.tar/hadoop-2.0.5/bin> ./mapred job -kill-task attempt_1373875322959_0032_m_00_0
No GC_PROFILE is given. Defaults to medium.
13/07/15 14:46:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/07/15 14:46:32 INFO proxy.ResourceManagerProxies: HA Proxy Creation with xface : interface org.apache.hadoop.yarn.api.ClientRMProtocol
13/07/15 14:46:33 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Killed task attempt_1373875322959_0032_m_00_0
{noformat}

Observation:
============
1. The task state has been transitioned from SUCCEEDED to SCHEDULED.
2. When the client issues a kill for a succeeded attempt, the client is notified that the succeeded attempt was killed.
3. A second task attempt was launched, which succeeded and was then killed on a later client request.
4. Even after the job state transitioned from SUCCEEDED to ERROR, the state shown on the UI is succeeded.

Issue:
======
1. The client has been notified that the attempt is killed, but the attempt actually succeeded, and the same is displayed in the JHS UI.
2. An InvalidStateTransitonException is thrown at the App Master.
3. On the client side and in the JHS, the job exited with state Finished/succeeded; on the RM side the state is Finished/Failed.

AM Logs:
========
{noformat}
2013-07-15 14:46:25,461 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1373875322959_0032_m_00_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED
2013-07-15 14:46:25,468 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_0
2013-07-15 14:46:25,470 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED
2013-07-15 14:46:33,810 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from SUCCEEDED to SCHEDULED
2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_1
2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED
2013-07-15 14:46:37,345 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:866)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:128)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1095)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1091)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

--
This message is automatically