[jira] [Commented] (YARN-9956) Improve connection error message for YARN ApiServerClient

2019-12-09 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992188#comment-16992188
 ] 

Prabhu Joseph commented on YARN-9956:
-

Yes [~eyang], Patch 003 is causing the {{TestSecureApiServiceClient}} test 
failure. I will fix it and submit a new patch. Thanks.

> Improve connection error message for YARN ApiServerClient
> -
>
> Key: YARN-9956
> URL: https://issues.apache.org/jira/browse/YARN-9956
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9956-001.patch, YARN-9956-002.patch, 
> YARN-9956-003.patch
>
>
> In an HA environment, the yarn.resourcemanager.webapp.address configuration is 
> optional.  ApiServiceClient may produce a confusing error message like this:
> {code}
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: 
> host1.example.com:8090
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: 
> host2.example.com:8090
> 19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {}
> GSSException: No valid credentials provided (Mechanism level: Server not 
> found in Kerberos database (7) - LOOKING_UP_SERVER)
>   at 
> java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
>   at 
> java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
>   at 
> java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>   at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
> Caused by: KrbException: Server not found in Kerberos database (7) - 
> LOOKING_UP_SERVER
>   at 
> java.security.jgss/sun.security.krb5.KrbTgsRep.(KrbTgsRep.java:73)
>   at 
> java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
>   at 
> java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
>   at 
> java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
>   at 
> java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
>   at 
> java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
>   at 
> java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
>   ... 15 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>   at 
> java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
>   at 
> java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
>   at 
> java.security.jgss/sun.security.krb5.internal.TGSRep.(TGSRep.java:60)
>   at 
> java.security.jgss/sun.security.krb5.KrbTgsRep.(KrbTgsRep.java:55)
>   ... 21 more
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application: 
> java.io.IOException: java.lang.reflect.UndeclaredThrowableException
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>   at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> 

[jira] [Commented] (YARN-9985) Unsupported "transitionToObserver" option displaying for rmadmin command

2019-12-09 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992103#comment-16992103
 ] 

Akira Ajisaka commented on YARN-9985:
-

Filed HADOOP-16753 for refactoring.

> Unsupported "transitionToObserver" option displaying for rmadmin command
> 
>
> Key: YARN-9985
> URL: https://issues.apache.org/jira/browse/YARN-9985
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, yarn
>Affects Versions: 3.2.1
>Reporter: Souryakanta Dwivedy
>Assignee: Ayush Saxena
>Priority: Minor
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9985-01.patch, YARN-9985-02.patch, 
> image-2019-11-18-18-31-17-755.png, image-2019-11-18-18-35-54-688.png
>
>
> The unsupported "transitionToObserver" option is displayed for the rmadmin command.
> Check the options printed by the yarn rmadmin command: the usage message lists the 
> "-transitionToObserver " option, which is not supported by yarn rmadmin. This is 
> wrong behavior.
>  However, yarn rmadmin -help does not list any 
> "-transitionToObserver " option.
>  
> !image-2019-11-18-18-31-17-755.png!
>  
> ==
> install/hadoop/resourcemanager/bin> ./yarn rmadmin -help
> rmadmin is the command to execute YARN administrative commands.
> The full syntax is:
> yarn rmadmin [-refreshQueues] [-refreshNodes [-g|graceful [timeout in 
> seconds] -client|server]] [-refreshNodesResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-addToClusterNodeLabels 
> <"label1(exclusive=true),label2(exclusive=false),label3">] 
> [-removeFromClusterNodeLabels ] [-replaceLabelsOnNode 
> <"node1[:port]=label1,label2 node2[:port]=label1"> [-failOnUnknownNodes]] 
> [-directlyAccessNodeLabelStore] [-refreshClusterMaxPriority] 
> [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) or 
> -updateNodeResource [NodeID] [ResourceTypes] ([OvercommitTimeout])] 
> *{color:#FF}[-transitionToActive [--forceactive] ]{color} 
> {color:#FF}[-transitionToStandby ]{color}* [-getServiceState 
> ] [-getAllServiceState] [-checkHealth ] [-help [cmd]]
> -refreshQueues: Reload the queues' acls, states and scheduler specific 
> properties.
>  ResourceManager will reload the mapred-queues configuration file.
>  -refreshNodes [-g|graceful [timeout in seconds] -client|server]: Refresh the 
> hosts information at the ResourceManager. Here [-g|graceful [timeout in 
> seconds] -client|server] is optional, if we specify the timeout then 
> ResourceManager will wait for timeout before marking the NodeManager as 
> decommissioned. The -client|server indicates if the timeout tracking should 
> be handled by the client or the ResourceManager. The client-side tracking is 
> blocking, while the server-side tracking is not. Omitting the timeout, or a 
> timeout of -1, indicates an infinite timeout. Known Issue: the server-side 
> tracking will immediately decommission if an RM HA failover occurs.
>  -refreshNodesResources: Refresh resources of NodeManagers at the 
> ResourceManager.
>  -refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups mappings
>  -refreshUserToGroupsMappings: Refresh user-to-groups mappings
>  -refreshAdminAcls: Refresh acls for administration of ResourceManager
>  -refreshServiceAcl: Reload the service-level authorization policy file.
>  ResourceManager will reload the authorization policy file.
>  -getGroups [username]: Get the groups which given user belongs to.
>  -addToClusterNodeLabels 
> <"label1(exclusive=true),label2(exclusive=false),label3">: add to cluster 
> node labels. Default exclusivity is true
>  -removeFromClusterNodeLabels  (label splitted by ","): 
> remove from cluster node labels
>  -replaceLabelsOnNode <"node1[:port]=label1,label2 
> node2[:port]=label1,label2"> [-failOnUnknownNodes] : replace labels on nodes 
> (please note that we do not support specifying multiple labels on a single 
> host for now.)
>  [-failOnUnknownNodes] is optional, when we set this option, it will fail if 
> specified nodes are unknown.
>  -directlyAccessNodeLabelStore: This is DEPRECATED, will be removed in future 
> releases. Directly access node label store, with this option, all node label 
> related operations will not connect RM. Instead, they will access/modify 
> stored node labels directly. By default, it is false (access via RM). AND 
> PLEASE NOTE: if you configured yarn.node-labels.fs-store.root-dir to a local 
> directory (instead of NFS or HDFS), this option will only work when the 
> command run on the machine where RM is running.
>  -refreshClusterMaxPriority: Refresh cluster max priority
>  -updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout])

[jira] [Updated] (YARN-9914) Use separate configs for free disk space checking for full and not-full disks

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9914:

Fix Version/s: (was: 2.11.0)

> Use separate configs for free disk space checking for full and not-full disks
> -
>
> Key: YARN-9914
> URL: https://issues.apache.org/jira/browse/YARN-9914
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 2.9.3, 3.2.2, 3.1.4
>
> Attachments: YARN-9914-branch-2.8.001.patch, YARN-9914.001.patch, 
> YARN-9914.002.patch
>
>
> [YARN-3943] added separate configurations for the nodemanager health check's 
> disk-utilization full-disk check:
> {{max-disk-utilization-per-disk-percentage}} - threshold for marking a good 
> disk full
> {{disk-utilization-watermark-low-per-disk-percentage}} - threshold for 
> marking a full disk as not full.
> On our clusters, we do not use these configs. We instead use 
> {{min-free-space-per-disk-mb}} so we can specify the limit in mb instead of 
> percent of utilization. We have observed the same oscillation behavior as 
> described in [YARN-3943] with this parameter. I would like to add an optional 
> config to specify a separate threshold for marking a full disk as not full:
> {{min-free-space-per-disk-mb}} - threshold at which a good disk is marked full
> {{disk-free-space-per-disk-high-watermark-mb}} - threshold at which a full 
> disk is marked good.
> So for example, we could set {{min-free-space-per-disk-mb = 5GB}}, which 
> would cause a disk to be marked full when free space goes below 5GB, and 
> {{disk-free-space-per-disk-high-watermark-mb = 10GB}} to keep the disk in the 
> full state until free space goes above 10GB.
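> A minimal sketch of the proposed hysteresis (illustrative only, not the attached 
> patches; the class and method names here are assumptions):
> {code:java}
> // Hypothetical illustration of the proposed two-threshold check.
> // minFreeSpaceMb marks a good disk as full; highWatermarkMb marks a full
> // disk as good again, which avoids oscillating around a single threshold.
> public final class DiskFullnessTracker {
>   private final long minFreeSpaceMb;   // e.g. 5 * 1024 (5GB)
>   private final long highWatermarkMb;  // e.g. 10 * 1024 (10GB)
>   private boolean markedFull = false;
> 
>   public DiskFullnessTracker(long minFreeSpaceMb, long highWatermarkMb) {
>     this.minFreeSpaceMb = minFreeSpaceMb;
>     this.highWatermarkMb = highWatermarkMb;
>   }
> 
>   /** Returns true if the disk should currently be treated as full. */
>   public boolean isFull(long freeSpaceMb) {
>     if (!markedFull && freeSpaceMb < minFreeSpaceMb) {
>       markedFull = true;        // dropped below the minimum: mark full
>     } else if (markedFull && freeSpaceMb > highWatermarkMb) {
>       markedFull = false;       // climbed above the watermark: mark good
>     }
>     return markedFull;
>   }
> }
> {code}
> Setting both values to the same number degrades to today's single 
> {{min-free-space-per-disk-mb}} behaviour, so the new config can stay optional.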






[jira] [Updated] (YARN-4901) QueueMetrics needs to be cleared before MockRM is initialized

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-4901:

Fix Version/s: (was: 2.11.0)

> QueueMetrics needs to be cleared before MockRM is initialized
> -
>
> Key: YARN-4901
> URL: https://issues.apache.org/jira/browse/YARN-4901
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Daniel Templeton
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-4901-001.patch, YARN-4901-branch-3.2.002.patch
>
>
> The {{ResourceManager}} rightly assumes that when it starts, it's starting 
> from naught.  The {{MockRM}}, however, violates that assumption.  For 
> example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} 
> instance.  The {{QueueMetrics.queueMetrics}} field is static, which means 
> that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} 
> bleed over.  Having the MockRM clear the {{QueueMetrics}} when it starts 
> should resolve the issue.  I haven't yet looked at the scope to see how hard or 
> easy that is to do.
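> A rough sketch of the idea (not the attached patch), assuming the test-only 
> {{QueueMetrics.clearQueueMetrics()}} helper and {{DefaultMetricsSystem.shutdown()}} 
> are available for resetting the shared static state:
> {code:java}
> // Hypothetical placement of the reset, e.g. at the start of MockRM's
> // constructor or in a test's @Before method, so every MockRM instance
> // starts from clean static QueueMetrics state.
> import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
> import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
> 
> public final class QueueMetricsReset {
>   private QueueMetricsReset() {
>   }
> 
>   public static void reset() {
>     // Clears the static queueMetrics cache so metrics from a previous
>     // MockRM instance do not bleed into the next one.
>     QueueMetrics.clearQueueMetrics();
>     // Drop the sources registered with the shared metrics system as well.
>     DefaultMetricsSystem.shutdown();
>   }
> }
> {code}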






[jira] [Commented] (YARN-4901) QueueMetrics needs to be cleared before MockRM is initialized

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991918#comment-16991918
 ] 

Jonathan Hung commented on YARN-4901:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> QueueMetrics needs to be cleared before MockRM is initialized
> -
>
> Key: YARN-4901
> URL: https://issues.apache.org/jira/browse/YARN-4901
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Daniel Templeton
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-4901-001.patch, YARN-4901-branch-3.2.002.patch
>
>
> The {{ResourceManager}} rightly assumes that when it starts, it's starting 
> from naught.  The {{MockRM}}, however, violates that assumption.  For 
> example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} 
> instance.  The {{QueueMetrics.queueMetrics}} field is static, which means 
> that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} 
> bleed over.  Having the MockRM clear the {{QueueMetrics}} when it starts 
> should resolve the issue.  I haven't yet looked at the scope to see how hard or 
> easy that is to do.






[jira] [Commented] (YARN-10012) Guaranteed and max capacity queue metrics for custom resources

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991920#comment-16991920
 ] 

Jonathan Hung commented on YARN-10012:
--

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Guaranteed and max capacity queue metrics for custom resources
> --
>
> Key: YARN-10012
> URL: https://issues.apache.org/jira/browse/YARN-10012
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-10012-branch-2.001.patch, 
> YARN-10012-branch-3.2.004.patch, YARN-10012.001.patch, YARN-10012.002.patch, 
> YARN-10012.003.patch
>
>
> YARN-9085 adds support for guaranteed/maxcapacity MB/vcores. We should add 
> the same for custom resources.






[jira] [Updated] (YARN-10012) Guaranteed and max capacity queue metrics for custom resources

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-10012:
-
Fix Version/s: (was: 2.11.0)

> Guaranteed and max capacity queue metrics for custom resources
> --
>
> Key: YARN-10012
> URL: https://issues.apache.org/jira/browse/YARN-10012
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-10012-branch-2.001.patch, 
> YARN-10012-branch-3.2.004.patch, YARN-10012.001.patch, YARN-10012.002.patch, 
> YARN-10012.003.patch
>
>
> YARN-9085 adds support for guaranteed/maxcapacity MB/vcores. We should add 
> the same for custom resources.






[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9205:

Fix Version/s: (was: 2.11.0)

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Fix For: 3.1.2, 3.3.0, 3.2.1, 2.10.1
>
> Attachments: YARN-9205-branch-2.001.patch, 
> YARN-9205-branch-3.1.001.patch, YARN-9205-branch-3.2.001.patch, 
> YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch, 
> YARN-9205-trunk.005.patch, YARN-9205-trunk.006.patch, 
> YARN-9205-trunk.007.patch, YARN-9205-trunk.008.patch, 
> YARN-9205-trunk.009.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum capacity.
> DefaultAMSProcessor.java, Line 234.
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java Line364
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" which is not a YarnConfiguration instance causes the code below 
> to return null for resource names, so the result only contains the mandatory 
> resources. This might be the root cause.
> {code:java}
> private static Map 
> 

[jira] [Commented] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991913#comment-16991913
 ] 

Jonathan Hung commented on YARN-9205:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Fix For: 3.1.2, 3.3.0, 3.2.1, 2.10.1
>
> Attachments: YARN-9205-branch-2.001.patch, 
> YARN-9205-branch-3.1.001.patch, YARN-9205-branch-3.2.001.patch, 
> YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch, 
> YARN-9205-trunk.005.patch, YARN-9205-trunk.006.patch, 
> YARN-9205-trunk.007.patch, YARN-9205-trunk.008.patch, 
> YARN-9205-trunk.009.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum capacity.
> DefaultAMSProcessor.java, Line 234.
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java Line364
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" which is not a YarnConfiguration instance causes the code below 
> to return null for resource names, so the result only contains the mandatory 
> resources. This might be the root 

[jira] [Updated] (YARN-9915) Fix FindBug issue in QueueMetrics

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9915:

Fix Version/s: (was: 2.11.0)

> Fix FindBug issue in QueueMetrics
> -
>
> Key: YARN-9915
> URL: https://issues.apache.org/jira/browse/YARN-9915
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-9915-01.patch
>
>
> The FindBugs issue below appears in the trunk build
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.registerCustomResources()
>  invokes inefficient new Long(long) constructor; use Long.valueOf(long) 
> instead
> Bug type DM_NUMBER_CTOR (click for details) 
> In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics
> In method 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.registerCustomResources()
> Called method new Long(long)
> Should call Long.valueOf(long) instead
> At QueueMetrics.java:[line 468]
> {code}
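> The suggested change is mechanical; a small sketch of the pattern FindBugs is 
> asking for (the surrounding class is illustrative, not the actual QueueMetrics code):
> {code:java}
> public class LongBoxingExample {
>   public static void main(String[] args) {
>     long initialValue = 0L;
>     // Flagged by FindBugs (DM_NUMBER_CTOR): always allocates a new object.
>     Long boxedOld = new Long(initialValue);
>     // Preferred: Long.valueOf may return a cached instance for small values.
>     Long boxedNew = Long.valueOf(initialValue);
>     // Autoboxing compiles down to Long.valueOf as well.
>     Long autoBoxed = initialValue;
>     System.out.println(boxedOld + " " + boxedNew + " " + autoBoxed);
>   }
> }
> {code}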






[jira] [Commented] (YARN-9915) Fix FindBug issue in QueueMetrics

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991909#comment-16991909
 ] 

Jonathan Hung commented on YARN-9915:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Fix FindBug issue in QueueMetrics
> -
>
> Key: YARN-9915
> URL: https://issues.apache.org/jira/browse/YARN-9915
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-9915-01.patch
>
>
> The FindBugs issue below appears in the trunk build
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.registerCustomResources()
>  invokes inefficient new Long(long) constructor; use Long.valueOf(long) 
> instead
> Bug type DM_NUMBER_CTOR (click for details) 
> In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics
> In method 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.registerCustomResources()
> Called method new Long(long)
> Should call Long.valueOf(long) instead
> At QueueMetrics.java:[line 468]
> {code}






[jira] [Updated] (YARN-9773) Add QueueMetrics for Custom Resources

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9773:

Fix Version/s: (was: 2.11.0)

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-9773.001.patch, YARN-9773.002.patch, 
> YARN-9773.003.patch
>
>
> Although the custom resource metrics are calculated and saved as a 
> QueueMetricsForCustomResources object within the QueueMetrics class, the JMX 
> and Simon QueueMetrics do not report that information for custom resources. 
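> A generic sketch of the reporting idea (illustrative only, not the attached 
> patches; the metric naming and the plain map standing in for the JMX/Simon sink 
> are assumptions):
> {code:java}
> import java.util.LinkedHashMap;
> import java.util.Map;
> 
> public class CustomResourceQueueMetricsSketch {
>   // Stand-in for the metrics sink: one named metric per custom resource,
>   // alongside the built-in memory/vcores metrics.
>   private final Map<String, Long> publishedMetrics = new LinkedHashMap<>();
> 
>   /** Publish allocated values for every custom resource of a queue. */
>   public void reportAllocated(String queue, Map<String, Long> customResources) {
>     for (Map.Entry<String, Long> e : customResources.entrySet()) {
>       // e.g. "root.default.AllocatedResource.gpu" -> 4
>       publishedMetrics.put(queue + ".AllocatedResource." + e.getKey(),
>           e.getValue());
>     }
>   }
> 
>   public Map<String, Long> snapshot() {
>     return new LinkedHashMap<>(publishedMetrics);
>   }
> }
> {code}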






[jira] [Commented] (YARN-9773) Add QueueMetrics for Custom Resources

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991908#comment-16991908
 ] 

Jonathan Hung commented on YARN-9773:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-9773.001.patch, YARN-9773.002.patch, 
> YARN-9773.003.patch
>
>
> Although the custom resource metrics are calculated and saved as a 
> QueueMetricsForCustomResources object within the QueueMetrics class, the JMX 
> and Simon QueueMetrics do not report that information for custom resources. 






[jira] [Updated] (YARN-9838) Fix resource inconsistency for queues when moving app with reserved container to another queue

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9838:

Fix Version/s: 2.10.1

> Fix resource inconsistency for queues when moving app with reserved container 
> to another queue
> --
>
> Key: YARN-9838
> URL: https://issues.apache.org/jira/browse/YARN-9838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.3
>Reporter: jiulongzhu
>Assignee: jiulongzhu
>Priority: Critical
>  Labels: patch
> Fix For: 3.3.0, 2.9.3, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png, 
> YARN-9838.0001.patch, YARN-9838.0002.patch
>
>
>       In some of our clusters, we are seeing "Used Resource", "Used Capacity", 
> "Absolute Used Capacity" and "Num Container" go positive or negative even though 
> the queue is completely idle (no RUNNING, no NEW apps...). In extreme cases, apps 
> cannot be submitted to a queue that is actually idle but whose "Used Resource" is 
> far greater than zero, much like a "Container Leak".
>       Firstly, I found that "Used Resource", "Used Capacity" and "Absolute Used 
> Capacity" use the "Used" value of the ResourceUsage kept by AbstractCSQueue, and 
> "Num Container" uses the "numContainer" value kept by LeafQueue. 
> AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource change the 
> state of "numContainer" and "Used". Secondly, by comparing how numContainer, 
> ResourceUsageByLabel and QueueMetrics change (#allocateContainer and 
> #releaseContainer) for applications with and without "movetoqueue", I found that 
> moving an application with reserved containers from one queue to another does not 
> update the "numContainer" value in AbstractCSQueue or the "used" value in 
> ResourceUsage.
>         The table below shows how the metric values change when a reserved 
> container is allocated, moved from the $FROM queue to the $TO queue, and then 
> released. The increase and decrease do not balance: the resource is allocated in 
> the $FROM queue but released from the $TO queue.
> ||move reservedContainer||allocate||movetoqueue||release||
> |numContainer|increase in $FROM queue|$FROM queue stays the same, $TO queue stays 
> the same|decrease in $TO queue|
> |ResourceUsageByLabel(USED)|increase in $FROM queue|$FROM queue stays the same, 
> $TO queue stays the same|decrease in $TO queue|
> |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in $TO 
> queue|decrease in $TO queue|
>       The metric value changes for allocatedContainers (allocated, acquired, 
> running) across allocate, movetoqueue and release are absolutely conservative.
>    






[jira] [Commented] (YARN-7589) TestPBImplRecords fails with NullPointerException

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991907#comment-16991907
 ] 

Jonathan Hung commented on YARN-7589:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> TestPBImplRecords fails with NullPointerException
> -
>
> Key: YARN-7589
> URL: https://issues.apache.org/jira/browse/YARN-7589
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0, 3.0.1
>Reporter: Jason Darrell Lowe
>Assignee: Daniel Templeton
>Priority: Major
> Fix For: 3.0.0, 3.1.0, 3.0.1, 2.10.1
>
> Attachments: YARN-7589.001.patch
>
>
> TestPBImplRecords is failing consistently in trunk:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.413 
> s <<< FAILURE! - in org.apache.hadoop.yarn.api.TestPBImplRecords
> [ERROR] org.apache.hadoop.yarn.api.TestPBImplRecords  Time elapsed: 0.413 s  
> <<< ERROR!
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.yarn.api.BasePBImplRecordsTest.generateByNewInstance(BasePBImplRecordsTest.java:151)
>   at 
> org.apache.hadoop.yarn.api.TestPBImplRecords.setup(TestPBImplRecords.java:371)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.util.resource.ResourceUtils.createResourceTypesArray(ResourceUtils.java:644)
>   at 
> org.apache.hadoop.yarn.api.records.Resource.newInstance(Resource.java:105)
>   ... 23 more
> {noformat}
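> The NPE is thrown from ResourceUtils.createResourceTypesArray while 
> Resource.newInstance runs in the test setup. A hedged sketch of how a test can 
> pin down the resource-type state first, assuming the test-only 
> {{ResourceUtils.resetResourceTypes(Configuration)}} helper is available:
> {code:java}
> // Illustrative test setup only (not the attached patch): initialize the
> // static resource-type registry before Resource.newInstance is called, so
> // createResourceTypesArray does not observe stale or missing state left
> // behind by other tests in the same JVM.
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.yarn.api.records.Resource;
> import org.apache.hadoop.yarn.util.resource.ResourceUtils;
> import org.junit.Before;
> import org.junit.Test;
> import static org.junit.Assert.assertEquals;
> 
> public class ResourceTypesSetupSketch {
>   @Before
>   public void setUp() {
>     ResourceUtils.resetResourceTypes(new Configuration(false));
>   }
> 
>   @Test
>   public void createsResourceWithoutNpe() {
>     Resource r = Resource.newInstance(1024, 1);
>     assertEquals(1024, r.getMemorySize());
>   }
> }
> {code}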






[jira] [Updated] (YARN-7589) TestPBImplRecords fails with NullPointerException

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7589:

Fix Version/s: (was: 2.11.0)

> TestPBImplRecords fails with NullPointerException
> -
>
> Key: YARN-7589
> URL: https://issues.apache.org/jira/browse/YARN-7589
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0, 3.0.1
>Reporter: Jason Darrell Lowe
>Assignee: Daniel Templeton
>Priority: Major
> Fix For: 3.0.0, 3.1.0, 3.0.1, 2.10.1
>
> Attachments: YARN-7589.001.patch
>
>
> TestPBImplRecords is failing consistently in trunk:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.413 
> s <<< FAILURE! - in org.apache.hadoop.yarn.api.TestPBImplRecords
> [ERROR] org.apache.hadoop.yarn.api.TestPBImplRecords  Time elapsed: 0.413 s  
> <<< ERROR!
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.yarn.api.BasePBImplRecordsTest.generateByNewInstance(BasePBImplRecordsTest.java:151)
>   at 
> org.apache.hadoop.yarn.api.TestPBImplRecords.setup(TestPBImplRecords.java:371)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.util.resource.ResourceUtils.createResourceTypesArray(ResourceUtils.java:644)
>   at 
> org.apache.hadoop.yarn.api.records.Resource.newInstance(Resource.java:105)
>   ... 23 more
> {noformat}






[jira] [Commented] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991905#comment-16991905
 ] 

Jonathan Hung commented on YARN-8842:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Expose metrics for custom resource types in QueueMetrics
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-8842-branch-2.001.patch, 
> YARN-8842-branch-2.002.patch, YARN-8842-branch-2.003.patch, 
> YARN-8842.001.patch, YARN-8842.002.patch, YARN-8842.003.patch, 
> YARN-8842.004.patch, YARN-8842.005.patch, YARN-8842.006.patch, 
> YARN-8842.007.patch, YARN-8842.008.patch, YARN-8842.009.patch, 
> YARN-8842.010.patch, YARN-8842.011.patch, YARN-8842.012.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted






[jira] [Updated] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8842:

Fix Version/s: (was: 2.11.0)

> Expose metrics for custom resource types in QueueMetrics
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-8842-branch-2.001.patch, 
> YARN-8842-branch-2.002.patch, YARN-8842-branch-2.003.patch, 
> YARN-8842.001.patch, YARN-8842.002.patch, YARN-8842.003.patch, 
> YARN-8842.004.patch, YARN-8842.005.patch, YARN-8842.006.patch, 
> YARN-8842.007.patch, YARN-8842.008.patch, YARN-8842.009.patch, 
> YARN-8842.010.patch, YARN-8842.011.patch, YARN-8842.012.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted






[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991902#comment-16991902
 ] 

Jonathan Hung commented on YARN-8179:
-

Seems this was committed to branch-2.10, adding 2.10.1 fix version

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> cluster
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> All resources have been allocated to QueueA (all vcores are allocated to 
> QueueA).
> If App1 is submitted to QueueB, the over-utilized QueueA should be preempted.
> However, I hit a problem where preemption does not happen, so the App1 AM 
> cannot be allocated.
> When App1 is submitted, the pending resources for the App1 AM request would be 
> 
> so the vcores which need to be preempted from QueueB should be 1,
> but it can become 0 due to natural_termination_factor (default is 0.2).
> We should guarantee that the amount to preempt does not become 0 even after 
> applying natural_termination_factor.
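> A small sketch of the arithmetic being described (illustrative only; the method 
> and constant names are assumptions, not the preemption policy's actual code):
> {code:java}
> // With DRF and a tiny pending ask (1 vcore), scaling the ideal amount to
> // preempt by natural_termination_factor and rounding can yield 0, so the
> // policy never preempts anything and the AM stays pending.
> public class NaturalTerminationSketch {
>   static final double NATURAL_TERMINATION_FACTOR = 0.2;
> 
>   static long currentBehaviour(long idealToPreempt) {
>     return Math.round(idealToPreempt * NATURAL_TERMINATION_FACTOR); // 1 -> 0
>   }
> 
>   static long guardedBehaviour(long idealToPreempt) {
>     if (idealToPreempt <= 0) {
>       return 0;
>     }
>     // Round up instead of to nearest, so a non-zero ideal never collapses to 0.
>     return (long) Math.ceil(idealToPreempt * NATURAL_TERMINATION_FACTOR); // 1 -> 1
>   }
> 
>   public static void main(String[] args) {
>     System.out.println(currentBehaviour(1)); // 0: nothing is ever preempted
>     System.out.println(guardedBehaviour(1)); // 1: the AM container can start
>   }
> }
> {code}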






[jira] [Updated] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8179:

Fix Version/s: 2.10.1

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3, 2.10.1
>
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> cluster
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> All resources have been allocated to QueueA (all vcores are allocated to 
> QueueA).
> If App1 is submitted to QueueB, the over-utilized QueueA should be preempted.
> However, I hit a problem where preemption does not happen, so the App1 AM 
> cannot be allocated.
> When App1 is submitted, the pending resources for the App1 AM request would be 
> 
> so the vcores which need to be preempted from QueueB should be 1,
> but it can become 0 due to natural_termination_factor (default is 0.2).
> We should guarantee that the amount to preempt does not become 0 even after 
> applying natural_termination_factor.






[jira] [Commented] (YARN-7411) Inter-Queue preemption's computeFixpointAllocation need to handle absolute resources while computing normalizedGuarantee

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991900#comment-16991900
 ] 

Jonathan Hung commented on YARN-7411:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Inter-Queue preemption's computeFixpointAllocation need to handle absolute 
> resources while computing normalizedGuarantee
> 
>
> Key: YARN-7411
> URL: https://issues.apache.org/jira/browse/YARN-7411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-5881
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Fix For: 3.1.0, 2.10.1
>
> Attachments: YARN-7411-YARN-5881.004.patch, 
> YARN-7411-YARN-5881.005.patch, YARN-7411.001.patch, 
> YARN-7441.YARN-5881.002.patch, YARN-7441.YARN-5881.003.patch
>
>
> {{normalizedGuarantee}} is computed based on the queue's capacity. This has to be 
> updated correctly once CS starts to accept a queue's capacity in terms of 
> absolute resources.
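> A hedged sketch of the idea (not the actual patch; field and method names are 
> assumptions): for an absolute-resource queue, the normalized guarantee can be 
> derived by dividing the configured amount by the total partition resource instead 
> of chaining percentages down the hierarchy.
> {code:java}
> public class NormalizedGuaranteeSketch {
> 
>   /** Percentage-configured queue: multiply capacities down the hierarchy. */
>   static float fromPercent(float parentNormalized, float queueCapacityPercent) {
>     return parentNormalized * (queueCapacityPercent / 100f);
>   }
> 
>   /** Absolute-resource queue: divide the guarantee by the partition total. */
>   static float fromAbsolute(long queueGuaranteedMemMb, long partitionTotalMemMb) {
>     if (partitionTotalMemMb <= 0) {
>       return 0f;
>     }
>     return (float) queueGuaranteedMemMb / partitionTotalMemMb;
>   }
> 
>   public static void main(String[] args) {
>     // e.g. root.a guaranteed 20480 MB out of a 102400 MB partition -> 0.2
>     System.out.println(fromAbsolute(20480, 102400));
>     // vs. the percentage path: root (1.0) * 20% -> 0.2
>     System.out.println(fromPercent(1.0f, 20f));
>   }
> }
> {code}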



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7411) Inter-Queue preemption's computeFixpointAllocation need to handle absolute resources while computing normalizedGuarantee

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7411:

Fix Version/s: (was: 2.11.0)

> Inter-Queue preemption's computeFixpointAllocation need to handle absolute 
> resources while computing normalizedGuarantee
> 
>
> Key: YARN-7411
> URL: https://issues.apache.org/jira/browse/YARN-7411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-5881
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Fix For: 3.1.0, 2.10.1
>
> Attachments: YARN-7411-YARN-5881.004.patch, 
> YARN-7411-YARN-5881.005.patch, YARN-7411.001.patch, 
> YARN-7441.YARN-5881.002.patch, YARN-7441.YARN-5881.003.patch
>
>
> {{normalizedGuarantee}} is computed based on the queue's capacity. This has to be 
> updated correctly once CS starts to accept a queue's capacity in terms of 
> absolute resources.






[jira] [Commented] (YARN-8202) DefaultAMSProcessor should properly check units of requested custom resource types against minimum/maximum allocation

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991895#comment-16991895
 ] 

Jonathan Hung commented on YARN-8202:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> DefaultAMSProcessor should properly check units of requested custom resource 
> types against minimum/maximum allocation
> -
>
> Key: YARN-8202
> URL: https://issues.apache.org/jira/browse/YARN-8202
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Blocker
> Fix For: 3.1.1, 2.10.1
>
> Attachments: YARN-8202-001.patch, YARN-8202-002.patch, 
> YARN-8202-003.patch, YARN-8202-004.patch, YARN-8202-005.patch, 
> YARN-8202-006.patch, YARN-8202-007.patch, YARN-8202-008.patch, 
> YARN-8202-009.patch, YARN-8202-010.patch
>
>
>  
> When I execute a pi job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 
> -Dmapreduce.map.resource.resource1=500M 1 1000{code}
> and I have one node with 5GB of resource1, I get the following exception 
> every second and the job hangs:
> {code:java}
> 2018-04-24 08:42:03,694 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 20 on 8030, call Call#386 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 
> 172.31.119.172:58138
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested resource type=[resource1] < 0 or greater than 
> maximum allowed allocation. Requested resource= resource1: 500M>, maximum allowed allocation= resource1: 5G>, please note that maximum allowed allocation is calculated by 
> scheduler based on maximum resource of registered NodeManagers, which might 
> be less than configured maximum allocation= resource1: 9223372036854775807G>
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:258)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:249)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:230)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>         at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>         at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> {code}
> *This is because 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils#validateResourceRequest
>  does not take resource units into account.*
>  
> However, if I start a job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=1G 
> 1 1000{code}
> and I still have 5GB of resource1 on one node then the job runs successfully.
>  
> I also tried a third job run: I requested 1GB of resource1 while no node had 
> any amount of resource1, then restarted the node with 5GB of resource1. The 
> job ultimately completed, but only after the node with enough resources had 
> registered with the RM, which is the desired behaviour.
>  
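> A rough sketch of a unit-aware comparison (illustrative only, not the attached 
> patches; the simplified unit table here is an assumption rather than YARN's real 
> units support):
> {code:java}
> // Comparing a "500M" request against a "5G" maximum numerically (500 > 5)
> // wrongly rejects the request; converting both sides to a common base unit
> // first gives the intended result (500M < 5G).
> import java.util.Map;
> 
> public class UnitAwareCompareSketch {
>   // Simplified unit table; the real implementation supports more units.
>   private static final Map<String, Long> UNIT_FACTOR =
>       Map.of("", 1L, "k", 1_000L, "M", 1_000_000L, "G", 1_000_000_000L);
> 
>   static long toBaseUnits(long value, String unit) {
>     Long factor = UNIT_FACTOR.get(unit);
>     if (factor == null) {
>       throw new IllegalArgumentException("Unknown unit: " + unit);
>     }
>     return value * factor;
>   }
> 
>   /** True if the requested amount fits within the maximum allocation. */
>   static boolean fitsMaximum(long requested, String requestedUnit,
>                              long maximum, String maximumUnit) {
>     return toBaseUnits(requested, requestedUnit)
>         <= toBaseUnits(maximum, maximumUnit);
>   }
> 
>   public static void main(String[] args) {
>     System.out.println(fitsMaximum(500, "M", 5, "G")); // true, as expected
>     System.out.println(500 > 5);                       // the buggy comparison
>   }
> }
> {code}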




[jira] [Updated] (YARN-8202) DefaultAMSProcessor should properly check units of requested custom resource types against minimum/maximum allocation

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8202:

Fix Version/s: (was: 2.11.0)

> DefaultAMSProcessor should properly check units of requested custom resource 
> types against minimum/maximum allocation
> -
>
> Key: YARN-8202
> URL: https://issues.apache.org/jira/browse/YARN-8202
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Blocker
> Fix For: 3.1.1, 2.10.1
>
> Attachments: YARN-8202-001.patch, YARN-8202-002.patch, 
> YARN-8202-003.patch, YARN-8202-004.patch, YARN-8202-005.patch, 
> YARN-8202-006.patch, YARN-8202-007.patch, YARN-8202-008.patch, 
> YARN-8202-009.patch, YARN-8202-010.patch
>
>
>  
> When I execute a pi job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 
> -Dmapreduce.map.resource.resource1=500M 1 1000{code}
> and I have one node with 5GB of resource1, I get the following exception every 
> second and the job hangs:
> {code:java}
> 2018-04-24 08:42:03,694 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 20 on 8030, call Call#386 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 
> 172.31.119.172:58138
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested resource type=[resource1] < 0 or greater than 
> maximum allowed allocation. Requested resource= resource1: 500M>, maximum allowed allocation= resource1: 5G>, please note that maximum allowed allocation is calculated by 
> scheduler based on maximum resource of registered NodeManagers, which might 
> be less than configured maximum allocation= resource1: 9223372036854775807G>
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:258)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:249)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:230)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>         at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>         at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> {code}
> *This is because 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils#validateResourceRequest
>  does not take resource units into account.*
>  
> However, if I start a job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=1G 
> 1 1000{code}
> and I still have 5GB of resource1 on one node, then the job runs successfully.
>  
> I also tried a third job run: I request 1GB of resource1 while I have no 
> nodes with any amount of resource1, then I restart the node with 5GB of 
> resource1. The job ultimately completes, but only after the node with enough 
> resources has registered in the RM, which is the desired behaviour.
>  
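For illustration, a unit-aware version of this maximum-allocation check could 
look roughly like the sketch below. This is only a sketch, assuming the trunk 
behaviour of UnitsConversionUtil and ResourceInformation; it is not the 
committed patch.

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;
import org.apache.hadoop.yarn.util.UnitsConversionUtil;

/** Sketch only: validate each requested resource in its own unit. */
public final class UnitAwareValidatorSketch {
  private UnitAwareValidatorSketch() {
  }

  public static void validate(Resource requested, Resource maximumAllocation)
      throws InvalidResourceRequestException {
    for (ResourceInformation req : requested.getResources()) {
      ResourceInformation max =
          maximumAllocation.getResourceInformation(req.getName());
      // Convert both sides to a common unit before comparing, instead of
      // comparing raw values such as 500 (M) against 5 (G).
      if (UnitsConversionUtil.compare(req.getUnits(), req.getValue(),
          max.getUnits(), max.getValue()) > 0) {
        throw new InvalidResourceRequestException("Requested " + req
            + " exceeds maximum allowed " + max);
      }
    }
  }
}
{code}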



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7739) DefaultAMSProcessor should properly check customized resource types against minimum/maximum allocation

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7739:

Fix Version/s: (was: 2.11.0)

> DefaultAMSProcessor should properly check customized resource types against 
> minimum/maximum allocation
> --
>
> Key: YARN-7739
> URL: https://issues.apache.org/jira/browse/YARN-7739
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 3.1.0, 2.10.1
>
> Attachments: YARN-7339.002.patch, YARN-7739.001.patch
>
>
> Currently, YARN RM rejects a requested resource if memory or vcores are less 
> than 0 or greater than the maximum allocation. We should run the same check 
> for customized resource types as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7739) DefaultAMSProcessor should properly check customized resource types against minimum/maximum allocation

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991892#comment-16991892
 ] 

Jonathan Hung commented on YARN-7739:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> DefaultAMSProcessor should properly check customized resource types against 
> minimum/maximum allocation
> --
>
> Key: YARN-7739
> URL: https://issues.apache.org/jira/browse/YARN-7739
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 3.1.0, 2.10.1
>
> Attachments: YARN-7339.002.patch, YARN-7739.001.patch
>
>
> Currently, YARN RM rejects a requested resource if memory or vcores are less 
> than 0 or greater than the maximum allocation. We should run the same check 
> for customized resource types as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7541) Node updates don't update the maximum cluster capability for resources other than CPU and memory

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991894#comment-16991894
 ] 

Jonathan Hung commented on YARN-7541:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Node updates don't update the maximum cluster capability for resources other 
> than CPU and memory
> 
>
> Key: YARN-7541
> URL: https://issues.apache.org/jira/browse/YARN-7541
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-beta1, 3.1.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Fix For: 3.0.0, 3.1.0, 2.10.1
>
> Attachments: YARN-7541.001.patch, YARN-7541.002.patch, 
> YARN-7541.003.patch, YARN-7541.004.patch, YARN-7541.005.patch, 
> YARN-7541.006.patch, YARN-7541.branch-3.0.001.patch
>
>
> When I submit an MR job that asks for too much memory or CPU for the map or 
> reduce, the AM will fail because it recognizes that the request is too large. 
>  With any other resources, however, the resource requests will instead be 
> made and remain pending forever.  Looks like we forgot to update the code 
> that tracks the maximum container allocation in {{ClusterNodeTracker}}.
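As a rough illustration of the bookkeeping this fix needs (the class and 
method names below are hypothetical, not the actual {{ClusterNodeTracker}} 
code), each node registration would fold the node's capability into the 
tracked maximum across all resource types:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

/** Sketch only: fold every registered node's capability into a tracked maximum. */
public class MaxAllocationSketch {
  private Resource maxNodeCapability = Resources.none();

  public synchronized void nodeAdded(Resource nodeCapability) {
    // componentwiseMax covers custom resource types as well as memory and
    // vcores, so the maximum stays up to date for every resource type.
    maxNodeCapability =
        Resources.componentwiseMax(maxNodeCapability, nodeCapability);
  }

  public synchronized Resource getMaxAllowedAllocation() {
    return maxNodeCapability;
  }
}
{code}

A node removal would additionally need a recomputation over the remaining 
nodes, and that recomputation is the part that is easy to miss for the custom 
resource types.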



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7541) Node updates don't update the maximum cluster capability for resources other than CPU and memory

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7541:

Fix Version/s: (was: 2.11.0)

> Node updates don't update the maximum cluster capability for resources other 
> than CPU and memory
> 
>
> Key: YARN-7541
> URL: https://issues.apache.org/jira/browse/YARN-7541
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-beta1, 3.1.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Fix For: 3.0.0, 3.1.0, 2.10.1
>
> Attachments: YARN-7541.001.patch, YARN-7541.002.patch, 
> YARN-7541.003.patch, YARN-7541.004.patch, YARN-7541.005.patch, 
> YARN-7541.006.patch, YARN-7541.branch-3.0.001.patch
>
>
> When I submit an MR job that asks for too much memory or CPU for the map or 
> reduce, the AM will fail because it recognizes that the request is too large. 
>  With any other resources, however, the resource requests will instead be 
> made and remain pending forever.  Looks like we forgot to update the code 
> that tracks the maximum container allocation in {{ClusterNodeTracker}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8004) Add unit tests for inter queue preemption for dominant resource calculator

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8004:

Fix Version/s: (was: 2.11.0)

> Add unit tests for inter queue preemption for dominant resource calculator
> --
>
> Key: YARN-8004
> URL: https://issues.apache.org/jira/browse/YARN-8004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Zian Chen
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 3.0.3, 2.10.1
>
> Attachments: YARN-8004.001.patch, YARN-8004.002.patch, 
> YARN-8004.003.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8004) Add unit tests for inter queue preemption for dominant resource calculator

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991888#comment-16991888
 ] 

Jonathan Hung commented on YARN-8004:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Add unit tests for inter queue preemption for dominant resource calculator
> --
>
> Key: YARN-8004
> URL: https://issues.apache.org/jira/browse/YARN-8004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Zian Chen
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 3.0.3, 2.10.1
>
> Attachments: YARN-8004.001.patch, YARN-8004.002.patch, 
> YARN-8004.003.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9945) Fix javadoc in FederationProxyProviderUtil in branch-2

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9945:

Fix Version/s: (was: 2.11.0)

> Fix javadoc in FederationProxyProviderUtil in branch-2
> --
>
> Key: YARN-9945
> URL: https://issues.apache.org/jira/browse/YARN-9945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Fix For: 2.10.1
>
> Attachments: YARN-9945-branch-2.001.patch
>
>
> {noformat}
> [ERROR] 
> /home/jhung/hadoop-mp/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/failover/FederationProxyProviderUtil.java:83:
>  error: reference not found
> [ERROR] * @param configuration Configuration to generate {@link 
> ClientRMProxy} {noformat}
> This import was removed in branch-2 by YARN-7900, but it is still referenced 
> in this file's javadoc, which makes the javadoc build fail on Java 8.
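For illustration, the kind of javadoc-only change that resolves such an error 
is sketched below (illustrative code, not the attached patch): downgrade the 
unresolved {{@link}} to plain {{@code}}, or fully qualify it, so javadoc no 
longer needs the removed import.

{code:java}
/** Sketch of a javadoc-only fix; illustrative, not the attached patch. */
public final class JavadocFixSketch {
  /**
   * @param configuration Configuration to generate {@code ClientRMProxy}
   *        with; {@code ...} renders as plain text, so javadoc does not try
   *        to resolve a type whose import was removed.
   */
  public static void example(Object configuration) {
    // no-op: only the javadoc comment above matters for this sketch
  }
}
{code}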



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9945) Fix javadoc in FederationProxyProviderUtil in branch-2

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991887#comment-16991887
 ] 

Jonathan Hung commented on YARN-9945:
-

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Fix javadoc in FederationProxyProviderUtil in branch-2
> --
>
> Key: YARN-9945
> URL: https://issues.apache.org/jira/browse/YARN-9945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Fix For: 2.10.1
>
> Attachments: YARN-9945-branch-2.001.patch
>
>
> {noformat}
> [ERROR] 
> /home/jhung/hadoop-mp/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/failover/FederationProxyProviderUtil.java:83:
>  error: reference not found
> [ERROR] * @param configuration Configuration to generate {@link 
> ClientRMProxy} {noformat}
> This import was removed in branch-2 by YARN-7900, but it is still referenced 
> in this file's javadoc, which makes the javadoc build fail on Java 8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9951) Unify Error Messages in container-executor

2019-12-09 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991884#comment-16991884
 ] 

David Mollitor commented on YARN-9951:
--

[~szegedim]

> Unify Error Messages in container-executor
> --
>
> Key: YARN-9951
> URL: https://issues.apache.org/jira/browse/YARN-9951
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: YARN-9951.1.patch
>
>
> [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c]
>  
> Has several different ways for reporting errors:
>  
>  # Couldn't
>  # Can't
>  # Could not
>  # Failed to
>  # Unable to
>  # Other
>  
> I think "Failed to" is the best verbiage.  Contractions are hard for 
> non-native English speakers.  "Failed" is to the point, and I am more likely 
> to grep logs for 'fail' than for 'unable' or 'could not'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9956) Improve connection error message for YARN ApiServerClient

2019-12-09 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991851#comment-16991851
 ] 

Eric Yang commented on YARN-9956:
-

[~prabhujoseph] Something is still wrong with patch 003 in pre-commit test.  
Can you double check?  Thanks

> Improve connection error message for YARN ApiServerClient
> -
>
> Key: YARN-9956
> URL: https://issues.apache.org/jira/browse/YARN-9956
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9956-001.patch, YARN-9956-002.patch, 
> YARN-9956-003.patch
>
>
> In HA environment, yarn.resourcemanager.webapp.address configuration is 
> optional.  ApiServiceClient may produce confusing error message like this:
> {code}
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: 
> host1.example.com:8090
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: 
> host2.example.com:8090
> 19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {}
> GSSException: No valid credentials provided (Mechanism level: Server not 
> found in Kerberos database (7) - LOOKING_UP_SERVER)
>   at 
> java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
>   at 
> java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
>   at 
> java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>   at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
> Caused by: KrbException: Server not found in Kerberos database (7) - 
> LOOKING_UP_SERVER
>   at 
> java.security.jgss/sun.security.krb5.KrbTgsRep.(KrbTgsRep.java:73)
>   at 
> java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
>   at 
> java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
>   at 
> java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
>   at 
> java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
>   at 
> java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
>   at 
> java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
>   ... 15 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>   at 
> java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
>   at 
> java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
>   at 
> java.security.jgss/sun.security.krb5.internal.TGSRep.(TGSRep.java:60)
>   at 
> java.security.jgss/sun.security.krb5.KrbTgsRep.(KrbTgsRep.java:55)
>   ... 21 more
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application: 
> java.io.IOException: java.lang.reflect.UndeclaredThrowableException
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>   at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>   at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> 

[jira] [Commented] (YARN-10014) Refactor boolean flag based approach in SchedConfCLI#run

2019-12-09 Thread Oleg Bonar (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991785#comment-16991785
 ] 

Oleg Bonar commented on YARN-10014:
---

Need a review. Thanks.

> Refactor boolean flag based approach in SchedConfCLI#run
> 
>
> Key: YARN-10014
> URL: https://issues.apache.org/jira/browse/YARN-10014
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Prabhu Joseph
>Priority: Major
>
> Boolean-flag based approach in 
> org.apache.hadoop.yarn.client.cli.SchedConfCLI#run: 
> Everything is controlled with boolean flags here.
> The hasOption flag is set to true in each of the if-clauses just to satisfy 
> the condition below them. The flag is set to true even for parameters that 
> don't have an option at all (like 'getConf'), which is misleading and hard to 
> understand on first read.
> The following refactoring is needed:
> a. Eliminate the hasOption boolean flag.
> b. Where an option is misused, fail fast: extract a method that contains this 
> code and call it for every option, in place:
> {code}
> if (!hasOption) {
>  System.err.println("Invalid Command Usage: ");
>  printUsage();
>  return -1;
>  }
> {code}
> c. Remove the format and getConf boolean flags as well; they are unnecessary.
> cc [~snemeth]
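As a rough sketch of point (b) above (the surrounding names are illustrative, 
not taken from an actual patch), the duplicated block could move into a small 
helper that every option handler calls in place:

{code:java}
// Sketch only: a fail-fast helper inside SchedConfCLI replacing the
// hasOption flag.
private int invalidUsage() {
  System.err.println("Invalid Command Usage: ");
  printUsage();
  return -1;
}

// Each option handler then fails fast in place, for example:
//   if (queuePath == null) {   // hypothetical check, for illustration only
//     return invalidUsage();
//   }
{code}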



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests

2019-12-09 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991698#comment-16991698
 ] 

Wilfred Spiegelenburg commented on YARN-5106:
-

It is correct that the XMLs would have been moved around, as testing has changed 
with the addition of YARN-8967.
Backporting could thus require major updates to the patch to make sure you 
pick up the earlier set of XMLs. It really depends on how important it is to 
get this backported.

Not having it in the three active 3.x branches could make backporting new FS 
tests much harder. The choice becomes: do the work once now, or keep doing it 
separately for each fix/change over a longer period.

> Provide a builder interface for FairScheduler allocations for use in tests
> --
>
> Key: YARN-5106
> URL: https://issues.apache.org/jira/browse/YARN-5106
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Adam Antal
>Priority: Major
>  Labels: newbie++
> Fix For: 3.3.0
>
> Attachments: YARN-5106-branch-3.1.001.patch, 
> YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, 
> YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, 
> YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, 
> YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, 
> YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, 
> YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, 
> YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, 
> YARN-5106.011.patch, YARN-5106.012.patch, YARN-5106.013.patch, 
> YARN-5106.014.patch, YARN-5106.015.patch, YARN-5106.016.patch
>
>
> Most, if not all, fair scheduler tests create an allocations XML file. Having 
> a helper class that potentially uses a builder would make the tests cleaner. 
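To illustrate the idea (all names below are hypothetical, not a committed 
API), such a builder could assemble the allocations XML for a test along these 
lines:

{code:java}
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder sketch; not the committed API. */
public final class AllocationFileBuilder {
  private final List<String> queues = new ArrayList<>();

  public AllocationFileBuilder addQueue(String name, String maxResources) {
    queues.add("  <queue name=\"" + name + "\">\n"
        + "    <maxResources>" + maxResources + "</maxResources>\n"
        + "  </queue>");
    return this;
  }

  public void writeTo(String path) throws IOException {
    try (PrintWriter out = new PrintWriter(path, "UTF-8")) {
      out.println("<?xml version=\"1.0\"?>");
      out.println("<allocations>");
      queues.forEach(out::println);
      out.println("</allocations>");
    }
  }
}

// Usage in a test instead of hand-writing the XML file:
//   new AllocationFileBuilder()
//       .addQueue("root.queueA", "4096 mb, 4 vcores")
//       .writeTo(allocFilePath);
{code}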



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7721) TestContinuousScheduling fails sporadically with NPE

2019-12-09 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991693#comment-16991693
 ] 

Wilfred Spiegelenburg commented on YARN-7721:
-

If there were a problem with the {{transferStateFromPreviousAttempt}} call 
itself, it should cause other tests to fail.
It is not a generic issue in the sense that we always want to wait until 
{{getLastScheduledContainer}} returns a non-null value. When we are scheduling 
normally, i.e. in non-test code, there is no problem and we do not want to 
introduce a delay there.

There is nothing left to do for this jira except commit the code; that is why 
it was assigned to [~sunilg]. [~prabhujoseph] or [~snemeth] could also do that.

> TestContinuousScheduling fails sporadically with NPE
> 
>
> Key: YARN-7721
> URL: https://issues.apache.org/jira/browse/YARN-7721
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Jason Darrell Lowe
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-7721.001.patch
>
>
> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime is 
> failing sporadically with an NPE in precommit builds, and I can usually 
> reproduce it locally after a few tries:
> {noformat}
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.085 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:383)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> [...]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests

2019-12-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991658#comment-16991658
 ] 

Adam Antal commented on YARN-5106:
--

Well, I checked the patch, and although the layout of the XMLs does not seem to 
change, the exact FS XMLs are added/moved/removed. So in the absence of 
YARN-8967, I don't really see a reason why we should force backporting this 
to branch-3.2. I'm aware of the consequences, but committing this to branch-3.2 
would require coming up with a very different patch, and I don't recommend 
doing that.

Any thoughts [~wilfreds], [~snemeth]?

> Provide a builder interface for FairScheduler allocations for use in tests
> --
>
> Key: YARN-5106
> URL: https://issues.apache.org/jira/browse/YARN-5106
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Adam Antal
>Priority: Major
>  Labels: newbie++
> Fix For: 3.3.0
>
> Attachments: YARN-5106-branch-3.1.001.patch, 
> YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, 
> YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, 
> YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, 
> YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, 
> YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, 
> YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, 
> YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, 
> YARN-5106.011.patch, YARN-5106.012.patch, YARN-5106.013.patch, 
> YARN-5106.014.patch, YARN-5106.015.patch, YARN-5106.016.patch
>
>
> Most, if not all, fair scheduler tests create an allocations XML file. Having 
> a helper class that potentially uses a builder would make the tests cleaner. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9883) Reshape SchedulerHealth class

2019-12-09 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991656#comment-16991656
 ] 

Kinga Marton commented on YARN-9883:


[~adam.antal], unfortunately I haven't had time to start the work on this, so I 
can wait with this until your changes are merged.

> Reshape SchedulerHealth class
> -
>
> Key: YARN-9883
> URL: https://issues.apache.org/jira/browse/YARN-9883
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Kinga Marton
>Priority: Minor
>
> The {{SchedulerHealth}} class has some flaws, for example:
> - It has no javadoc at all
> - All its objects are package-private: they should be private
> - The internal maps should be (Concurrent) EnumMaps instead of HashMaps: they 
> are more efficient in storing Enums
> - schedulerHealthDetails only stores the last operation, its name should 
> reflect that (just like lastSchedulerRunDetails)
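As a small illustration of the EnumMap point (the enum values and class below 
are only illustrative and do not mirror {{SchedulerHealth}}'s exact API):

{code:java}
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

/** Illustration only; the enum values do not mirror SchedulerHealth exactly. */
public class OperationCounts {
  public enum Operation { ALLOCATION, RELEASE, PREEMPTION, RESERVATION }

  // An EnumMap stores its values in an array indexed by the enum ordinal, so
  // lookups avoid hashing; the synchronized wrapper adds basic thread safety.
  private final Map<Operation, Long> counts =
      Collections.synchronizedMap(new EnumMap<Operation, Long>(Operation.class));

  public void increment(Operation op) {
    counts.merge(op, 1L, Long::sum);
  }

  public long get(Operation op) {
    return counts.getOrDefault(op, 0L);
  }
}
{code}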



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9923) Introduce HealthReporter interface to support multiple health checker files

2019-12-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991605#comment-16991605
 ] 

Hadoop QA commented on YARN-9923:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
4s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 26 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
38s{color} | {color:green} root generated 0 new + 1868 unchanged - 2 fixed = 
1868 total (was 1870) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 45s{color} | {color:orange} root: The patch generated 2 new + 596 unchanged 
- 52 fixed = 598 total (was 648) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
58s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
46s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 18s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | 

[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder

2019-12-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991592#comment-16991592
 ] 

Adam Antal commented on YARN-9525:
--

Validated that this patch on top of YARN-9607 actually fixes the bug.

Log aggregation using the IndexedFileController targeting an S3a URI as the 
remote app folder works in both the simple and the rolling case.

As the latest Jenkins result was green, could you commit this, [~snemeth], if 
there are no more concerns?

> IFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch, 
> YARN-9525.002.patch, YARN-9525.003.patch, YARN-9525.004.patch, 
> YARN-9525.005.patch, YARN-9525.006.patch
>
>
> Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
>   ... 7 more
> {noformat}
> This stack trace point to 
> {{LogAggregationIndexedFileController$initializeWriter}} where we do the 
> following steps (in a non-rolling log aggregation setup):
> - create FSDataOutputStream
> - writing out a UUID
> - flushing
> - immediately after that we call a GetFileStatus to get the length of the log 
> file (the bytes we just wrote out), and that's where the failures happens: 
> the file is not there yet due to eventual consistency.
> Maybe we can get rid of that, so we can use IFile format against a s3a target.
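A sketch of the idea in that last sentence (assuming 
{{FSDataOutputStream#getPos()}} behaves as on trunk; this is not the attached 
patch): derive the current offset from the open stream instead of asking the 
object store for it.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

/** Sketch only: avoid the getFileStatus() round trip after the flush. */
final class LogOffsetSketch {
  private LogOffsetSketch() {
  }

  static long currentLogOffset(FSDataOutputStream out) throws IOException {
    // getPos() reflects the bytes written through this stream, so an
    // eventually consistent store never has to answer a metadata query here.
    return out.getPos();
  }
}
{code}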



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9883) Reshape SchedulerHealth class

2019-12-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991507#comment-16991507
 ] 

Adam Antal commented on YARN-9883:
--

Hi [~kmarton],

Recently I started to work on YARN-3890. May I ask whether you want to resolve 
this issue in the near future? 
Working in parallel on the same piece of code would probably cause one of us to 
handle a lot of conflicts when rebasing.

> Reshape SchedulerHealth class
> -
>
> Key: YARN-9883
> URL: https://issues.apache.org/jira/browse/YARN-9883
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Kinga Marton
>Priority: Minor
>
> The {{SchedulerHealth}} class has some flaws, for example:
> - It has no javadoc at all
> - All its objects are package-private: they should be private
> - The internal maps should be (Concurrent) EnumMaps instead of HashMaps: they 
> are more efficient in storing Enums
> - schedulerHealthDetails only stores the last operation, its name should 
> reflect that (just like lastSchedulerRunDetails)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7721) TestContinuousScheduling fails sporadically with NPE

2019-12-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991502#comment-16991502
 ] 

Adam Antal commented on YARN-7721:
--

Hi [~sunilg],

Do you plan to work on this in the near future or may I take this over?

> TestContinuousScheduling fails sporadically with NPE
> 
>
> Key: YARN-7721
> URL: https://issues.apache.org/jira/browse/YARN-7721
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Jason Darrell Lowe
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-7721.001.patch
>
>
> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime is 
> failing sporadically with an NPE in precommit builds, and I can usually 
> reproduce it locally after a few tries:
> {noformat}
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.085 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:383)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> [...]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10018) container-executor: possible -1 return value of fork() is not always checked

2019-12-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991487#comment-16991487
 ] 

Peter Bacsko commented on YARN-10018:
-

[~adam.antal] usually there's a proper exit code defined in the different 
functions/contexts, except for one place where I didn't see any that could be 
reused, which is why I introduced a new one.

> container-executor: possible -1 return value of fork() is not always checked
> 
>
> Key: YARN-10018
> URL: https://issues.apache.org/jira/browse/YARN-10018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10018-001.patch, YARN-10018-001.patch
>
>
> There are some places in the container-executor native code where the 
> {{fork()}} call is not handled properly. This operation can fail with -1, but 
> sometimes the if branch needed to validate that it succeeded is missing.
> Also, at one location, the return value is defined as an {{int}}, not 
> {{pid_t}}. It's better to handle this transparently and change it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9923) Introduce HealthReporter interface to support multiple health checker files

2019-12-09 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9923:
-
Attachment: YARN-9923.009.patch

> Introduce HealthReporter interface to support multiple health checker files
> ---
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9923.001.patch, YARN-9923.002.patch, 
> YARN-9923.003.patch, YARN-9923.004.patch, YARN-9923.005.patch, 
> YARN-9923.006.patch, YARN-9923.007.patch, YARN-9923.008.patch, 
> YARN-9923.009.patch
>
>
> Currently, if a NodeManager is enabled to allocate Docker containers but the 
> specified binary (docker.binary in the container-executor.cfg) is missing, the 
> container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: setting this option the NodeManager would not start if Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup)
> - RUNTIME: would give a more detailed/user-friendly exception in 
> NodeManager's side (NM logs) if Docker binaries are missing or the daemon is 
> not working. This would also prevent further Docker container allocation as 
> long as the binaries do not exist and the docker daemon is not running.
> - NONE (default): preserving the current behaviour, throwing exception during 
> container allocation, carrying on using the default retry procedure.
> 
> A new interface called {{HealthChecker}} is introduced which is used in the 
> {{NodeHealthCheckerService}}. Currently existing implementations like 
> {{LocalDirsHandlerService}} are modified to implement this giving a clear 
> abstraction to the node's health. The {{DockerHealthChecker}} implements this 
> new interface.
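The shape of such an interface could be roughly the following (a hedged sketch 
based on the description above; the method names are illustrative rather than 
the committed signatures):

{code:java}
/** Sketch of the abstraction described above; names are illustrative. */
public interface HealthReporter {
  /** @return whether the node is currently considered healthy */
  boolean isHealthy();

  /** @return a human-readable report explaining an unhealthy state */
  String getHealthReport();

  /** @return timestamp (ms) of the last health report */
  long getLastHealthReportTime();
}
{code}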



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9923) Introduce HealthReporter interface to support multiple health checker files

2019-12-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991484#comment-16991484
 ] 

Adam Antal commented on YARN-9923:
--

Thanks for the thorough review, [~snemeth]; I uploaded patch set v9. Let me 
respond to the review items.
Items 0, 1, 2, 3, 5, 7, 10 and 12 are done cleanly, please double check.

bq. 4. More on this part, I don't get what this line mean:

It means that if you provide a string in 
{{yarn.nodemanager.health-checker.script}}, you have to add 
{{yarn.nodemanager.health-checker.%s.path}}, 
{{yarn.nodemanager.health-checker.%s.opts}} and possibly similar configs as 
well. E.g. you have:
{noformat}
yarn.nodemanager.health-checker.script=script1,script2
yarn.nodemanager.health-checker.script1.path=/path/to/the/first/script/
yarn.nodemanager.health-checker.script1.opts=args to the first script
yarn.nodemanager.health-checker.script2.path=/path/to/the/second/script/
yarn.nodemanager.health-checker.script2.opts=args to the second script
yarn.nodemanager.health-checker.script2.timeout-ms=3000
{noformat}

bq. 6. [...] I think NodeHealthCheckerService should have a method that 
receives an exception and propagates it to the ExceptionReporter

Actually this was the original behaviour; I had decoupled the responsibility of 
handling exceptions into a separate ExceptionReporter. I restored the original.

bq. 8. What happens if the user provides an invalid value (e.g. negative) for 
health-checker interval?

The existing behaviour did not check the inputs, which is why I refrained from 
adding it. I think it's better to fix this bug now, so I added 
{{IllegalArgumentException}} cases for the mentioned configs.

bq. 9 [...] TimedHealthReporterService#setHealthStatus is always receiving an 
empty string if the status is true (successful).

I think creating a class for that is a little bit of overkill. I modified the 
class so that it reports either healthy or not healthy. The latter function 
must be invoked with a String argument.

bq. 11. Is it enough that 
org.apache.hadoop.yarn.server.nodemanager.health.NodeHealthScriptRunner.NodeHealthMonitorExecutor#hasErrors
 only checks for uppercase "ERROR" in the shell output? Where is this 
documented that health checker scripts should print ERROR in their output?

It is documented in the markdowns. I don't want to touch that part of the code, 
as it may break existing behaviour. For further discussion see YARN-6715.

bq. 13. Isn't a helper method like this somewhere already implemented (Apache 
commons, lang3, YARN helper methods)?

I didn't find any one-liner for this, but I was able to enhance that piece of 
code into a few lines; it looks better. Also, the use cases vary for each 
invocation, so I couldn't reduce the level of duplication by much by moving it 
into {{FileUtil}}.

> Introduce HealthReporter interface to support multiple health checker files
> ---
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9923.001.patch, YARN-9923.002.patch, 
> YARN-9923.003.patch, YARN-9923.004.patch, YARN-9923.005.patch, 
> YARN-9923.006.patch, YARN-9923.007.patch, YARN-9923.008.patch
>
>
> Currently, if a NodeManager is enabled to allocate Docker containers but the 
> specified binary (docker.binary in the container-executor.cfg) is missing, the 
> container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: setting this option the NodeManager would not start if Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup)
> - RUNTIME: would give a more detailed/user-friendly exception in 
> NodeManager's side (NM logs) if Docker binaries are missing or the daemon is 
> not working. This would also prevent further Docker container allocation as 
> long as the binaries do not exist and the docker daemon is not running.
> - NONE (default): preserving the current behaviour, throwing exception during 
> container allocation, carrying on using the default retry procedure.
> 

[jira] [Comment Edited] (YARN-9781) SchedConfCli to get current stored scheduler configuration

2019-12-09 Thread Oleg Bonar (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991459#comment-16991459
 ] 

Oleg Bonar edited comment on YARN-9781 at 12/9/19 10:07 AM:


Hi [~snemeth]! I'm not sure if it is off-topic or not but you said

??What I don't like is that this flag is set to true even for parameter that 
don't have an option (like your new one, 'getConf') at all??.
 It seems that there is a little misunderstanding on the meaning of the flag. 
Please see my comment and proposals on YARN-10014.


was (Author: oleg_bonar):
Hi [~snemeth]! I'm not sure if it is offtopic or not but you said

??What I don't like is that this flag is set to true even for parameter that 
don't have an option (like your new one, 'getConf') at all??.
 It seems that there is a little misunderstanding on the meaning of the flag. 
Please see my comment and proposals on YARN-10014.

> SchedConfCli to get current stored scheduler configuration
> --
>
> Key: YARN-9781
> URL: https://issues.apache.org/jira/browse/YARN-9781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9781-001.patch, YARN-9781-002.patch, 
> YARN-9781-003.patch, YARN-9781-004.patch, YARN-9781-005.patch, 
> YARN-9781-006.patch, YARN-9781-007.patch, YARN-9781-008.patch
>
>
> SchedConfCLI currently allows to add / remove / remove queue. It does not 
> support get configuration which RMWebServices provides as part of YARN-8559.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9781) SchedConfCli to get current stored scheduler configuration

2019-12-09 Thread Oleg Bonar (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991459#comment-16991459
 ] 

Oleg Bonar edited comment on YARN-9781 at 12/9/19 10:06 AM:


Hi [~snemeth]! I'm not sure if it is offtopic or not but you said

??What I don't like is that this flag is set to true even for parameter that 
don't have an option (like your new one, 'getConf') at all??.
 It seems that there is a little misunderstanding on the meaning of the flag. 
Please see my comment and proposals on YARN-10014.


was (Author: oleg_bonar):
Hi [~snemeth]! I'm not sure if it is oftopic or not but you said

??What I don't like is that this flag is set to true even for parameter that 
don't have an option (like your new one, 'getConf') at all??.
 It seems that there is a little misunderstanding on the meaning of the flag. 
Please see my comment and proposals on YARN-10014.

> SchedConfCli to get current stored scheduler configuration
> --
>
> Key: YARN-9781
> URL: https://issues.apache.org/jira/browse/YARN-9781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9781-001.patch, YARN-9781-002.patch, 
> YARN-9781-003.patch, YARN-9781-004.patch, YARN-9781-005.patch, 
> YARN-9781-006.patch, YARN-9781-007.patch, YARN-9781-008.patch
>
>
> SchedConfCLI currently allows to add / remove / remove queue. It does not 
> support get configuration which RMWebServices provides as part of YARN-8559.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9781) SchedConfCli to get current stored scheduler configuration

2019-12-09 Thread Oleg Bonar (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991459#comment-16991459
 ] 

Oleg Bonar commented on YARN-9781:
--

Hi [~snemeth]! I'm not sure if it is oftopic or not but you said

??What I don't like is that this flag is set to true even for parameter that 
don't have an option (like your new one, 'getConf') at all??.
It seems there is a slight misunderstanding about the meaning of that flag. 
Please see my comment and proposals on YARN-10014.

> SchedConfCli to get current stored scheduler configuration
> --
>
> Key: YARN-9781
> URL: https://issues.apache.org/jira/browse/YARN-9781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9781-001.patch, YARN-9781-002.patch, 
> YARN-9781-003.patch, YARN-9781-004.patch, YARN-9781-005.patch, 
> YARN-9781-006.patch, YARN-9781-007.patch, YARN-9781-008.patch
>
>
> SchedConfCLI currently allows adding, updating, and removing queues. It does 
> not support getting the current scheduler configuration, which RMWebServices 
> provides as part of YARN-8559.






[jira] [Commented] (YARN-9985) Unsupported "transitionToObserver" option displaying for rmadmin command

2019-12-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991436#comment-16991436
 ] 

Hudson commented on YARN-9985:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17743 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17743/])
YARN-9985. Unsupported transitionToObserver option displaying for (aajisaka: 
rev dc66de744826e0501040f8c2ca9e1edc076a80cf)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md


> Unsupported "transitionToObserver" option displaying for rmadmin command
> 
>
> Key: YARN-9985
> URL: https://issues.apache.org/jira/browse/YARN-9985
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, yarn
>Affects Versions: 3.2.1
>Reporter: Souryakanta Dwivedy
>Assignee: Ayush Saxena
>Priority: Minor
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9985-01.patch, YARN-9985-02.patch, 
> image-2019-11-18-18-31-17-755.png, image-2019-11-18-18-35-54-688.png
>
>
> Unsupported "transitionToObserver" option displaying for rmadmin command
> Check the options for the yarn rmadmin command:
> it displays the "-transitionToObserver <serviceId>" option, which is not 
> supported by the yarn rmadmin command; this is wrong behavior.
>  However, "yarn rmadmin -help" does not display any 
> "-transitionToObserver <serviceId>" option.
>  
> !image-2019-11-18-18-31-17-755.png!
>  
> ==
> install/hadoop/resourcemanager/bin> ./yarn rmadmin -help
> rmadmin is the command to execute YARN administrative commands.
> The full syntax is:
> yarn rmadmin [-refreshQueues] [-refreshNodes [-g|graceful [timeout in 
> seconds] -client|server]] [-refreshNodesResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-addToClusterNodeLabels 
> <"label1(exclusive=true),label2(exclusive=false),label3">] 
> [-removeFromClusterNodeLabels <label1,label2,label3>] [-replaceLabelsOnNode 
> <"node1[:port]=label1,label2 node2[:port]=label1"> [-failOnUnknownNodes]] 
> [-directlyAccessNodeLabelStore] [-refreshClusterMaxPriority] 
> [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) or 
> -updateNodeResource [NodeID] [ResourceTypes] ([OvercommitTimeout])] 
> *{color:#FF0000}[-transitionToActive [--forceactive] <serviceId>]{color} 
> {color:#FF0000}[-transitionToStandby <serviceId>]{color}* [-getServiceState 
> <serviceId>] [-getAllServiceState] [-checkHealth <serviceId>] [-help [cmd]]
> -refreshQueues: Reload the queues' acls, states and scheduler specific 
> properties.
>  ResourceManager will reload the mapred-queues configuration file.
>  -refreshNodes [-g|graceful [timeout in seconds] -client|server]: Refresh the 
> hosts information at the ResourceManager. Here [-g|graceful [timeout in 
> seconds] -client|server] is optional, if we specify the timeout then 
> ResourceManager will wait for timeout before marking the NodeManager as 
> decommissioned. The -client|server indicates if the timeout tracking should 
> be handled by the client or the ResourceManager. The client-side tracking is 
> blocking, while the server-side tracking is not. Omitting the timeout, or a 
> timeout of -1, indicates an infinite timeout. Known Issue: the server-side 
> tracking will immediately decommission if an RM HA failover occurs.
>  -refreshNodesResources: Refresh resources of NodeManagers at the 
> ResourceManager.
>  -refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups mappings
>  -refreshUserToGroupsMappings: Refresh user-to-groups mappings
>  -refreshAdminAcls: Refresh acls for administration of ResourceManager
>  -refreshServiceAcl: Reload the service-level authorization policy file.
>  ResourceManager will reload the authorization policy file.
>  -getGroups [username]: Get the groups which given user belongs to.
>  -addToClusterNodeLabels 
> <"label1(exclusive=true),label2(exclusive=false),label3">: add to cluster 
> node labels. Default exclusivity is true
>  -removeFromClusterNodeLabels <label1,label2,label3> (label splitted by ","): 
> remove from cluster node labels
>  -replaceLabelsOnNode <"node1[:port]=label1,label2 
> node2[:port]=label1,label2"> [-failOnUnknownNodes] : replace labels on nodes 
> (please note that we do not support specifying multiple labels on a single 
> host for now.)
>  [-failOnUnknownNodes] is optional, when we set this option, it will fail if 
> specified nodes are unknown.
>  -directlyAccessNodeLabelStore: This is DEPRECATED, will be removed in future 
> releases. 

[jira] [Commented] (YARN-10021) NPE in YARN Registry DNS when wrong DNS message is incoming

2019-12-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991427#comment-16991427
 ] 

Hadoop QA commented on YARN-10021:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
13s{color} | {color:green} hadoop-registry in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-10021 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12988295/YARN-10021.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4091646d6ce6 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e9c5bb8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25279/testReport/ |
| Max. process+thread count | 455 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-registry U: 
hadoop-common-project/hadoop-registry |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25279/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> NPE in YARN Registry DNS when wrong DNS 

[jira] [Updated] (YARN-10020) Fix build instruction of hadoop-yarn-ui

2019-12-09 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10020:

Component/s: yarn-ui-v2

> Fix build instruction of hadoop-yarn-ui
> ---
>
> Key: YARN-10020
> URL: https://issues.apache.org/jira/browse/YARN-10020
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>
> We don't need to manually install package managers such as yarn and bower as 
> described in README.md since frontend-maven-plugin was introduced by 
> YARN-6278.






[jira] [Commented] (YARN-9985) Unsupported "transitionToObserver" option displaying for rmadmin command

2019-12-09 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991414#comment-16991414
 ] 

Akira Ajisaka commented on YARN-9985:
-

I'm +1 for this change.

{quote} Can we move HDFS-specific command options from HAAdmin to DFSHAAdmin? 
{quote}

This refactoring can be done in a separate jira.
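To make the suggested direction concrete, here is a purely hypothetical sketch (made-up class names, not the actual HAAdmin, DFSHAAdmin, or RMAdminCLI code) of letting each admin CLI declare only the HA commands it supports, so the RM-side usage never lists HDFS-only options such as -transitionToObserver:

{code:java}
// Hypothetical sketch of the suggested refactoring: each admin CLI declares the
// HA commands it actually supports, so its usage output cannot drift out of
// sync. Class names are illustrative only.
import java.util.LinkedHashMap;
import java.util.Map;

abstract class BaseHaAdminCli {
  /** Subclasses return only the HA commands they actually support. */
  protected abstract Map<String, String> getSupportedHaCommands();

  void printUsage() {
    getSupportedHaCommands().forEach(
        (cmd, help) -> System.out.println("   [" + cmd + "]: " + help));
  }
}

class RmAdminCliSketch extends BaseHaAdminCli {
  @Override
  protected Map<String, String> getSupportedHaCommands() {
    Map<String, String> cmds = new LinkedHashMap<>();
    cmds.put("-transitionToActive <serviceId>", "Transition the RM to Active");
    cmds.put("-transitionToStandby <serviceId>", "Transition the RM to Standby");
    cmds.put("-getServiceState <serviceId>", "Return the HA state of the RM");
    // Deliberately no -transitionToObserver entry; that command is HDFS-specific.
    return cmds;
  }
}

public class HaUsageDemo {
  public static void main(String[] args) {
    new RmAdminCliSketch().printUsage();
  }
}
{code}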

> Unsupported "transitionToObserver" option displaying for rmadmin command
> 
>
> Key: YARN-9985
> URL: https://issues.apache.org/jira/browse/YARN-9985
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, yarn
>Affects Versions: 3.2.1
>Reporter: Souryakanta Dwivedy
>Assignee: Ayush Saxena
>Priority: Minor
> Attachments: YARN-9985-01.patch, YARN-9985-02.patch, 
> image-2019-11-18-18-31-17-755.png, image-2019-11-18-18-35-54-688.png
>
>
> Unsupported "transitionToObserver" option displaying for rmadmin command
> Check the options for the yarn rmadmin command:
> it displays the "-transitionToObserver <serviceId>" option, which is not 
> supported by the yarn rmadmin command; this is wrong behavior.
>  However, "yarn rmadmin -help" does not display any 
> "-transitionToObserver <serviceId>" option.
>  
> !image-2019-11-18-18-31-17-755.png!
>  
> ==
> install/hadoop/resourcemanager/bin> ./yarn rmadmin -help
> rmadmin is the command to execute YARN administrative commands.
> The full syntax is:
> yarn rmadmin [-refreshQueues] [-refreshNodes [-g|graceful [timeout in 
> seconds] -client|server]] [-refreshNodesResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-addToClusterNodeLabels 
> <"label1(exclusive=true),label2(exclusive=false),label3">] 
> [-removeFromClusterNodeLabels <label1,label2,label3>] [-replaceLabelsOnNode 
> <"node1[:port]=label1,label2 node2[:port]=label1"> [-failOnUnknownNodes]] 
> [-directlyAccessNodeLabelStore] [-refreshClusterMaxPriority] 
> [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) or 
> -updateNodeResource [NodeID] [ResourceTypes] ([OvercommitTimeout])] 
> *{color:#FF0000}[-transitionToActive [--forceactive] <serviceId>]{color} 
> {color:#FF0000}[-transitionToStandby <serviceId>]{color}* [-getServiceState 
> <serviceId>] [-getAllServiceState] [-checkHealth <serviceId>] [-help [cmd]]
> -refreshQueues: Reload the queues' acls, states and scheduler specific 
> properties.
>  ResourceManager will reload the mapred-queues configuration file.
>  -refreshNodes [-g|graceful [timeout in seconds] -client|server]: Refresh the 
> hosts information at the ResourceManager. Here [-g|graceful [timeout in 
> seconds] -client|server] is optional, if we specify the timeout then 
> ResourceManager will wait for timeout before marking the NodeManager as 
> decommissioned. The -client|server indicates if the timeout tracking should 
> be handled by the client or the ResourceManager. The client-side tracking is 
> blocking, while the server-side tracking is not. Omitting the timeout, or a 
> timeout of -1, indicates an infinite timeout. Known Issue: the server-side 
> tracking will immediately decommission if an RM HA failover occurs.
>  -refreshNodesResources: Refresh resources of NodeManagers at the 
> ResourceManager.
>  -refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups mappings
>  -refreshUserToGroupsMappings: Refresh user-to-groups mappings
>  -refreshAdminAcls: Refresh acls for administration of ResourceManager
>  -refreshServiceAcl: Reload the service-level authorization policy file.
>  ResourceManager will reload the authorization policy file.
>  -getGroups [username]: Get the groups which given user belongs to.
>  -addToClusterNodeLabels 
> <"label1(exclusive=true),label2(exclusive=false),label3">: add to cluster 
> node labels. Default exclusivity is true
>  -removeFromClusterNodeLabels <label1,label2,label3> (label splitted by ","): 
> remove from cluster node labels
>  -replaceLabelsOnNode <"node1[:port]=label1,label2 
> node2[:port]=label1,label2"> [-failOnUnknownNodes] : replace labels on nodes 
> (please note that we do not support specifying multiple labels on a single 
> host for now.)
>  [-failOnUnknownNodes] is optional, when we set this option, it will fail if 
> specified nodes are unknown.
>  -directlyAccessNodeLabelStore: This is DEPRECATED, will be removed in future 
> releases. Directly access node label store, with this option, all node label 
> related operations will not connect RM. Instead, they will access/modify 
> stored node labels directly. By default, it is false (access via RM). AND 
> PLEASE NOTE: if you configured yarn.node-labels.fs-store.root-dir to a local 
> directory (instead of NFS or HDFS), this option will only work when the 
> command run on the machine where RM is running.
>  -refreshClusterMaxPriority: Refresh 

[jira] [Resolved] (YARN-9987) Upgrade bower to 1.8.8

2019-12-09 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved YARN-9987.
-
Resolution: Done

The PR has been merged.

> Upgrade bower to 1.8.8
> --
>
> Key: YARN-9987
> URL: https://issues.apache.org/jira/browse/YARN-9987
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akira Ajisaka
>Priority: Major
>
> Merge https://github.com/apache/hadoop/pull/1683 to fix some vulnerabilities.






[jira] [Commented] (YARN-10020) Fix build instruction of hadoop-yarn-ui

2019-12-09 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991410#comment-16991410
 ] 

Masatake Iwasaki commented on YARN-10020:
-

Thanks for the comment [~akhilpb].

I think we can reuse the node and yarn binaries locally installed by 
frontend-maven-plugin (under target/webapp/) for local testing and debugging. I 
will try it and document the procedure in the README if it works. At a minimum, 
I would like to add or update the versions of the prerequisites for manual 
installation.

> Fix build instruction of hadoop-yarn-ui
> ---
>
> Key: YARN-10020
> URL: https://issues.apache.org/jira/browse/YARN-10020
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>
> We don't need to manually install package managers such as yarn and bower as 
> described in README.md since frontend-maven-plugin was introduced by 
> YARN-6278.






[jira] [Commented] (YARN-10015) Correct the sample command in SLS README file

2019-12-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991382#comment-16991382
 ] 

Adam Antal commented on YARN-10015:
---

+1 (non-binding).

> Correct the sample command in SLS README file
> -
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: YARN-10015.patch
>
>
> The sample command in the SLS README, {{bin/slsrun.sh 
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json 
> —-output-dir=sample-output}}, contains a dash from a different encoding (an em 
> dash instead of a plain double hyphen). The command fails with the following error:
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json






[jira] [Commented] (YARN-10018) container-executor: possible -1 return value of fork() is not always checked

2019-12-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991380#comment-16991380
 ] 

Adam Antal commented on YARN-10018:
---

Thanks for the patch [~pbacsko].

It looks good overall. I have a question regarding the error codes: you exit 
with a different code at each of the fork() call sites. Wouldn't it be better to 
always use ERROR_FORKING_PROCESS? I know that the context is a bit different 
each time.

> container-executor: possible -1 return value of fork() is not always checked
> 
>
> Key: YARN-10018
> URL: https://issues.apache.org/jira/browse/YARN-10018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10018-001.patch, YARN-10018-001.patch
>
>
> There are some places in the container-executor native code where the {{fork()}} 
> call is not handled properly. This call can fail and return -1, but sometimes 
> the if branch needed to validate that it succeeded is missing.
> Also, at one location, the return value is declared as an {{int}} rather than 
> {{pid_t}}. It's better to handle this transparently and change it.






[jira] [Commented] (YARN-10020) Fix build instruction of hadoop-yarn-ui

2019-12-09 Thread Akhil PB (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991237#comment-16991237
 ] 

Akhil PB commented on YARN-10020:
-

[~iwasakims] The prerequisites mentioned in README.md are for local 
development. For Hadoop release builds, frontend-maven-plugin is used. We 
usually do not use maven for local dev; instead we run the ember dev server 
locally (hence yarn and bower).

cc: [~sunilg]

> Fix build instruction of hadoop-yarn-ui
> ---
>
> Key: YARN-10020
> URL: https://issues.apache.org/jira/browse/YARN-10020
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>
> We don't need to manually install package managers such as yarn and bower as 
> described in README.md since frontend-maven-plugin was introduced by 
> YARN-6278.






[jira] [Assigned] (YARN-10021) NPE in YARN Registry DNS when wrong DNS message is incoming

2019-12-09 Thread kyungwan nam (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam reassigned YARN-10021:
---

Attachment: YARN-10021.001.patch
  Assignee: kyungwan nam

> NPE in YARN Registry DNS when wrong DNS message is incoming
> ---
>
> Key: YARN-10021
> URL: https://issues.apache.org/jira/browse/YARN-10021
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-10021.001.patch
>
>
> I’ve hit an NPE in YARN Registry DNS, as shown below.
> It looks like this happens when the incoming DNS request is malformed.
> {code:java}
> 2019-11-29 10:51:12,178 ERROR dns.RegistryDNS (RegistryDNS.java:call(932)) - 
> Error initializing DNS UDP listener
> java.lang.NullPointerException
> at java.nio.ByteBuffer.put(ByteBuffer.java:859)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS.serveNIOUDP(RegistryDNS.java:983)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS.access$100(RegistryDNS.java:121)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$5.call(RegistryDNS.java:930)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$5.call(RegistryDNS.java:926)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2019-11-29 10:51:12,180 WARN  concurrent.ExecutorHelper 
> (ExecutorHelper.java:logThrowableFromAfterExecute(50)) - Execution exception 
> when running task in RegistryDNS 1
> 2019-11-29 10:51:12,180 WARN  concurrent.ExecutorHelper 
> (ExecutorHelper.java:logThrowableFromAfterExecute(63)) - Caught exception in 
> thread RegistryDNS 1:
> java.lang.NullPointerException
> at java.nio.ByteBuffer.put(ByteBuffer.java:859)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS.serveNIOUDP(RegistryDNS.java:983)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS.access$100(RegistryDNS.java:121)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$5.call(RegistryDNS.java:930)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$5.call(RegistryDNS.java:926)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Created] (YARN-10021) NPE in YARN Registry DNS when wrong DNS message is incoming

2019-12-09 Thread kyungwan nam (Jira)
kyungwan nam created YARN-10021:
---

 Summary: NPE in YARN Registry DNS when wrong DNS message is 
incoming
 Key: YARN-10021
 URL: https://issues.apache.org/jira/browse/YARN-10021
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: kyungwan nam


I’ve hit an NPE in YARN Registry DNS, as shown below.
It looks like this happens when the incoming DNS request is malformed.

{code:java}
2019-11-29 10:51:12,178 ERROR dns.RegistryDNS (RegistryDNS.java:call(932)) - 
Error initializing DNS UDP listener
java.lang.NullPointerException
at java.nio.ByteBuffer.put(ByteBuffer.java:859)
at 
org.apache.hadoop.registry.server.dns.RegistryDNS.serveNIOUDP(RegistryDNS.java:983)
at 
org.apache.hadoop.registry.server.dns.RegistryDNS.access$100(RegistryDNS.java:121)
at 
org.apache.hadoop.registry.server.dns.RegistryDNS$5.call(RegistryDNS.java:930)
at 
org.apache.hadoop.registry.server.dns.RegistryDNS$5.call(RegistryDNS.java:926)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2019-11-29 10:51:12,180 WARN  concurrent.ExecutorHelper 
(ExecutorHelper.java:logThrowableFromAfterExecute(50)) - Execution exception 
when running task in RegistryDNS 1
2019-11-29 10:51:12,180 WARN  concurrent.ExecutorHelper 
(ExecutorHelper.java:logThrowableFromAfterExecute(63)) - Caught exception in 
thread RegistryDNS 1:
java.lang.NullPointerException
at java.nio.ByteBuffer.put(ByteBuffer.java:859)
at 
org.apache.hadoop.registry.server.dns.RegistryDNS.serveNIOUDP(RegistryDNS.java:983)
at 
org.apache.hadoop.registry.server.dns.RegistryDNS.access$100(RegistryDNS.java:121)
at 
org.apache.hadoop.registry.server.dns.RegistryDNS$5.call(RegistryDNS.java:930)
at 
org.apache.hadoop.registry.server.dns.RegistryDNS$5.call(RegistryDNS.java:926)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
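
The trace shows ByteBuffer.put receiving a null byte array when the incoming packet cannot be turned into a response. The sketch below only illustrates the kind of null guard that avoids this NPE; it uses hypothetical method names and is not based on the attached YARN-10021.001.patch or the real RegistryDNS.serveNIOUDP code:

{code:java}
// Hypothetical illustration of a null guard around building the UDP reply;
// method and variable names are made up and do not reflect the real
// RegistryDNS implementation or the attached patch.
import java.nio.ByteBuffer;

public class UdpReplyGuardSketch {

  /** Returns null when the incoming packet is not a parseable DNS message. */
  static byte[] buildResponse(byte[] request) {
    if (request == null || request.length < 12) {   // 12 bytes = DNS header size
      return null;
    }
    return request;   // placeholder for real response generation
  }

  static void reply(ByteBuffer out, byte[] request) {
    byte[] response = buildResponse(request);
    if (response == null) {
      // Malformed query: log and drop instead of calling put(null), which
      // would throw the NullPointerException shown in the report above.
      System.err.println("Dropping malformed DNS query");
      return;
    }
    out.clear();
    out.put(response);
    out.flip();
  }

  public static void main(String[] args) {
    reply(ByteBuffer.allocate(512), new byte[] {1, 2, 3});   // dropped: too short
  }
}
{code}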



