[jira] [Commented] (YARN-10397) SchedulerRequest should be forwarded to scheduler if custom scheduler supports placement constraints

2020-09-09 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192799#comment-17192799
 ] 

Brahma Reddy Battula commented on YARN-10397:
-

+1, Going to commit shortly.

> SchedulerRequest should be forwarded to scheduler if custom scheduler 
> supports placement constraints
> 
>
> Key: YARN-10397
> URL: https://issues.apache.org/jira/browse/YARN-10397
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-10397.001.patch, YARN-10397.002.patch
>
>
> Currently only CapacityScheduler supports placement constraints, so the 
> request gets forwarded only when CapacityScheduler is configured. The 
> exception below is thrown even if a custom scheduler supports placement 
> constraints:
> {code:java}
> if (request.getSchedulingRequests() != null
> && !request.getSchedulingRequests().isEmpty()) {
>   if (!(scheduler instanceof CapacityScheduler)) {
> String message = "Found non empty SchedulingRequest of "
> + "AllocateRequest for application=" + appAttemptId.toString()
> + ", however the configured scheduler="
> + scheduler.getClass().getCanonicalName()
> + " cannot handle placement constraints, rejecting this "
> + "allocate operation";
> LOG.warn(message);
> throw new YarnException(message);
>   }
> }
> {code}
>  
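For context, a hedged sketch of how the gate could be generalized; PlacementConstraintAware is a hypothetical marker interface invented for illustration, not an existing YARN API, and the actual patch may differ:
{code:java}
// Hypothetical sketch only: gate on a declared capability instead of a
// concrete scheduler class. PlacementConstraintAware is an assumed marker
// interface, not an existing Hadoop/YARN API.
public interface PlacementConstraintAware {
  boolean supportsSchedulingRequests();
}

// In the allocate path, mirroring the snippet above:
if (request.getSchedulingRequests() != null
    && !request.getSchedulingRequests().isEmpty()) {
  boolean supported = scheduler instanceof PlacementConstraintAware
      && ((PlacementConstraintAware) scheduler).supportsSchedulingRequests();
  if (!supported) {
    String message = "Found non empty SchedulingRequest of AllocateRequest, "
        + "however the configured scheduler "
        + scheduler.getClass().getCanonicalName()
        + " cannot handle placement constraints, rejecting this "
        + "allocate operation";
    LOG.warn(message);
    throw new YarnException(message);
  }
}
{code}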



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10386) Create new JSON schema for Placement Rules

2020-08-30 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187290#comment-17187290
 ] 

Brahma Reddy Battula commented on YARN-10386:
-

[~snemeth] and [~pbacsko].

 

Looks like this patch introduced some ASF license warnings; please check the 
following and address them. (Even the last Yetus report of this Jira shows 
these errors.)

[https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/116/artifact/out/patch-asflicense-problems.txt]

> Create new JSON schema for Placement Rules
> --
>
> Key: YARN-10386
> URL: https://issues.apache.org/jira/browse/YARN-10386
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: MappingRulesDescription_v1.json, YARN-10386-001.patch, 
> YARN-10386-002.patch, YARN-10386-003.patch, YARN-10386-004.patch, 
> YARN-10386-005.patch, YARN-10386-006.patch, YARN-10386-007.patch, 
> YARN-10386-008.patch
>
>
> Tasks in this JIRA:
>  # Create new JSON schema
>  # Add Maven plugin which generates Java POJOs based on the schema
>  # Add helper class which essentially does the same as #2 (for dev purposes)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10229) [Federation] Client should be able to submit application to RM directly using normal client conf

2020-07-17 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159697#comment-17159697
 ] 

Brahma Reddy Battula commented on YARN-10229:
-

+1.

> [Federation] Client should be able to submit application to RM directly using 
> normal client conf
> 
>
> Key: YARN-10229
> URL: https://issues.apache.org/jira/browse/YARN-10229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy, federation
>Affects Versions: 3.1.1
>Reporter: JohnsonGuo
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10229.001.patch, YARN-10229.002.patch, 
> YARN-10229.003.patch, YARN-10229.004.patch, YARN-10229.005.patch, 
> YARN-10229.006.patch, YARN-10229.007.patch, YARN-10229.008.patch
>
>
> Scenario: when the YARN federation feature is enabled with multiple YARN 
> clusters, one can submit jobs to the yarn-router by *modifying* the client 
> configuration with the yarn-router address.
> But if one still wants to submit jobs via the original client (from before 
> federation was enabled) to the RM directly, it will encounter an AMRMToken 
> exception. That means that once federation is enabled, anyone who wants to 
> submit a job has to modify the client conf.
>  
> One possible solution for this scenario (sketched after this list) is, in the 
> NodeManager, when the client ApplicationMaster request comes:
>  * get the client job.xml from HDFS "".
>  * parse the "yarn.resourcemanager.scheduler.address" parameter in job.xml
>  * if the value of the parameter is "localhost:8049" (the AMRMProxy address), 
> then do the AMRMToken validation process
>  * if the value of the parameter is "rm:port" (the RM address), then skip the 
> AMRMToken validation process
>  
>  
>  
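A minimal sketch of the proposed check, using plain Configuration parsing. The job.xml path is hypothetical and the 8049 port is the AMRMProxy address quoted in the description:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Sketch only: decide whether to enforce AMRMToken validation based on the
// scheduler address found in the client's job.xml.
Configuration jobConf = new Configuration(false);
jobConf.addResource(new Path("/tmp/job.xml")); // hypothetical local copy of job.xml
String schedulerAddr = jobConf.get("yarn.resourcemanager.scheduler.address");
if (schedulerAddr != null && schedulerAddr.endsWith(":8049")) {
  // AMRMProxy address: perform the AMRMToken validation
} else {
  // RM address: skip the AMRMToken validation (client talks to RM directly)
}
{code}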



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10291) Yarn service commands don't work when https is enabled in RM

2020-07-13 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156763#comment-17156763
 ] 

Brahma Reddy Battula commented on YARN-10291:
-

[~eyang] thanks for bringing this up; it makes me wonder why Hadoop uses 
SSLEngine.

HDFS, MapReduce, and YARN use the Hadoop SSL Keystore Factory to manage SSL 
certificates. This factory uses a common directory for the server keystore and 
the client truststore. The Hadoop SSL Keystore Factory allows you to use CA 
certificates managed in their own stores.

 The following list describes major differences between certificates managed by 
the Hadoop SSL Keystore Management Factory and certificates managed by the JDK:
 * Hadoop SSL Keystore Management Factory:
 ** Supports only JKS formatted keys.
 ** Supports toggling the shuffle between HTTP and HTTPS.
 ** Supports two-way certificate and name validation.
 ** Uses a common location for both the keystore and truststore that is 
available to other Hadoop core services.
 ** Allows you to manage SSL in a central location and propagate changes to 
all cluster nodes.
 ** Automatically reloads the keystore and truststore without restarting 
services.
 * SSL Management with the JDK:
 ** Allows either HTTP or HTTPS.
 ** Uses hard-coded locations for truststores and keystores that may vary 
between hosts. Typically, this requires you to generate key pairs and import 
certificates on each host.
 ** Requires the service to be restarted to reload the keystores and 
truststores.
 ** Requires certificates to be installed in the client CA truststore.
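For reference, a minimal sketch of how a client typically picks up the Hadoop SSL Keystore Factory settings (ssl-client.xml) instead of the JDK-wide keystore, assuming standard org.apache.hadoop.security.ssl.SSLFactory usage:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;

// Minimal sketch: client-side SSLFactory loads keystore/truststore from
// ssl-client.xml rather than the JDK-wide javax.net.ssl system properties.
Configuration conf = new Configuration();
SSLFactory sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
sslFactory.init();                        // loads and monitors the stores
try {
  javax.net.ssl.SSLSocketFactory sf = sslFactory.createSSLSocketFactory();
  // ... use sf when opening HTTPS connections ...
} finally {
  sslFactory.destroy();                   // releases the keystore managers
}
{code}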

 
{quote}The odd ends of Hadoop ssl is having odd implementation of SSL support, 
which does not have reliable accepted issuer validation.
{quote}
 

If this is true, then we need to re-look at wherever this is used in Hadoop, 
right? And can you elaborate more on this?

 

> Yarn service commands don't work when https is enabled in RM
> --
>
> Key: YARN-10291
> URL: https://issues.apache.org/jira/browse/YARN-10291
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10291.001.patch
>
>
> When we submit an application using the command "yarn app -launch sleeper-service 
> ../share/hadoop/yarn/yarn-service-examples/sleeper/sleeper.json", it throws 
> the below exception: 
> {code:java}
> com.sun.jersey.api.client.ClientHandlerException: 
> javax.net.ssl.SSLHandshakeException: 
> sun.security.validator.ValidatorException: PKIX path building failed: 
> sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
> valid certification path to requested target
> {code}
> We should use WebServiceClient#createClient, as it takes care of setting the 
> sslfactory when https is enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed

2020-07-09 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154250#comment-17154250
 ] 

Brahma Reddy Battula commented on YARN-10341:
-

[~BilwaST] thanks for addressing the checkstyle issues; going to commit shortly.

 

> Yarn Service Container Completed event doesn't get processed 
> -
>
> Key: YARN-10341
> URL: https://issues.apache.org/jira/browse/YARN-10341
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-10341.001.patch, YARN-10341.002.patch, 
> YARN-10341.003.patch, YARN-10341.004.patch
>
>
> If there are 10 workers running and containers get killed, after a while we 
> see that there are just 9 workers running. This is because the 
> CONTAINER_COMPLETED event is not processed on the AM side. 
> Issue is in below code:
> {code:java}
> public void onContainersCompleted(List<ContainerStatus> statuses) {
>   for (ContainerStatus status : statuses) {
> ContainerId containerId = status.getContainerId();
> ComponentInstance instance = 
> liveInstances.get(status.getContainerId());
> if (instance == null) {
>   LOG.warn(
>   "Container {} Completed. No component instance exists. 
> exitStatus={}. diagnostics={} ",
>   containerId, status.getExitStatus(), status.getDiagnostics());
>   return;
> }
> ComponentEvent event =
> new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
> .setStatus(status).setInstance(instance)
> .setContainerId(containerId);
> dispatcher.getEventHandler().handle(event);
>   }
> {code}
> If a component instance doesn't exist for a container, it doesn't iterate over 
> the other containers, because the method returns instead of continuing; a 
> corrected sketch follows.
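A minimal sketch of the obvious fix (the actual patch may differ): skip the unknown container and keep iterating rather than returning from the whole callback.
{code:java}
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(containerId);
    if (instance == null) {
      LOG.warn("Container {} Completed. No component instance exists. "
          + "exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      continue; // was: return -- which dropped all remaining statuses
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}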



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1

2020-07-08 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154183#comment-17154183
 ] 

Brahma Reddy Battula commented on YARN-10347:
-

[~iwasakims], thanks for reporting this. Will the same be applicable to trunk 
and other versions?

And can you give more details on this?

> Fix double locking in CapacityScheduler#reinitialize in branch-3.1
> --
>
> Key: YARN-10347
> URL: https://issues.apache.org/jira/browse/YARN-10347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.4
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Critical
> Attachments: YARN-10347-branch-3.1.001.patch
>
>
> Double locking blocks other threads in the ResourceManager that are waiting 
> for the lock.
> I found the issue while testing hadoop-3.1.4-RC2 with an RM-HA enabled 
> deployment. The ResourceManager blocks on {{submitApplication}}, waiting for 
> the lock, when I run example MR applications.
> {noformat}
> "IPC Server handler 45 on default port 8032" #211 daemon prio=5 os_prio=0 
> tid=0x7f0e45a40200 nid=0x418 waiting on condition [0x7f0e14abe000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x85d56510> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)
> {noformat}
>  
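A minimal sketch of the suspected bug shape (an assumption, not the actual branch-3.1 code): the write lock is reentrant, so a double acquisition does not deadlock by itself, but an unbalanced lock/unlock leaves the hold count above zero and parks every later reader, matching the stack trace above.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
lock.writeLock().lock();   // outer acquisition, e.g. reinitialize()
lock.writeLock().lock();   // reentrant inner acquisition, e.g. a nested helper
lock.writeLock().unlock(); // only one release: write hold count stays at 1
// From another thread, e.g. checkAndGetApplicationPriority():
// lock.readLock().lock();  // parks forever, as in the stack trace above
{code}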



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10324) Fetch data from NodeManager may cause read timeout when disk is busy

2020-07-08 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154155#comment-17154155
 ] 

Brahma Reddy Battula commented on YARN-10324:
-

Updated the target version to 3.3.1, as 3.3.0 is about to be released.

> Fetch data from NodeManager may cause read timeout when disk is busy
> ---
>
> Key: YARN-10324
> URL: https://issues.apache.org/jira/browse/YARN-10324
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: auxservices
>Affects Versions: 2.7.0, 3.2.1
>Reporter: Yao Guangdong
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10324.001.patch
>
>
>  With the cluster size getting bigger and bigger, the time Reduce tasks spend 
> fetching Map results from the NodeManager gets longer and longer. We often 
> see WARN logs like the following in the reducers' logs:
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to 
> TX-196-168-211.com:13562 with 5 map outputs
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
> {quote}
>  We checked the NodeManager server and found that the disk IO utilization and 
> the number of connections became very high when the read timeouts happened. 
> Our analysis is that 20,000 maps and 1,000 reduces make the NodeManager 
> perform 20 million small IO stream operations in the shuffle phase. If the 
> data each reduce fetches from the map output files is very small, the disk IO 
> utilization becomes very high in a big cluster, read timeouts happen 
> frequently, and applications take longer to finish.
> We found that ShuffleHandler has an IndexCache for caching the file.out.index 
> file. We want to turn the many small IOs into a few big IOs, which can reduce 
> the number of small disk reads. So we tried caching all of the small file 
> data (file.out) in memory when the first fetch request comes; the other fetch 
> requests then only need to read the data from memory, avoiding disk IO. After 
> we cached the data in memory, the read timeouts disappeared.
>  
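A simplified sketch of the caching idea; the class and method names are hypothetical, not the actual ShuffleHandler code:
{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: serve small file.out contents from memory after the
// first fetch, so one big read replaces many small reads.
public class MapOutputCache {
  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
  private final long maxCachedFileSize;

  public MapOutputCache(long maxCachedFileSize) {
    this.maxCachedFileSize = maxCachedFileSize;
  }

  /** Returns cached bytes, or null if the file is too big to cache. */
  public byte[] get(String fileOutPath) throws java.io.IOException {
    byte[] data = cache.get(fileOutPath);
    if (data == null) {
      Path p = Paths.get(fileOutPath);
      if (Files.size(p) > maxCachedFileSize) {
        return null;                 // too big: caller streams from disk instead
      }
      data = Files.readAllBytes(p);  // the single big IO
      cache.putIfAbsent(fileOutPath, data);
    }
    return data;
  }
}
{code}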



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10324) Fetch data from NodeManager may cause read timeout when disk is busy

2020-07-08 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10324:

Target Version/s: 2.7.8, 3.3.1  (was: 2.7.8, 3.3.0)

> Fetch data from NodeManager may cause read timeout when disk is busy
> ---
>
> Key: YARN-10324
> URL: https://issues.apache.org/jira/browse/YARN-10324
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: auxservices
>Affects Versions: 2.7.0, 3.2.1
>Reporter: Yao Guangdong
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10324.001.patch
>
>
>  With the cluster size getting bigger and bigger, the time Reduce tasks spend 
> fetching Map results from the NodeManager gets longer and longer. We often 
> see WARN logs like the following in the reducers' logs:
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to 
> TX-196-168-211.com:13562 with 5 map outputs
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
> {quote}
>  We checked the NodeManager server and found that the disk IO utilization and 
> the number of connections became very high when the read timeouts happened. 
> Our analysis is that 20,000 maps and 1,000 reduces make the NodeManager 
> perform 20 million small IO stream operations in the shuffle phase. If the 
> data each reduce fetches from the map output files is very small, the disk IO 
> utilization becomes very high in a big cluster, read timeouts happen 
> frequently, and applications take longer to finish.
> We found that ShuffleHandler has an IndexCache for caching the file.out.index 
> file. We want to turn the many small IOs into a few big IOs, which can reduce 
> the number of small disk reads. So we tried caching all of the small file 
> data (file.out) in memory when the first fetch request comes; the other fetch 
> requests then only need to read the data from memory, avoiding disk IO. After 
> we cached the data in memory, the read timeouts disappeared.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed

2020-07-07 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152749#comment-17152749
 ] 

Brahma Reddy Battula commented on YARN-10341:
-

[~BilwaST] thanks for reporting.

Looks to be a hidden bug here. Patch LGTM. Please try to add one UT for this.

 

> Yarn Service Container Completed event doesn't get processed 
> -
>
> Key: YARN-10341
> URL: https://issues.apache.org/jira/browse/YARN-10341
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-10341.001.patch
>
>
> If there are 10 workers running and containers get killed, after a while we 
> see that there are just 9 workers running. This is because the 
> CONTAINER_COMPLETED event is not processed on the AM side. 
> Issue is in below code:
> {code:java}
> public void onContainersCompleted(List<ContainerStatus> statuses) {
>   for (ContainerStatus status : statuses) {
> ContainerId containerId = status.getContainerId();
> ComponentInstance instance = 
> liveInstances.get(status.getContainerId());
> if (instance == null) {
>   LOG.warn(
>   "Container {} Completed. No component instance exists. 
> exitStatus={}. diagnostics={} ",
>   containerId, status.getExitStatus(), status.getDiagnostics());
>   return;
> }
> ComponentEvent event =
> new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
> .setStatus(status).setInstance(instance)
> .setContainerId(containerId);
> dispatcher.getEventHandler().handle(event);
>   }
> {code}
> If a component instance doesn't exist for a container, it doesn't iterate over 
> the other containers, because the method returns instead of continuing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10340) HsWebServices getContainerReport uses loginUser instead of remoteUser to access ApplicationClientProtocol

2020-07-06 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152142#comment-17152142
 ] 

Brahma Reddy Battula commented on YARN-10340:
-

Is this related to HADOOP-16095?

> HsWebServices getContainerReport uses loginUser instead of remoteUser to 
> access ApplicationClientProtocol
> -
>
> Key: YARN-10340
> URL: https://issues.apache.org/jira/browse/YARN-10340
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Tarun Parimi
>Priority: Major
>
> HsWebServices getContainerReport uses loginUser instead of remoteUser to 
> access ApplicationClientProtocol
>  
> [http://:19888/ws/v1/history/containers/container_e03_1594030808801_0002_01_03/logs|http://pjoseph-secure-1.pjoseph-secure.root.hwx.site:19888/ws/v1/history/containers/container_e03_1594030808801_0002_01_03/logs]
> While accessing the above link using the systest user, the request fails 
> saying the mapred user does not have access to the job.
>  
> {code:java}
> 2020-07-06 14:02:59,178 WARN org.apache.hadoop.yarn.server.webapp.LogServlet: 
> Could not obtain node HTTP address from provider.
> javax.ws.rs.WebApplicationException: 
> org.apache.hadoop.yarn.exceptions.YarnException: User mapred does not have 
> privilege to see this application application_1593997842459_0214
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:516)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> at 
> org.apache.hadoop.yarn.server.webapp.WebServices.rewrapAndThrowThrowable(WebServices.java:544)
> at 
> org.apache.hadoop.yarn.server.webapp.WebServices.rewrapAndThrowException(WebServices.java:530)
> at 
> org.apache.hadoop.yarn.server.webapp.WebServices.getContainer(WebServices.java:405)
> at 
> org.apache.hadoop.yarn.server.webapp.WebServices.getNodeHttpAddress(WebServices.java:373)
> at 
> org.apache.hadoop.yarn.server.webapp.LogServlet.getContainerLogsInfo(LogServlet.java:268)
> at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.getContainerLogs(HsWebServices.java:461)
>  
> {code}
> On analyzing, we found that WebServices#getContainer does a doAs using a UGI 
> created by createRemoteUser(end user) to access the RM 
> ApplicationClientProtocol, which does not work. We need to use 
> createProxyUser to do the same.
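A minimal sketch of the suggested direction, using the real UserGroupInformation API; endUserName, clientProtocol and request are illustrative names, not the actual HsWebServices fields:
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ContainerReport;

// Sketch: wrap the end user in a proxy UGI backed by the service's login
// user, so the RPC authenticates as the service but is authorized as the
// end user.
UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
    endUserName, UserGroupInformation.getLoginUser());
ContainerReport report = proxyUgi.doAs(
    (PrivilegedExceptionAction<ContainerReport>) () ->
        clientProtocol.getContainerReport(request).getContainerReport());
{code}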



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10229) [Federation] Client should be able to submit application to RM directly using normal client conf

2020-06-11 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133286#comment-17133286
 ] 

Brahma Reddy Battula commented on YARN-10229:
-

[~elgoiri], [~bibinchundatt] and [~subru], will you look into the latest patch?

> [Federation] Client should be able to submit application to RM directly using 
> normal client conf
> 
>
> Key: YARN-10229
> URL: https://issues.apache.org/jira/browse/YARN-10229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy, federation
>Affects Versions: 3.1.1
>Reporter: JohnsonGuo
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10229.001.patch
>
>
> Scenario: when the YARN federation feature is enabled with multiple YARN 
> clusters, one can submit jobs to the yarn-router by *modifying* the client 
> configuration with the yarn-router address.
> But if one still wants to submit jobs via the original client (from before 
> federation was enabled) to the RM directly, it will encounter an AMRMToken 
> exception. That means that once federation is enabled, anyone who wants to 
> submit a job has to modify the client conf.
>  
> One possible solution for this scenario is, in the NodeManager, when the 
> client ApplicationMaster request comes:
>  * get the client job.xml from HDFS "".
>  * parse the "yarn.resourcemanager.scheduler.address" parameter in job.xml
>  * if the value of the parameter is "localhost:8049" (the AMRMProxy address), 
> then do the AMRMToken validation process
>  * if the value of the parameter is "rm:port" (the RM address), then skip the 
> AMRMToken validation process
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6526) Refactoring SQLFederationStateStore by avoiding to recreate a connection at every call

2020-06-11 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133248#comment-17133248
 ] 

Brahma Reddy Battula commented on YARN-6526:


[~BilwaST], thanks for updating the patch. The latest patch LGTM. I will hold 
the commit until [~elgoiri] reviews.

> Refactoring SQLFederationStateStore by avoiding to recreate a connection at 
> every call
> --
>
> Key: YARN-6526
> URL: https://issues.apache.org/jira/browse/YARN-6526
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-6526.001.patch, YARN-6526.002.patch, 
> YARN-6526.003.patch, YARN-6526.004.patch, YARN-6526.005.patch, 
> YARN-6526.006.patch, YARN-6526.007.patch, YARN-6526.008.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6526) Refactoring SQLFederationStateStore by avoiding to recreate a connection at every call

2020-05-14 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107298#comment-17107298
 ] 

Brahma Reddy Battula commented on YARN-6526:


Patch LGTM, apart from the following doubt: I can only see the connection 
count being incremented, at line 991:
|991|FederationStateStoreClientMetrics.incrConnections();|

 

Don't we need to decrease it while closing the connection in the following 
code?

 
{code:java}
/**
 * Returns the SQL FederationStateStore connections to the pool.
 *
 * @param log the logger interface
 * @param cstmt the interface used to execute SQL stored procedures
 * @param conn the SQL connection
 * @param rs the ResultSet interface used to execute SQL stored procedures
 * @throws YarnException on failure
 */
public static void returnToPool(Logger log, CallableStatement cstmt,
    Connection conn, ResultSet rs) throws YarnException {
  if (cstmt != null) {
    try {
      cstmt.close();
    } catch (SQLException e) {
      logAndThrowException(log, "Exception while trying to close Statement",
          e);
    }
  }

  if (conn != null) {
    try {
      conn.close();
    } catch (SQLException e) {
      logAndThrowException(log, "Exception while trying to close Connection",
          e);
    }
  }
}
{code}
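A sketch of the suggested decrement, assuming the metrics class exposes (or would gain) a decrConnections() counterpart to incrConnections():
{code:java}
// Sketch only: decrConnections() is assumed, not confirmed existing API.
if (conn != null) {
  try {
    conn.close();
    FederationStateStoreClientMetrics.decrConnections();
  } catch (SQLException e) {
    logAndThrowException(log, "Exception while trying to close Connection",
        e);
  }
}
{code}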
 

> Refactoring SQLFederationStateStore by avoiding to recreate a connection at 
> every call
> --
>
> Key: YARN-6526
> URL: https://issues.apache.org/jira/browse/YARN-6526
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-6526.001.patch, YARN-6526.002.patch, 
> YARN-6526.003.patch, YARN-6526.004.patch, YARN-6526.005.patch, 
> YARN-6526.006.patch, YARN-6526.007.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10229) [Federation] Client should be able to submit application to RM directly using normal client conf

2020-05-14 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107274#comment-17107274
 ] 

Brahma Reddy Battula commented on YARN-10229:
-

[~BilwaST] thanks for working on this. Patch LGTM. This will be useful while 
upgrading a cluster to YARN federation; without it, existing applications might 
need to change their configuration.

> [Federation] Client should be able to submit application to RM directly using 
> normal client conf
> 
>
> Key: YARN-10229
> URL: https://issues.apache.org/jira/browse/YARN-10229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy, federation
>Affects Versions: 3.1.1
>Reporter: JohnsonGuo
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10229.001.patch
>
>
> Scenario: when the YARN federation feature is enabled with multiple YARN 
> clusters, one can submit jobs to the yarn-router by *modifying* the client 
> configuration with the yarn-router address.
> But if one still wants to submit jobs via the original client (from before 
> federation was enabled) to the RM directly, it will encounter an AMRMToken 
> exception. That means that once federation is enabled, anyone who wants to 
> submit a job has to modify the client conf.
>  
> One possible solution for this scenario is, in the NodeManager, when the 
> client ApplicationMaster request comes:
>  * get the client job.xml from HDFS "".
>  * parse the "yarn.resourcemanager.scheduler.address" parameter in job.xml
>  * if the value of the parameter is "localhost:8049" (the AMRMProxy address), 
> then do the AMRMToken validation process
>  * if the value of the parameter is "rm:port" (the RM address), then skip the 
> AMRMToken validation process
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10229) [Federation] Client should be able to submit application to RM directly using normal client conf

2020-05-14 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10229:

Issue Type: Bug  (was: Wish)

> [Federation] Client should be able to submit application to RM directly using 
> normal client conf
> 
>
> Key: YARN-10229
> URL: https://issues.apache.org/jira/browse/YARN-10229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy, federation
>Affects Versions: 3.1.1
>Reporter: JohnsonGuo
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10229.001.patch
>
>
> Scenario: when the YARN federation feature is enabled with multiple YARN 
> clusters, one can submit jobs to the yarn-router by *modifying* the client 
> configuration with the yarn-router address.
> But if one still wants to submit jobs via the original client (from before 
> federation was enabled) to the RM directly, it will encounter an AMRMToken 
> exception. That means that once federation is enabled, anyone who wants to 
> submit a job has to modify the client conf.
>  
> One possible solution for this scenario is, in the NodeManager, when the 
> client ApplicationMaster request comes:
>  * get the client job.xml from HDFS "".
>  * parse the "yarn.resourcemanager.scheduler.address" parameter in job.xml
>  * if the value of the parameter is "localhost:8049" (the AMRMProxy address), 
> then do the AMRMToken validation process
>  * if the value of the parameter is "rm:port" (the RM address), then skip the 
> AMRMToken validation process
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10246) Enable Yarn Router to have a dedicated Zookeeper

2020-05-05 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099548#comment-17099548
 ] 

Brahma Reddy Battula commented on YARN-10246:
-

[~dmmkr] thanks for reporting this. It makes sense to have a different ZK 
state store for Federation.

[~subru]/[~bibinchundatt] looks like you discussed this topic in 
YARN-5597; any thoughts on this approach?

[~inigoiri] can this be used in the HDFS Router also?

 

[~dmmkr] Looks like you missed updating "isFederationZkCurator" in the 
following, at line 66:
|66|this.isFederationZkCurator = true;|

> Enable Yarn Router to have a dedicated Zookeeper
> 
>
> Key: YARN-10246
> URL: https://issues.apache.org/jira/browse/YARN-10246
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
> Attachments: YARN-10246.001.patch
>
>
> Currently, we have a single parameter, hadoop.zk.address, for the Router and 
> the ResourceManager. Due to this we need to have the FederationStateStore and 
> the RMStateStore on the same ZooKeeper instance. 
> With the above topology there can be a heavy load on ZooKeeper, since all 
> subcluster RMs will write to a single ZooKeeper.
> So, if we introduce a new configuration such as hadoop.federation.zk.address, 
> we can have the FederationStateStore on a dedicated ZooKeeper.
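A minimal sketch of the proposed lookup; hadoop.federation.zk.address is the key proposed in this Jira, not an existing Hadoop constant:
{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch: prefer the federation-specific key, falling back to the shared one.
Configuration conf = new Configuration();
String zkAddress = conf.get("hadoop.federation.zk.address",
    conf.get("hadoop.zk.address"));
// The FederationStateStore would connect to zkAddress; the RMStateStore
// keeps using hadoop.zk.address unchanged.
{code}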



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10247) Application priority queue ACLs are not respected

2020-05-03 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098648#comment-17098648
 ] 

Brahma Reddy Battula commented on YARN-10247:
-

Pushed to branch-3.3.0 also, as this was marked as a blocker for the 3.3.0 
release. Thanks all.

> Application priority queue ACLs are not respected
> -
>
> Key: YARN-10247
> URL: https://issues.apache.org/jira/browse/YARN-10247
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Blocker
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10247.0001.patch
>
>
> This is a regression from the queue path Jira.
> App priority ACLs are not working correctly. 
> {code:java}
> yarn.scheduler.capacity.root.B.acl_application_max_priority=[user=john 
> group=users max_priority=4]
> {code}
> max_priority enforcement is not working. For user john, the maximum supported 
> priority is 4. However, I can submit with priority 6 for this user.
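An illustrative sketch of the expected enforcement (not the actual CapacityScheduler code):
{code:java}
// Illustrative only: the expected behaviour per the ACL above, where user
// john's ceiling is 4.
int requestedPriority = 6; // what john submits
int aclMaxPriority = 4;    // from acl_application_max_priority
if (requestedPriority > aclMaxPriority) {
  // expected: reject the submission, or clamp it to the ACL ceiling
  requestedPriority = aclMaxPriority;
}
{code}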



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10247) Application priority queue ACLs are not respected

2020-04-28 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095107#comment-17095107
 ] 

Brahma Reddy Battula commented on YARN-10247:
-

[~prabhujoseph] and [~shuzirra] could you help review this? Looks like a 
straightforward change.

> Application priority queue ACLs are not respected
> -
>
> Key: YARN-10247
> URL: https://issues.apache.org/jira/browse/YARN-10247
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-10247.0001.patch
>
>
> This is a regression from the queue path Jira.
> App priority ACLs are not working correctly. 
> {code:java}
> yarn.scheduler.capacity.root.B.acl_application_max_priority=[user=john 
> group=users max_priority=4]
> {code}
> max_priority enforcement is not working. For user john, the maximum supported 
> priority is 4. However, I can submit with priority 6 for this user.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10194) YARN RMWebServices /scheduler-conf/validate leaks ZK Connections

2020-04-24 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091796#comment-17091796
 ] 

Brahma Reddy Battula commented on YARN-10194:
-

[~sunilg] can you help review this Jira?

> YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
> 
>
> Key: YARN-10194
> URL: https://issues.apache.org/jira/browse/YARN-10194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Akhil PB
>Assignee: Prabhu Joseph
>Priority: Blocker
> Attachments: YARN-10194-001.patch, YARN-10194-002.patch, 
> YARN-10194-003.patch, YARN-10194-004.patch, YARN-10194-005.patch
>
>
> YARN RMWebServices /scheduler-conf/validate leaks ZK connections. The 
> validation API creates a new CapacityScheduler and misses closing it after 
> the validation. Every CapacityScheduler#init opens a 
> MutableCSConfigurationProvider, which opens a ZKConfigurationStore and 
> creates a ZK connection. 
> *ZK LOGS*
> {code}
> 2020-03-12 16:45:51,881 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [2 
> times] Error accepting new connection: Too many connections from 
> /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,449 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,710 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,876 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,068 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,391 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,008 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,287 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,483 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> {code}
> And there is another bug in ZKConfigurationStore, which does not handle 
> close() of ZKCuratorManager.
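A minimal sketch of the fix direction (illustrative, not the actual patch): make the validation-only scheduler short-lived and always release it, so the ZK connection it opened is closed. configToValidate is an illustrative name:
{code:java}
// Sketch: CapacityScheduler is a Hadoop service, so init()/stop() manage
// its lifecycle, including the config store opened during init().
CapacityScheduler validationScheduler = new CapacityScheduler();
try {
  validationScheduler.init(configToValidate); // opens MutableCSConfigurationProvider -> ZK
  // ... run the validation against validationScheduler ...
} finally {
  validationScheduler.stop();                 // must release the config store / ZK connection
}
{code}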



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9848) revert YARN-4946

2020-04-24 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091794#comment-17091794
 ] 

Brahma Reddy Battula edited comment on YARN-9848 at 4/24/20, 5:48 PM:
--

[~Steven Rand], can you raise a separate Jira for branch-3.2 and resolve 
this issue?


was (Author: brahmareddy):
[~Steven Rand], can you raise a separate Jira for this?

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Blocker
> Attachments: YARN-9848-01.patch, YARN-9848.002.patch, 
> YARN-9848.003.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9848) revert YARN-4946

2020-04-24 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091794#comment-17091794
 ] 

Brahma Reddy Battula commented on YARN-9848:


[~Steven Rand], can you raise a separate Jira for this?

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Blocker
> Attachments: YARN-9848-01.patch, YARN-9848.002.patch, 
> YARN-9848.003.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9898) Dependency netty-all-4.1.27.Final doesn't support ARM platform

2020-04-20 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088004#comment-17088004
 ] 

Brahma Reddy Battula commented on YARN-9898:


Sure, I will look into this.

> Dependency netty-all-4.1.27.Final doesn't support ARM platform
> --
>
> Key: YARN-9898
> URL: https://issues.apache.org/jira/browse/YARN-9898
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Assignee: liusheng
>Priority: Major
>
> Hadoop depends on the Netty package, but *netty-all-4.1.27.Final* from the 
> io.netty Maven repo does not support the ARM platform. 
> When running the test *TestCsiClient.testIdentityService* on an ARM server, 
> it raises an error like the following:
> {code:java}
> Caused by: java.io.FileNotFoundException: 
> META-INF/native/libnetty_transport_native_epoll_aarch_64.so
> at 
> io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:161)
> ... 45 more
> Suppressed: java.lang.UnsatisfiedLinkError: no 
> netty_transport_native_epoll_aarch_64 in java.library.path
> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
> at java.lang.Runtime.loadLibrary0(Runtime.java:870)
> at java.lang.System.loadLibrary(System.java:1122)
> at 
> io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:38)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:243)
> at 
> io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:124)
> ... 45 more
> Suppressed: java.lang.UnsatisfiedLinkError: no 
> netty_transport_native_epoll_aarch_64 in java.library.path
> at 
> java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
> at java.lang.Runtime.loadLibrary0(Runtime.java:870)
> at java.lang.System.loadLibrary(System.java:1122)
> at 
> io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:38)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:263)
> at java.security.AccessController.doPrivileged(Native 
> Method)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:255)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:233)
> ... 46 more
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9848) revert YARN-4946

2020-04-14 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083431#comment-17083431
 ] 

Brahma Reddy Battula commented on YARN-9848:


[~Steven Rand] can you please upload the patch, as this is a blocker for the 
3.3.0 release.

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Priority: Major
> Attachments: YARN-9848-01.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9606) Set sslfactory for AuthenticatedURL() while creating LogsCLI#webServiceClient

2020-04-11 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081525#comment-17081525
 ] 

Brahma Reddy Battula commented on YARN-9606:


[~BilwaST] thanks for uploading the patch. The latest patch LGTM. 
[~prabhujoseph] do you have any comments on this, as it was reviewed before in 
YARN-10120?

> Set sslfactory for AuthenticatedURL() while creating LogsCLI#webServiceClient 
> --
>
> Key: YARN-9606
> URL: https://issues.apache.org/jira/browse/YARN-9606
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9606-001.patch, YARN-9606-002.patch, 
> YARN-9606.003.patch
>
>
> Yarn logs fails for running containers:
> {quote}
>  Unable to fetch log files list
>  Exception in thread "main" java.io.IOException: 
> com.sun.jersey.api.client.ClientHandlerException: 
> javax.net.ssl.SSLHandshakeException: Error while authenticating with 
> endpoint: 
> [https://vm2:65321/ws/v1/node/containers/container_e05_1559802125016_0001_01_08/logs]
>  at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.getContainerLogFiles(LogsCLI.java:543)
>  at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.getMatchedContainerLogFiles(LogsCLI.java:1338)
>  at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.getMatchedOptionForRunningApp(LogsCLI.java:1514)
>  at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.fetchContainerLogs(LogsCLI.java:1052)
>  at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:367)
>  at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:152)
>  at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:399)
>  {quote}
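A minimal sketch of the fix idea: SSLFactory implements ConnectionConfigurator, so handing it to AuthenticatedURL lets the authenticated HTTPS connection pick up the cluster truststore from ssl-client.xml (the URL below is the one from the exception above):
{code:java}
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.authentication.client.AuthenticatedURL;
import org.apache.hadoop.security.ssl.SSLFactory;

// Sketch: a null Authenticator falls back to the default (Kerberos), and
// the SSLFactory configures the HTTPS connection's SSL context.
Configuration conf = new Configuration();
SSLFactory clientSslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
clientSslFactory.init();
AuthenticatedURL.Token token = new AuthenticatedURL.Token();
AuthenticatedURL authUrl = new AuthenticatedURL(null, clientSslFactory);
HttpURLConnection conn = authUrl.openConnection(
    new URL("https://vm2:65321/ws/v1/node/containers/"
        + "container_e05_1559802125016_0001_01_08/logs"), token);
{code}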



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9998) Code cleanup in LeveldbConfigurationStore

2020-04-11 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080796#comment-17080796
 ] 

Brahma Reddy Battula edited comment on YARN-9998 at 4/11/20, 6:09 PM:
--

[~snemeth] could you please review the branch-3.2 patch and close the Jira? I 
am planning the 3.3.0 release shortly, and this Jira shouldn't stay open since 
it is already merged to 3.3.0.


was (Author: brahmareddy):
Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Code cleanup in LeveldbConfigurationStore
> -
>
> Key: YARN-9998
> URL: https://issues.apache.org/jira/browse/YARN-9998
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9998.001.patch, YARN-9998.002.patch
>
>
> Many things can be improved:
> * Field compactionTimer could be a local variable
> * Field versiondb should be camelcase
> * initDatabase is a very long method: Initialize db / versionDb should be in 
> separate methods, split this method into smaller chunks
> * Remove TODOs
> * Remove duplicated code block in 
> LeveldbConfigurationStore.CompactionTimerTask
> * Any other cleanup



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10002) Code cleanup and improvements in ConfigurationStoreBaseTest

2020-04-11 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080809#comment-17080809
 ] 

Brahma Reddy Battula edited comment on YARN-10002 at 4/11/20, 6:08 PM:
---

[~snemeth] could you please review the branch-3.2 patch and close the Jira? I 
am planning the 3.3.0 release shortly, and this Jira shouldn't stay open since 
it is already merged to 3.3.0.


was (Author: brahmareddy):
Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Code cleanup and improvements in ConfigurationStoreBaseTest
> ---
>
> Key: YARN-10002
> URL: https://issues.apache.org/jira/browse/YARN-10002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-10002.001.patch, YARN-10002.002.patch, 
> YARN-10002.003.patch, YARN-10002.004.patch, YARN-10002.005.patch, 
> YARN-10002.006.patch, YARN-10002.branch-3.2.001.patch
>
>
> * Some protected fields could be package-private
> * Could add a helper method that prepares a simple LogMutation with 1, 2 or 3 
> updates (Key + value) as this pattern is used extensively in subclasses



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9354) Resources should be created with ResourceTypesTestHelper instead of TestUtils

2020-04-11 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080800#comment-17080800
 ] 

Brahma Reddy Battula edited comment on YARN-9354 at 4/11/20, 6:07 PM:
--

[~snemeth] could you please review the branch-3.2 patch and close the Jira? I 
am planning the 3.3.0 release shortly, and this Jira shouldn't stay open since 
it is already merged to 3.3.0.


was (Author: brahmareddy):
Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Resources should be created with ResourceTypesTestHelper instead of TestUtils
> -
>
> Key: YARN-9354
> URL: https://issues.apache.org/jira/browse/YARN-9354
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Trivial
>  Labels: newbie, newbie++
> Fix For: 3.3.0
>
> Attachments: YARN-9354.001.patch, YARN-9354.002.patch, 
> YARN-9354.003.patch, YARN-9354.004.patch, YARN-9354.branch-3.2.001.patch, 
> YARN-9354.branch-3.2.002.patch, YARN-9354.branch-3.2.003.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestUtils#createResource
>  has a not identical, but very similar, implementation to 
> org.apache.hadoop.yarn.resourcetypes.ResourceTypesTestHelper#newResource. 
> Since these two methods essentially do the same thing, and 
> ResourceTypesTestHelper is newer and more widely used, TestUtils#createResource 
> should be replaced with ResourceTypesTestHelper#newResource in all 
> occurrences.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9490) applicationresourceusagereport return wrong number of reserved containers

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9490:
---
Fix Version/s: (was: 3.3.0)

> applicationresourceusagereport return wrong number of reserved containers
> -
>
> Key: YARN-9490
> URL: https://issues.apache.org/jira/browse/YARN-9490
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: yanbing zhang
>Assignee: yanbing zhang
>Priority: Minor
> Attachments: YARN-9490.002.patch, YARN-9490.patch, 
> YARN-9490.patch1.patch
>
>
> When getting an ApplicationResourceUsageReport instance from the 
> SchedulerApplicationAttempt class, I found the constructor input 
> parameter (reservedContainers.size()) is wrong, because the type of this 
> variable is Map<SchedulerRequestKey, Map<NodeId, RMContainer>>, so 
> "reservedContainers.size()" is not the number of containers but the number of 
> SchedulerRequestKeys.
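> A minimal sketch of the intended fix, assuming the field above is a
> Map<SchedulerRequestKey, Map<NodeId, RMContainer>> (the helper name is
> hypothetical, not from the attached patches):
> {code:java}
> // Count the actual reserved containers by summing the inner maps instead
> // of using reservedContainers.size(), which counts SchedulerRequestKeys.
> private int getReservedContainerCount() {
>   int count = 0;
>   for (Map<NodeId, RMContainer> perNode : reservedContainers.values()) {
>     count += perNode.size();
>   }
>   return count;
> }
> {code}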



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9997) Code cleanup in ZKConfigurationStore

2020-04-10 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080823#comment-17080823
 ] 

Brahma Reddy Battula edited comment on YARN-9997 at 4/10/20, 6:36 PM:
--

[~snemeth] this Jira is committed to branch-3.3. I am planning the 3.3.0 
release this week; can we close this Jira and raise another Jira for branch-3.2 
tracking?


was (Author: brahmareddy):
[~snemeth] this Jira is committed to branch-3.3 .I am planning release 3.3.0  
release this week, can we close this Jira and raise another Jira for branch-3.2 
tracking..?

> Code cleanup in ZKConfigurationStore
> 
>
> Key: YARN-9997
> URL: https://issues.apache.org/jira/browse/YARN-9997
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9997.001.patch, YARN-9997.002.patch, 
> YARN-9997.003.patch, YARN-9997.004.patch, YARN-9997.005.patch, 
> YARN-9997.006.patch
>
>
> Many things can be improved:
> * znodeParentPath could be a local variable
> * zkManager could be private; the VisibleForTesting annotation is not needed 
> anymore
> * Do something with the unchecked casts
> * zkManager.safeSetData calls almost always have the same set of parameters: 
> simplify this (see the sketch below)
> * Extract zkManager calls to their own methods: they are repeated
> * Remove TODOs
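> A hedged sketch of the extraction meant above (the helper name and the
> exact shared arguments are assumptions):
> {code:java}
> // Hypothetical helper: funnels the repeated safeSetData calls through one
> // method so the shared arguments (version, ACLs, fencing path) live in a
> // single place.
> private void safeSetZkData(String path, byte[] data) throws Exception {
>   zkManager.safeSetData(path, data, -1, zkAcl, fencingNodePath);
> }
> {code}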



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9997) Code cleanup in ZKConfigurationStore

2020-04-10 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080823#comment-17080823
 ] 

Brahma Reddy Battula edited comment on YARN-9997 at 4/10/20, 6:35 PM:
--

[~snemeth] this Jira is committed to branch-3.3. I am planning the 3.3.0 
release this week; can we close this Jira and raise another Jira for branch-3.2 
tracking?


was (Author: brahmareddy):
Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Code cleanup in ZKConfigurationStore
> 
>
> Key: YARN-9997
> URL: https://issues.apache.org/jira/browse/YARN-9997
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9997.001.patch, YARN-9997.002.patch, 
> YARN-9997.003.patch, YARN-9997.004.patch, YARN-9997.005.patch, 
> YARN-9997.006.patch
>
>
> Many things can be improved:
> * znodeParentPath could be a local variable
> * zkManager could be private; the VisibleForTesting annotation is not needed 
> anymore
> * Do something with the unchecked casts
> * zkManager.safeSetData calls almost always have the same set of parameters: 
> simplify this
> * Extract zkManager calls to their own methods: they are repeated
> * Remove TODOs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-10 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080813#comment-17080813
 ] 

Brahma Reddy Battula edited comment on YARN-10001 at 4/10/20, 6:34 PM:
---

[~snemeth] this Jira is committed to branch-3.3. I am planning the 3.3.0 
release this week; can we close this Jira and raise another Jira for branch-3.2 
tracking?


was (Author: brahmareddy):
Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10001-branch-3.2.003.patch, YARN-10001.001.patch, 
> YARN-10001.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10179) Queue mapping based on group id passed through application tag

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10179:

Fix Version/s: (was: 3.3.0)

> Queue mapping based on group id passed through application tag
> --
>
> Key: YARN-10179
> URL: https://issues.apache.org/jira/browse/YARN-10179
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10179.001.patch, YARN-10179.002.patch
>
>
> There are situations when the real submitting user differs from the user that 
> arrives to YARN. For example, in the case of a Hive application with Hive 
> impersonation turned off, the Hive queries will run as the Hive user and the 
> mapping is done based on that user's group. 
> Unfortunately, in this case YARN doesn't have any information about the real 
> user, and there are cases when the customer may want to map these applications 
> to the real submitting user's queue (based on the group id) instead of the 
> Hive queue.
> For these cases, if the group id (or name) is passed in the application 
> tag, we may read it and use it during queue mapping, provided that user has 
> rights to run on the real user's queue (see the sketch below). 
>  
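> A hedged sketch of reading such a tag during queue mapping (the "groupid="
> prefix and the mapToQueueByGroup helper are assumptions for illustration):
> {code:java}
> // Hypothetical: extract a group id from the application tags and map the
> // app by that group instead of by the proxy (e.g. Hive) user.
> for (String tag : submissionContext.getApplicationTags()) {
>   if (tag.startsWith("groupid=")) {
>     String groupId = tag.substring("groupid=".length());
>     return mapToQueueByGroup(groupId);
>   }
> }
> {code}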



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8192) Introduce container readiness check type

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-8192:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Introduce container readiness check type
> 
>
> Key: YARN-8192
> URL: https://issues.apache.org/jira/browse/YARN-8192
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8192.1.patch, YARN-8192.2.patch
>
>
> In some cases, the AM may not be able to perform a readiness check for a 
> container. For example, if a docker container is using a custom network type, 
> its IP may not be reachable from the AM. In this case, the AM could request a 
> new container to perform a readiness command, and use the exit status of the 
> container to determine whether the readiness check succeeded or not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9671) Improve Locality Scheduling when cluster is busy

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9671:
---
Target Version/s: 3.4.0  (was: 3.3.0, 2.10.1)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Improve Locality Scheduling when cluster is busy
> 
>
> Key: YARN-9671
> URL: https://issues.apache.org/jira/browse/YARN-9671
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.10.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: YARN-9671.001.patch
>
>
> When a cluster is very busy, scheduling opportunities are few and far 
> between. Scheduling opportunities are how an application knows when to give 
> up looking for decent locality.
> It doesn't make sense to work hard waiting for locality when the odds of it 
> coming are very small and it may actually take a very long time to actually 
> give up.
> This causes the priority of queues to be violated which is the last thing we 
> want to do when the cluster is full.
> Add a mode to disable skipping locality when cluster is busy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2748) Upload logs in the sub-folders under the local log dir when aggregating logs

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-2748:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Upload logs in the sub-folders under the local log dir when aggregating logs
> 
>
> Key: YARN-2748
> URL: https://issues.apache.org/jira/browse/YARN-2748
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Varun Saxena
>Priority: Major
> Attachments: YARN-2748.001.patch, YARN-2748.002.patch, 
> YARN-2748.03.patch, YARN-2748.04.patch
>
>
> YARN-2734 has a temporary fix to skip sub-folders to avoid an exception. 
> Ideally, if the app is creating a sub-folder and putting its rolling logs 
> there, we need to upload those logs as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10032) Implement regex querying of logs

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10032:

Fix Version/s: (was: 3.3.0)

> Implement regex querying of logs
> 
>
> Key: YARN-10032
> URL: https://issues.apache.org/jira/browse/YARN-10032
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> After YARN-10031, we have query parameters to the log servlet's GET endpoint.
> To demonstrate the new capabilities of the log servlet and how easy it will 
> be to add functionality to all log servlets at the same time, let's add the 
> ability to search the aggregated logs with a given regex.
> A conceptual use case:
> Users run several MR jobs daily, but some of them fail to localize a 
> particular resource at first. We want to search the logs of these YARN 
> applications and extract some data from them.
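> A hedged illustration of the matching step itself (the searchParam query
> parameter is an assumption):
> {code:java}
> // Hypothetical: stream aggregated log lines through a user-supplied regex
> // and emit only the matching lines.
> Pattern pattern = Pattern.compile(searchParam);
> for (String line : logLines) {
>   if (pattern.matcher(line).find()) {
>     out.write(line);
>   }
> }
> {code}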



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-7461:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> DominantResourceCalculator#ratio calculation problem when right resource 
> contains zero value
> 
>
> Key: YARN-7461
> URL: https://issues.apache.org/jira/browse/YARN-7461
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-7461.001.patch, YARN-7461.002.patch, 
> YARN-7461.003.patch, YARN-7461.004.patch
>
>
> Currently DominantResourceCalculator#ratio may return a wrong result when the 
> right resource contains a zero value. For example, with three resource types, 
> leftResource=<5, 5, 0> and rightResource=<10, 10, 0>, we expect the result of 
> DominantResourceCalculator#ratio(leftResource, rightResource) to be 0.5, but 
> currently it is NaN.
> There should be a verification before the division to ensure that the divisor 
> is not zero (as sketched below).
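> A hedged sketch of that guard (simplified; the real calculator iterates
> resource types via ResourceUtils, and numTypes is a stand-in here):
> {code:java}
> // Skip resource types whose divisor is zero so that 0/0 never
> // contributes NaN to the dominant ratio.
> float ratio = 0.0f;
> for (int i = 0; i < numTypes; i++) {
>   long rhs = rightResource.getResourceInformation(i).getValue();
>   if (rhs == 0) {
>     continue;
>   }
>   long lhs = leftResource.getResourceInformation(i).getValue();
>   ratio = Math.max(ratio, (float) lhs / rhs);
> }
> {code}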



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10032) Implement regex querying of logs

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10032:


Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Implement regex querying of logs
> 
>
> Key: YARN-10032
> URL: https://issues.apache.org/jira/browse/YARN-10032
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
>
> After YARN-10031, we have query parameters to the log servlet's GET endpoint.
> To demonstrate the new capabilities of the log servlet and how easy it will 
> be to add functionality to all log servlets at the same time, let's add the 
> ability to search the aggregated logs with a given regex.
> A conceptual use case:
> Users run several MR jobs daily, but some of them fail to localize a 
> particular resource at first. We want to search the logs of these YARN 
> applications and extract some data from them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8928) TestRMAdminService is failing

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-8928:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> TestRMAdminService is failing
> -
>
> Key: YARN-8928
> URL: https://issues.apache.org/jira/browse/YARN-8928
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Jason Darrell Lowe
>Assignee: David Mollitor
>Priority: Major
> Attachments: YARN-8928.1.patch, YARN-8928.2.patch, YARN-8928.3.patch
>
>
> After HADOOP-15836 TestRMAdminService has started failing consistently.  
> Sample stacktraces to follow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-2031:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> YARN Proxy model doesn't support REST APIs in AMs
> -
>
> Key: YARN-2031
> URL: https://issues.apache.org/jira/browse/YARN-2031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: YARN-2031-002.patch, YARN-2031-003.patch, 
> YARN-2031-004.patch, YARN-2031-005.patch, YARN-2031.patch.001
>
>
> AMs can't support REST APIs because
> # the AM filter redirects all requests to the proxy with a 302 response (not 
> 307)
> # the proxy doesn't forward PUT/POST/DELETE verbs
> Either the AM filter needs to return 307 and the proxy needs to forward the 
> verbs, or the AM filter should not filter the REST part of the web site.
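> A hedged illustration of the first point in plain servlet terms (not the
> actual AmIpFilter code; proxyUrl is a placeholder):
> {code:java}
> // A 302 makes clients replay PUT/POST/DELETE as GET; 307 tells the client
> // to repeat the same verb against the proxy URL.
> response.setStatus(307);
> response.setHeader("Location", proxyUrl);
> {code}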



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-5464:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, yarn
>Reporter: Robert Kanter
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-5464.001.patch, YARN-5464.002.patch, 
> YARN-5464.003.patch, YARN-5464.004.patch, YARN-5464.005.patch, 
> YARN-5464.006.patch, YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3232) Some application states are not necessarily exposed to users

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3232:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Some application states are not necessarily exposed to users
> 
>
> Key: YARN-3232
> URL: https://issues.apache.org/jira/browse/YARN-3232
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Jian He
>Assignee: Varun Saxena
>Priority: Major
> Attachments: YARN-3232.002.patch, YARN-3232.01.patch, 
> YARN-3232.02.patch, YARN-3232.v2.01.patch
>
>
> The application NEW_SAVING and SUBMITTED states are not necessarily exposed to 
> users, as they are mostly internal to the system, transient, and not user-facing. 
> We may deprecate these two states and remove them from the web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4435:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, security, yarn
>Affects Versions: 3.0.0-alpha2
>Reporter: Matthew Paduano
>Assignee: Matthew Paduano
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-4435-003.patch, YARN-4435-003.patch, 
> YARN-4435.00.patch.txt, YARN-4435.01.patch, YARN-4435.02.patch, 
> proposed_solution
>
>
> Add a class to the yarn project that implements the DtFetcher interface to return 
> an RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7829) Rebalance UI2 cluster overview page

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-7829:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Rebalance UI2 cluster overview page
> ---
>
> Key: YARN-7829
> URL: https://issues.apache.org/jira/browse/YARN-7829
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.0.0
>Reporter: Eric Yang
>Assignee: Gergely Novák
>Priority: Major
> Attachments: YARN-7829.001.patch, YARN-7829.jpg, 
> ui2-cluster-overview.png
>
>
> The cluster overview page looks like an upside-down triangle.  It would be 
> nice to rebalance the charts to ensure the horizontal real estate is utilized 
> properly.  The screenshot attachment includes some suggestions for rebalancing.  
> Node Manager status and cluster resources are closely related, so it would be 
> nice to promote that chart to the first row.  Application Status and Resource 
> Availability are closely related.  It would be nice to promote Resource Usage 
> to sit side by side with Application Status to fill up the horizontal real 
> estate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10050:

Target Version/s: 3.4.0  (was: 3.3.0, 3.2.2)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-10050.001.patch
>
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-8012:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by the NM. Thus, it cannot be managed by YARN either, i.e. it is leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because depended-on 
> configuration or resources cannot be made ready.
>  * The NM local leveldb store is corrupted or lost, e.g. due to bad disk sectors.
>  * The NM has bugs, such as wrongly marking a live container as complete.
> Note that these cases arise, or get worse, when work-preserving NM restart is 
> enabled; see YARN-1336.
> *Bad impacts of an unmanaged container, such as:*
>  # Resources cannot be managed for YARN on the node:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release YARN resources on the node and 
> free up resources for other urgent computations on the node.
>  # Container and app killing is not eventually consistent for the app user:
>  ** An app which has bugs can still have bad external impacts even if the 
> app was killed a long time ago.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6167) RM option to delegate NM loss container action to AM

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-6167:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> RM option to delegate NM loss container action to AM
> 
>
> Key: YARN-6167
> URL: https://issues.apache.org/jira/browse/YARN-6167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-6167.01.patch, YARN-6167.02.patch
>
>
> Currently, if the RM times out an NM, the scheduler will kill all containers 
> that were running on the NM. For some applications, in the event of a 
> temporary NM outage, it might be better to delegate to the AM the decision 
> whether to kill the containers and request new containers from the RM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7844) Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-7844:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX
> 
>
> Key: YARN-7844
> URL: https://issues.apache.org/jira/browse/YARN-7844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Major
> Attachments: YARN-7844.000.patch, YARN-7844.001.patch
>
>
> Currently FairScheduler's FSOpDurations records some scheduler operation 
> metrics: nodeUpdateCall, preemptCall, etc. We may need something similar for 
> CapacityScheduler, and more metrics should be added there. This could help 
> monitor RM scheduler performance and give more insight into whether the 
> scheduler is under pressure.
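> A hedged sketch of the kind of metrics source meant above, patterned on
> FSOpDurations (the class and metric names are hypothetical):
> {code:java}
> @Metrics(context = "yarn")
> class CSOpDurations {
>   // hadoop-metrics2 exposes @Metric-annotated mutables to JMX like any
>   // other source; this one tracks allocate() call latency.
>   @Metric("Duration of allocate() calls") MutableRate allocateCall;
>
>   void addAllocateDuration(long durationMs) {
>     allocateCall.add(durationMs);
>   }
> }
> {code}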



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-6492:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in the default 
> partition or across all partitions. (After YARN-6467 it will be in the default 
> partition.)
> But having per-partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7436) YARN UI: Upgrade to the latest em-table

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-7436:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> YARN UI: Upgrade to the latest em-table
> ---
>
> Key: YARN-7436
> URL: https://issues.apache.org/jira/browse/YARN-7436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
>Priority: Major
> Attachments: YARN-7436.1.patch
>
>
> Similar to TEZ-3842, upgrade the em-table version to take advantage of the 
> new features.
> - em-table has improved a lot in the past few months. SQL-like advanced 
> searching capability and faceted filters are the best among them.
> \cc [~skmvasu] [~sunilg]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9877) Intermittent TIME_OUT of LogAggregationReport

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9877:
---
Target Version/s: 3.4.0  (was: 3.0.4, 3.3.0, 3.2.2, 3.1.4)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Intermittent TIME_OUT of LogAggregationReport
> -
>
> Key: YARN-9877
> URL: https://issues.apache.org/jira/browse/YARN-9877
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager, yarn
>Affects Versions: 3.0.3, 3.3.0, 3.2.1, 3.1.3
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9877.001.patch
>
>
> I noticed some intermittent TIME_OUT in some downstream log-aggregation based 
> tests.
> Steps to reproduce:
> - Let's run a MR job
> {code}
> hadoop jar hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep 
> -Dmapreduce.job.queuename=root.default -m 10 -r 10 -mt 5000 -rt 5000
> {code}
> - Suppose the AM is requesting more containers, but as soon as they're 
> allocated - the AM realizes it doesn't need them. The container's state 
> changes are: ALLOCATED -> ACQUIRED -> RELEASED. 
> Let's suppose these extra containers are allocated in a different node from 
> the other 21 (AM + 10 mapper + 10 reducer) containers' node.
> - All the containers finish successfully and the app finishes successfully 
> as well. The log aggregation status for the whole app seemingly gets stuck in 
> the RUNNING state.
> - After a while the final log aggregation status for the app changes to 
> TIME_OUT.
> Root cause:
> - As unused containers are getting through the state transition in the RM's 
> internal representation, {{RMAppImpl$AppRunningOnNodeTransition}}'s 
> transition function is called. This calls the 
> {{RMAppLogAggregation$addReportIfNecessary}} which forcefully adds the 
> "NOT_START" LogAggregationStatus associated with this NodeId for the app, 
> even though it does not have any running container on it.
> - The node's LogAggregationStatus is never updated to "SUCCEEDED" by the 
> NodeManager because it does not have any running container on it (Note that 
> the AM immediately released them after acquisition). The LogAggregationStatus 
> remains NOT_START until time out is reached. After that point the RM 
> aggregates the LogAggregationReports for all the nodes, and though all the 
> containers have SUCCEEDED state, one particular node has NOT_START, so the 
> final log aggregation will be TIME_OUT.
> (I crawled the RM UI for the log aggregation statuses, and it was always 
> NOT_START for this particular node).
> This situation is highly unlikely, but it has an estimated ~0.8% failure rate 
> based on a year's 1500 runs on an unstressed cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9279) Remove the old hamlet package

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9279:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Remove the old hamlet package
> -
>
> Key: YARN-9279
> URL: https://issues.apache.org/jira/browse/YARN-9279
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: YARN-9279.01.patch
>
>
> The old hamlet package was deprecated in HADOOP-11875. Let's remove it to 
> improve maintainability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10179) Queue mapping based on group id passed through application tag

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10179:

Target Version/s: 3.4.0

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Queue mapping based on group id passed through application tag
> --
>
> Key: YARN-10179
> URL: https://issues.apache.org/jira/browse/YARN-10179
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-10179.001.patch, YARN-10179.002.patch
>
>
> There are situations when the real submitting user differs from the user that 
> arrives to YARN. For example, in the case of a Hive application with Hive 
> impersonation turned off, the Hive queries will run as the Hive user and the 
> mapping is done based on that user's group. 
> Unfortunately, in this case YARN doesn't have any information about the real 
> user, and there are cases when the customer may want to map these applications 
> to the real submitting user's queue (based on the group id) instead of the 
> Hive queue.
> For these cases, if the group id (or name) is passed in the application 
> tag, we may read it and use it during queue mapping, provided that user has 
> rights to run on the real user's queue. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3514:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Priority: Minor
>  Labels: oct16-easy
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I'm not sure where 
> or why URL escaping converts the \ to %5C, or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g
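> For reference, %5C is simply the percent-encoding of the backslash, which
> is why the same user shows up under two spellings (a minimal JDK
> illustration, not the actual YARN code path involved):
> {code:java}
> // Prints "domain%5Chadoopuser": '\' percent-encodes to %5C.
> // (Charset overload of URLEncoder.encode, available since Java 10.)
> System.out.println(java.net.URLEncoder.encode(
>     "domain\\hadoopuser", java.nio.charset.StandardCharsets.UTF_8));
> {code}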



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3625:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
> --
>
> Key: YARN-3625
> URL: https://issues.apache.org/jira/browse/YARN-3625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-3625.1.patch, YARN-3625.2.patch
>
>
> RollingLevelDBTimelineStore batches all entities in the same put to improve 
> performance. This causes an error when relating to an entity in the same put 
> however.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6409) RM does not blacklist node for AM launch failures

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-6409:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> RM does not blacklist node for AM launch failures
> -
>
> Key: YARN-6409
> URL: https://issues.apache.org/jira/browse/YARN-6409
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Kanwaljeet Sachdev
>Priority: Major
> Attachments: YARN-6409.00.patch, YARN-6409.01.patch, 
> YARN-6409.02.patch, YARN-6409.03.patch
>
>
> Currently, node blacklisting upon AM failures only handles failures that 
> happen after AM container is launched (see 
> RMAppAttemptImpl.shouldCountTowardsNodeBlacklisting()).  However, AM launch 
> can also fail if the NM, where the AM container is allocated, goes 
> unresponsive.  Because this is not handled, the scheduler may continue to 
> allocate AM containers on that same NM for the following app attempts. 
> {code}
> Application application_1478721503753_0870 failed 2 times due to Error 
> launching appattempt_1478721503753_0870_02. Got exception: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/17.111.179.113:46702 remote=*.me.com/17.111.178.125:8041]; Host 
> Details : local host is: "*.me.com/17.111.179.113"; destination host is: 
> "*.me.com":8041; 
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1475) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>  
> at com.sun.proxy.$Proxy86.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
>  
> at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:497) 
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  
> at com.sun.proxy.$Proxy87.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:120)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:256)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: java.io.IOException: java.net.SocketTimeoutException: 6 millis 
> timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>  
> at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
>  
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738) 
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375) 
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1447) 
> ... 15 more 
> Caused by: java.net.SocketTimeoutException: 6 millis timeout while 
> waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) 
> at java.io.FilterInputStream.read(FilterInputStream.java:133) 
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265) 
> at java.io.DataInputStream.readInt(DataInputStream.java:387) 
> at 
> 

[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-574:
--
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Ajith S
>Priority: Major
> Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.05.patch, 
> YARN-574.1.patch, YARN-574.2.patch
>
>
> At present private resources are downloaded in parallel only if multiple 
> containers request the same resource; otherwise downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads, but this is not used, and only one resource is sent for 
> download at a time.
> I think we can increase / assure parallelism (even for a single container 
> requesting resources) for private/application resources by allowing multiple 
> downloads per ContainerLocalizer (see the sketch below).
> Total parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers [private and application resource]
> Total parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]
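> A hedged sketch of the proposed direction (the per-container limit and
> download helper are hypothetical; the real protocol changes are in the
> attached patches):
> {code:java}
> // Hypothetical: let one ContainerLocalizer fetch several resources
> // concurrently instead of serially, bounded by a per-container limit.
> ExecutorService pool =
>     Executors.newFixedThreadPool(maxDownloadsPerContainer);
> for (LocalResource resource : pendingResources) {
>   pool.submit(() -> download(resource));
> }
> pool.shutdown();
> {code}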



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-8509:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Total pending resource calculation in preemption should use user-limit factor 
> instead of minimum-user-limit-percent
> ---
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: capacityscheduler
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch, 
> YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total 
> pending resource based on user-limit percent and user-limit factor, which caps 
> the pending resource for each user to the minimum of the user-limit pending and 
> the actual pending. This prevents the queue from taking more pending resource to 
> achieve queue balance after all queues are satisfied with their ideal allocation.
>   
>  We need to change the logic to let queue pending go beyond the user limit 
> (see the sketch below).
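> A hedged sketch of that change (variable names are assumptions, not from
> the attached patches):
> {code:java}
> // Hypothetical: cap each user's counted pending by user-limit multiplied
> // by user-limit-factor, instead of by the minimum-user-limit-percent
> // share alone.
> Resource userCap = Resources.multiply(userLimit, userLimitFactor);
> Resource counted = Resources.min(resourceCalculator, clusterResource,
>     userPending, userCap);
> {code}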



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10219) YARN service placement constraints is broken

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10219:

Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> YARN service placement constraints is broken
> 
>
> Key: YARN-10219
> URL: https://issues.apache.org/jira/browse/YARN-10219
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.2.0, 3.1.1, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-10219.001.patch, YARN-10219.002.patch, 
> YARN-10219.003.patch
>
>
> YARN service placement constraints do not work with node labels or node 
> attributes. Example of placement constraints: 
> {code} 
>   "placement_policy": {
> "constraints": [
>   {
> "type": "AFFINITY",
> "scope": "NODE",
> "node_attributes": {
>   "label":["genfile"]
> },
> "target_tags": [
>   "ping"
> ] 
>   }
> ]
>   },
> {code}
> Node attribute added: 
> {code} ./bin/yarn nodeattributes -add "host-3.example.com:label=genfile" 
> {code} 
> Scheduling activities shows: 
> {code}  Node does not match partition or placement constraints, 
> unsatisfied PC expression="in,node,ping", target-type=ALLOCATION_TAG 
> 
>  1
>  host-3.example.com:45454{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4988) Limit filter in ApplicationBaseProtocol#getApplications should return latest applications

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4988:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Limit filter in ApplicationBaseProtocol#getApplications should return latest 
> applications
> -
>
> Key: YARN-4988
> URL: https://issues.apache.org/jira/browse/YARN-4988
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-4988-wip.patch
>
>
> Whenever the limit filter is used to get application reports via 
> ApplicationBaseProtocol#getApplications, the applications retrieved are not 
> the latest. The retrieved applications are effectively random, based on hashcode. 
> The reason for the above problem is that the RM maintains the apps in a map 
> where insertion of an application id is based on its hashcode. So if there are 
> 10 applications from app-1 to app-10 and the limit is 5, one would expect 
> applications app-6 to app-10 to be retrieved. But currently the first 5 apps 
> in the map are retrieved, i.e. a random 5.
> I think the limit filter should retrieve the latest applications only (see 
> the sketch below).
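> A hedged sketch of the proposed behaviour (the sort field and direction
> are assumptions):
> {code:java}
> // Hypothetical: order by submission time, newest first, then apply the
> // limit, so a hash-ordered subset is never returned.
> List<RMApp> apps = new ArrayList<>(rmContext.getRMApps().values());
> apps.sort(Comparator.comparingLong(RMApp::getSubmitTime).reversed());
> List<RMApp> latest = apps.subList(0, (int) Math.min(limit, apps.size()));
> {code}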



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9256) Make ATSv2 compilation default with hbase.profile=2.0

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9256:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Make ATSv2 compilation default with hbase.profile=2.0
> -
>
> Key: YARN-9256
> URL: https://issues.apache.org/jira/browse/YARN-9256
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9256.01.patch, YARN-9256.02.patch, 
> YARN-9256.03.patch
>
>
> By default Hadoop compiles with the hbase.profile that corresponds to 
> hbase.version=1.4 for ATSv2. Change the compilation to hbase.profile=2.0 by 
> default in trunk. 
> This JIRA is to discuss any concerns. 
> cc: [~vrushalic]
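> For reference, the profile would be selected at build time along these
> lines (illustrative command, not an official build recipe):
> {code}
> mvn clean install -DskipTests -Dhbase.profile=2.0
> {code}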



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9997) Code cleanup in ZKConfigurationStore

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9997:
---
Target Version/s: 3.4.0

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Code cleanup in ZKConfigurationStore
> 
>
> Key: YARN-9997
> URL: https://issues.apache.org/jira/browse/YARN-9997
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9997.001.patch, YARN-9997.002.patch, 
> YARN-9997.003.patch, YARN-9997.004.patch, YARN-9997.005.patch, 
> YARN-9997.006.patch
>
>
> Many things can be improved:
> * znodeParentPath could be a local variable
> * zkManager could be private; the VisibleForTesting annotation is not needed 
> anymore
> * Do something with unchecked casts
> * zkManager.safeSetData calls almost always have the same set of parameters: 
> simplify this (see the sketch below)
> * Extract zkManager calls to their own methods: they are repeated
> * Remove TODOs
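> A hedged sketch of the safeSetData consolidation (the field names and exact 
> parameter list are assumptions based on the description, not the actual 
> patch):
> {code:java}
> // Hypothetical helper: every call site passes the same version, ACLs and
> // fencing path, so centralize them in one private method.
> private void safeSetData(String path, byte[] data) throws Exception {
>   zkManager.safeSetData(path, data, -1, zkAcl, fencingNodePath);
> }
> {code}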



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10002) Code cleanup and improvements in ConfigurationStoreBaseTest

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10002:

Target Version/s: 3.4.0

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Code cleanup and improvements in ConfigurationStoreBaseTest
> ---
>
> Key: YARN-10002
> URL: https://issues.apache.org/jira/browse/YARN-10002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-10002.001.patch, YARN-10002.002.patch, 
> YARN-10002.003.patch, YARN-10002.004.patch, YARN-10002.005.patch, 
> YARN-10002.006.patch, YARN-10002.branch-3.2.001.patch
>
>
> * Some protected fields could be package-private
> * Could add a helper method that prepares a simple LogMutation with 1, 2 or 3 
> updates (key + value), as this pattern is used extensively in subclasses (see 
> the sketch below)
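> A sketch of such a helper (assuming LogMutation keeps its (updates, user) 
> constructor; names here are illustrative):
> {code:java}
> import java.util.LinkedHashMap;
> import java.util.Map;
> 
> // Varargs keeps the 1-, 2- and 3-update cases equally short, e.g.
> // mutation("admin", "k1", "v1", "k2", "v2")
> private LogMutation mutation(String user, String... keyValuePairs) {
>   Map<String, String> updates = new LinkedHashMap<>();
>   for (int i = 0; i + 1 < keyValuePairs.length; i += 2) {
>     updates.put(keyValuePairs[i], keyValuePairs[i + 1]);
>   }
>   return new LogMutation(updates, user);
> }
> {code}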



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-8258:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> YARN webappcontext for UI2 should inherit all filters from default context
> --
>
> Key: YARN-8258
> URL: https://issues.apache.org/jira/browse/YARN-8258
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Major
> Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, 
> YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, 
> YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, 
> YARN-8258.007.patch, YARN-8258.008.patch, YARN-8258.009.patch
>
>
> Thanks [~ssath...@hortonworks.com] for finding this.
> Ideally all filters from the default context have to be inherited by the UI2 
> context as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-5995:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>Priority: Major
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.0003.patch, YARN-5995.0004.patch, YARN-5995.0005.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9856) Remove log-aggregation related duplicate function

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9856:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Remove log-aggregation related duplicate function
> -
>
> Key: YARN-9856
> URL: https://issues.apache.org/jira/browse/YARN-9856
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Szilard Nemeth
>Priority: Trivial
> Attachments: YARN-9856.001.patch, YARN-9856.002.patch
>
>
> [~snemeth] has noticed a duplication in two of the log-aggregation related 
> functions.
> {quote}I noticed duplicated code in 
> org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog, 
> duplicated in 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs.
>  [...]
> {quote}
> We should remove the duplication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1564) add workflow YARN services

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-1564:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> add workflow YARN services
> --
>
> Key: YARN-1564
> URL: https://issues.apache.org/jira/browse/YARN-1564
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: oct16-hard
> Attachments: YARN-1564-001.patch, YARN-1564-002.patch, 
> YARN-1564-003.patch
>
>   Original Estimate: 24h
>  Time Spent: 48h
>  Remaining Estimate: 0h
>
> I've been using some alternative composite services to help build workflows 
> of process execution in a YARN AM.
> They and their tests could be moved into YARN for use by others; this would 
> make it easier to build aggregate services in an AM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10001:

Target Version/s: 3.4.0

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10001-branch-3.2.003.patch, YARN-10001.001.patch, 
> YARN-10001.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7604) Fix some minor typos in the opportunistic container logging

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-7604:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Fix some minor typos in the opportunistic container logging
> ---
>
> Key: YARN-7604
> URL: https://issues.apache.org/jira/browse/YARN-7604
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Trivial
> Attachments: YARN-7604.01.patch
>
>
> Fix some minor text issues. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9354) Resources should be created with ResourceTypesTestHelper instead of TestUtils

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9354:
---
Target Version/s: 3.4.0

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Resources should be created with ResourceTypesTestHelper instead of TestUtils
> -
>
> Key: YARN-9354
> URL: https://issues.apache.org/jira/browse/YARN-9354
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Trivial
>  Labels: newbie, newbie++
> Fix For: 3.3.0
>
> Attachments: YARN-9354.001.patch, YARN-9354.002.patch, 
> YARN-9354.003.patch, YARN-9354.004.patch, YARN-9354.branch-3.2.001.patch, 
> YARN-9354.branch-3.2.002.patch, YARN-9354.branch-3.2.003.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestUtils#createResource
>  has a very similar (though not identical) implementation to 
> org.apache.hadoop.yarn.resourcetypes.ResourceTypesTestHelper#newResource. 
> Since these 2 methods essentially do the same thing, and 
> ResourceTypesTestHelper is newer and more widely used, TestUtils#createResource 
> should be replaced with ResourceTypesTestHelper#newResource in all 
> occurrences.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3861) Add fav icon to YARN & MR daemons web UI

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3861:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Add fav icon to YARN & MR daemons web UI
> 
>
> Key: YARN-3861
> URL: https://issues.apache.org/jira/browse/YARN-3861
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
>  Labels: oct16-easy
> Attachments: RM UI in Chrome-With Patch.png, RM UI in Chrome-Without 
> Patch.png, RM UI in IE-With Patch.png, RM UI in IE-Without Patch.png.png, 
> YARN-3861.patch, hadoop-fav-transparent.png, hadoop-fav.png
>
>
> Add a fav icon image to all YARN & MR daemon web UIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9637) Make SLS wrapper class name configurable

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9637:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Make SLS wrapper class name configurable
> 
>
> Key: YARN-9637
> URL: https://issues.apache.org/jira/browse/YARN-9637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Affects Versions: 3.3.0
>Reporter: Erkin Alp Güney
>Assignee: Adam Antal
>Priority: Major
>  Labels: configuration-addition
> Attachments: YARN-9637.001.patch
>
>
> SLS currently hardcodes the lookup of which scheduler wrapper to load based 
> on the scheduler, and it only knows about the Fair and Capacity schedulers. 
> Making it configurable will accelerate development of new pluggable YARN 
> schedulers, as sketched below.
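> A hedged sketch of the direction (the config key and the resolver class are 
> assumptions, not part of the patch):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> 
> class WrapperResolver {
>   // Hypothetical: resolve the wrapper class from configuration instead of
>   // a hardcoded if/else on the scheduler type.
>   static Class<?> wrapperClass(Configuration conf, Class<?> defaultWrapper) {
>     return conf.getClass("yarn.sls.scheduler.wrapper.class", defaultWrapper);
>   }
> }
> {code}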



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8503) Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-8503:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING
> ---
>
> Key: YARN-8503
> URL: https://issues.apache.org/jira/browse/YARN-8503
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.2
>Reporter: Amiya Chakraborty
>Assignee: Amiya Chakraborty
>Priority: Major
>  Labels: patch-available, yarn
> Attachments: YARN-8503.001.patch, YARN-8503.001.patch
>
>
> Currently, there is no unit test covering the FINISHED_CONTAINERS_PULLED_BY_AM 
> event while a node is decommissioning. This patch adds one to check that once 
> the AM has pulled the containers from the RM, the RM informs the NM about it, 
> and the NM can remove the completed containers from its list during 
> DECOMMISSIONING.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9807) ContainerAllocator re-creates RMContainer instance when allocate for ReservedContainer

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9807:
---
Target Version/s: 3.4.0  (was: 3.3.0, 2.9.3, 2.10.1)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> ContainerAllocator re-creates RMContainer instance when allocate for 
> ReservedContainer
> --
>
> Key: YARN-9807
> URL: https://issues.apache.org/jira/browse/YARN-9807
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: YARN-9807.01.patch, YARN-9807.branch-2.01.patch
>
>
> The ContainerAllocator re-creates the RMContainer instance when allocating a 
> previously reserved container. This causes the RMContainer to lose the state 
> it accumulated between NEW and RESERVED.
> {panel:title=RM Log}
> 2019-08-28 18:42:30,320 [10645451] - INFO [SchedulerEventDispatcher:Event 
> Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 
> Container Transitioned from NEW to RESERVED
> 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event 
> Processor:AbstractContainerAllocator@126] - assignedContainer application 
> attempt=appattempt_1566978597856_2831_01 
> container=container_e47_1566978597856_2831_01_07 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@69f543b5
>  clusterResource= type=NODE_LOCAL 
> requestedPartition=label_ndir_2
> 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event 
> Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 
> Container Transitioned from NEW to ALLOCATED
> {panel}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9998) Code cleanup in LeveldbConfigurationStore

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9998:
---
Target Version/s: 3.4.0

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Code cleanup in LeveldbConfigurationStore
> -
>
> Key: YARN-9998
> URL: https://issues.apache.org/jira/browse/YARN-9998
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9998.001.patch, YARN-9998.002.patch
>
>
> Many things can be improved:
> * Field compactionTimer could be a local variable
> * Field versiondb should be camelcase
> * initDatabase is a very long method: initializing db / versionDb should be in 
> separate methods; split this method into smaller chunks (see the sketch below)
> * Remove TODOs
> * Remove duplicated code block in 
> LeveldbConfigurationStore.CompactionTimerTask
> * Any other cleanup
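> For the initDatabase split, a rough shape of the refactoring (method names 
> are illustrative only):
> {code:java}
> // Keep initDatabase as a thin driver; move the two store setups into
> // focused private methods so each chunk is readable on its own.
> private void initDatabase() throws Exception {
>   openConfigDb();
>   openVersionDb();
> }
> 
> private void openConfigDb() throws Exception { /* main db setup */ }
> 
> private void openVersionDb() throws Exception { /* versionDb setup */ }
> {code}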



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10215) Endpoint for obtaining direct URL for the logs

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10215:

Target Version/s: 3.4.0  (was: 3.3.0, 3.4.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Endpoint for obtaining direct URL for the logs
> --
>
> Key: YARN-10215
> URL: https://issues.apache.org/jira/browse/YARN-10215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10025.001.patch, YARN-10025.002.patch, 
> YARN-10025.003.patch
>
>
> If CORS protected UIs are set up, there is an issue when the browser tries to 
> access the logs of a running container in the RM web UIv2.
> Assuming ATS is not up, the browser follows the following call chain:
> - Tries to access ATS; it fails and falls back to the JHS
> - From the RM the browser receives basic app info, so we know that the 
> application is running
> - From the JHS we get the list of containers and their log files.
> - When we try to access a specific log file, the JHS redirects the request to 
> the NM's UI (on the node where the container is running). This redirect is 
> performed by the browser automatically. In this setup the host is considered 
> protected information, thus the browser omits the "Origin" field from 
> the request when this redirect is done. The browser then denies access to the 
> NodeManager's web UI due to the CORS header set up for the NM, because the 
> Origin is null in the redirected request. 
> - Finally, a "Logs are unavailable" message is shown in the RM web UIv2 due to 
> the CORS violation.
> We should fix this. As an approach, we can expose another endpoint which only 
> returns the URL of the NodeManager, which the UIv2 should then call directly 
> to receive the log (a sketch follows below). This adds a bit of complexity, 
> but will enable users to keep the CORS-protected setup.
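> A hedged sketch of such an endpoint (paths, names and the JSON shape are 
> assumptions, not the committed API):
> {code:java}
> import javax.ws.rs.GET;
> import javax.ws.rs.Path;
> import javax.ws.rs.PathParam;
> import javax.ws.rs.Produces;
> import javax.ws.rs.core.MediaType;
> 
> // Return the NM's log URL as data instead of issuing an HTTP redirect,
> // so the browser never makes a cross-origin request with a null Origin.
> @Path("/ws/v1/logs")
> public class LogUrlResource {
>   @GET
>   @Path("/{containerId}/url")
>   @Produces(MediaType.APPLICATION_JSON)
>   public String logUrl(@PathParam("containerId") String containerId) {
>     String nmHttpAddress = lookupNodeHttpAddress(containerId);
>     return "{\"logUrl\":\"http://" + nmHttpAddress
>         + "/node/containerlogs/" + containerId + "\"}";
>   }
> 
>   private String lookupNodeHttpAddress(String containerId) {
>     return "nm-host:8042"; // placeholder for the real NM lookup
>   }
> }
> {code}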



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-5683:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, 
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>
> h3.  Introduction
> * Some applications of various frameworks (Flink, Spark, MapReduce, etc.) 
> that use local storage (checkpoint, shuffle, etc.) might require high IO 
> performance. It's useful to allocate local directories on high-performance 
> storage media for these applications on heterogeneous clusters.
> * YARN does not distinguish different storage types and hence applications 
> cannot selectively use storage media with different performance 
> characteristics. Adding awareness of storage media can allow YARN to make 
> better decisions about the placement of local directories.
> h3.  Approach
> * NodeManager will distinguish storage types for local directories.
> ** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
> should allow the cluster administrator to optionally specify the storage type 
> for each local directories. Example: 
> [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to 
> [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
> ** StorageType defines DISK/SSD storage types and takes DISK as the default 
> storage type. 
> ** StorageLocation separates storage type and directory path; it is used by 
> LocalDirAllocator to be aware of the types of local dirs. The default storage 
> type is DISK.
> ** The getLocalPathForWrite method of LocalDirAllocator will prefer the 
> local directory of the specified storage type, and will fall back to ignoring 
> the storage type if the requirement cannot be satisfied.
> ** Support for container-related local/log directories in ContainerLaunch. 
> All application frameworks can set the environment variables 
> (LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE) to specify the desired storage 
> type of local/log directories, and can choose not to launch the container on 
> fallback via these environment variables (ENSURE_LOCAL_STORAGE_TYPE and 
> ENSURE_LOG_STORAGE_TYPE).
> * Allow specified storage type for various frameworks (Take MapReduce as an 
> example)
> ** New configurations should allow the application administrator to 
> optionally specify the storage type of local/log directories and the fallback 
> strategy (MapReduce configurations: mapreduce.job.local-storage-type, 
> mapreduce.job.log-storage-type, mapreduce.job.ensure-local-storage-type and 
> mapreduce.job.ensure-log-storage-type).
> ** Support for container work directories. Set the environment variables 
> includes LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to configurations 
> above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
> should update YARNRunner and TaskAttemptImpl)
> ** Add storage type prefix for request path to support for other local 
> directories of frameworks (such as shuffle directories for MapReduce). 
> (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
> support for output/work directories)
> ** Flow diagram for MapReduce framework
> !flow_diagram_for_MapReduce-2.png!
> h3.  Further Discussion
> * Scheduling: the storage-type requirement for local/log directories may 
> not be satisfiable on some nodes of heterogeneous clusters. To achieve a 
> global optimum, the scheduler should be aware of and manage disk resources. 
> ** Approach-1: Based on node attributes (YARN-3409), Scheduler can allocate 
> containers which have SSD requirement on nodes with attribute:ssd=true.
> ** Approach-2: Based on extended resource model (YARN-3926), it's easy to 
> support scheduling through extending resource models like vdisk and vssd 
> using this feature, but hard to measure for applications and isolate for 
> non-CFQ based disks.
> * The fallback strategy still needs consideration. Certain applications might 
> not work well when the storage type requirement is not satisfied. When no 
> disk of the desired storage type is available, should the container launch 
> fail, or should the AM handle it? We have implemented a fallback strategy that 
> fails to launch the container when none of the desired storage type 

[jira] [Updated] (YARN-6426) Compress ZK YARN keys to scale up (especially AppStateData

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-6426:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Compress ZK YARN keys to scale up (especially AppStateData
> --
>
> Key: YARN-6426
> URL: https://issues.apache.org/jira/browse/YARN-6426
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0-alpha2
>Reporter: Roni Burd
>Assignee: Roni Burd
>Priority: Major
>  Labels: patch
> Attachments: zkcompression.patch
>
>
> ZK today stores the protobuf files uncompressed. This is not an issue, except 
> that if a customer job has thousands of files, AppStateData will store the 
> user context as a string with multiple URLs, and it is easy to get to 1 MB or 
> more. 
> This can put unnecessary strain on ZK and make the process slow. 
> The proposal is to simply compress the protobufs before sending them to ZK, as 
> sketched below.
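> A minimal sketch of the idea with plain java.util.zip (the attached patch 
> may differ in codec and framing):
> {code:java}
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.zip.GZIPOutputStream;
> 
> class ZkPayloadCompressor {
>   // Compress serialized protobuf bytes before handing them to ZooKeeper;
>   // URL-heavy AppStateData payloads compress very well as text.
>   static byte[] compress(byte[] protoBytes) throws IOException {
>     ByteArrayOutputStream bos = new ByteArrayOutputStream();
>     try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
>       gzip.write(protoBytes);
>     }
>     return bos.toByteArray();
>   }
> }
> {code}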



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-2681:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Support bandwidth enforcement for containers while reading from HDFS
> 
>
> Key: YARN-2681
> URL: https://issues.apache.org/jira/browse/YARN-2681
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.5.1
> Environment: Linux
>Reporter: Nam H. Do
>Priority: Major
> Attachments: Traffic Control Design.png, YARN-2681.001.patch, 
> YARN-2681.002.patch, YARN-2681.003.patch, YARN-2681.004.patch, 
> YARN-2681.005.patch, YARN-2681.patch
>
>
> To read/write data from HDFS on a data node, applications establish TCP/IP 
> connections with the datanode. The HDFS read can be controlled by setting up 
> the Linux Traffic Control (TC) subsystem on the data node to apply filters on 
> the appropriate connections.
> The current cgroups net_cls concept cannot be applied on the node where the 
> container is launched, nor on the data node, since:
> -   TC handles outgoing bandwidth only, so it cannot be set on the container 
> node (an HDFS read = incoming data for the container)
> -   Since the HDFS data node is handled by only one process, it is not possible 
> to use net_cls to separate connections from different containers to the 
> datanode.
> Tasks:
> 1) Extend the Resource model to define a bandwidth enforcement rate
> 2) Monitor TCP/IP connections established by the container handling process 
> and its child processes
> 3) Set Linux Traffic Control rules on the data node based on address:port pairs 
> in order to enforce the bandwidth of outgoing data
> Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf
> Implementation: 
> http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf
> http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl_UML_diagram.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-6315:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, 
> YARN-6315.006.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks it 
> has the resource even though the file exists with size 0, or with less than 
> the "expected" size of the LocalResource. This JIRA tracks the change to 
> harden the isResourcePresent call to address that case.
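> A minimal sketch of the hardened check (assuming the expected size is 
> available from the LocalResource):
> {code:java}
> import java.io.File;
> 
> class ResourcePresenceSketch {
>   // A resource only counts as present if the file exists and is at least
>   // as large as the size recorded at localization time.
>   static boolean isResourcePresent(File localFile, long expectedSize) {
>     return localFile.exists() && localFile.length() >= expectedSize;
>   }
> }
> {code}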



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9490) applicationresourceusagereport return wrong number of reserved containers

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9490:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> applicationresourceusagereport return wrong number of reserved containers
> -
>
> Key: YARN-9490
> URL: https://issues.apache.org/jira/browse/YARN-9490
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: yanbing zhang
>Assignee: yanbing zhang
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9490.002.patch, YARN-9490.patch, 
> YARN-9490.patch1.patch
>
>
> when getting an ApplicationResourceUsageReport instance from the class 
> SchedulerApplicationAttempt, I found the constructor input 
> parameter (reservedContainers.size()) is wrong, because the type of this 
> variable is Map<SchedulerRequestKey, Map<NodeId, RMContainer>>, so 
> "reservedContainers.size()" is not the number of containers but the number of 
> SchedulerRequestKeys.
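> The fix direction, sketched generically: sum the sizes of the inner maps 
> rather than counting the outer keys.
> {code:java}
> import java.util.Map;
> 
> class ReservedCountSketch {
>   // reservedContainers maps SchedulerRequestKey -> (NodeId -> RMContainer),
>   // so the container count is the sum of the inner map sizes.
>   static <K, N, C> int countReserved(Map<K, Map<N, C>> reservedContainers) {
>     int total = 0;
>     for (Map<N, C> perNode : reservedContainers.values()) {
>       total += perNode.size();
>     }
>     return total;
>   }
> }
> {code}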



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5625) FairScheduler should use FSContext more aggressively to avoid constructors with many parameters

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-5625:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> FairScheduler should use FSContext more aggressively to avoid constructors 
> with many parameters
> ---
>
> Key: YARN-5625
> URL: https://issues.apache.org/jira/browse/YARN-5625
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Priority: Major
>
> YARN-5609 introduces FSContext, a structure to capture basic FairScheduler 
> information. In addition to preemption details, it could host references to 
> the scheduler, QueueManager, AllocationConfiguration etc. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4843) [Umbrella] Revisit YARN ProtocolBuffer int32 usages that need to upgrade to int64

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4843:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> [Umbrella] Revisit YARN ProtocolBuffer int32 usages that need to upgrade to 
> int64
> -
>
> Key: YARN-4843
> URL: https://issues.apache.org/jira/browse/YARN-4843
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.0.0-alpha1
>Reporter: Wangda Tan
>Priority: Major
>
> This JIRA is to track all int32 usages in YARN's ProtocolBuffer APIs that we 
> possibly need to update to int64.
> One example is the resource API. We use int32 for memory now; if a cluster has 
> 10k nodes and each node has 210 GB of memory, we will get a negative total 
> cluster memory, as the arithmetic below shows.
> We may have other fields that need to upgrade from int32 to int64. 
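> The memory example works out as follows (a quick check, assuming memory is 
> tracked in MB):
> {code:java}
> class Int32OverflowCheck {
>   static void check() {
>     int perNodeMb = 210 * 1024;            // 215,040 MB per node
>     int totalMb = 10_000 * perNodeMb;      // 2,150,400,000 > 2^31-1: wraps to -2,144,567,296
>     long totalMbOk = 10_000L * perNodeMb;  // fits comfortably in int64
>     System.out.println(totalMb + " vs " + totalMbOk);
>   }
> }
> {code}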



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-5465:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Server-Side NM Graceful Decommissioning subsequent call behavior
> 
>
> Key: YARN-5465
> URL: https://issues.apache.org/jira/browse/YARN-5465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Priority: Major
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has 
> the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, issuing a subsequent call to gracefully decommission nodes 
> updates the timeout for any currently decommissioning nodes.  This makes it 
> impossible to gracefully decommission different sets of nodes with different 
> timeouts.  Though it does let you easily update the timeout of currently 
> decommissioning nodes.
> Another behavior we could do is this:
> # {color:grey}Start a long-running job that has containers running on nodeA{color}
> # {color:grey}Add nodeA to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA{color}
> # {color:grey}Wait 30 seconds{color}
> # {color:grey}Add nodeB to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color}
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the nodes affected by each call to gracefully decommission nodes 
> independent.  You can now have different sets of decommissioning nodes with 
> different timeouts.  However, to update the timeout of a currently 
> decommissioning node, you'd have to first recommission it, and then 
> decommission it again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4637) AM launching blacklist purge mechanism (time based)

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4637:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> AM launching blacklist purge mechanism (time based)
> ---
>
> Key: YARN-4637
> URL: https://issues.apache.org/jira/browse/YARN-4637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5414) Integrate NodeQueueLoadMonitor with ClusterNodeTracker

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-5414:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Integrate NodeQueueLoadMonitor with ClusterNodeTracker
> --
>
> Key: YARN-5414
> URL: https://issues.apache.org/jira/browse/YARN-5414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: container-queuing, distributed-scheduling, scheduler
>Reporter: Arun Suresh
>Assignee: Abhishek Modi
>Priority: Major
>
> The {{ClusterNodeTracker}} tracks the states of clusterNodes and provides 
> convenience methods like sort and filter.
> The {{NodeQueueLoadMonitor}} should use the {{ClusterNodeTracker}} instead of 
> maintaining its own data-structure of node information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4944) Handle lack of ResourceCalculatorPlugin gracefully

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4944:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Handle lack of ResourceCalculatorPlugin gracefully
> --
>
> Key: YARN-4944
> URL: https://issues.apache.org/jira/browse/YARN-4944
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Priority: Major
>  Labels: newbie++
>
> On some systems (e.g. mac), the NM might not be able to instantiate a 
> ResourceCalculatorPlugin, which leads to logging a bunch of error messages. We 
> could improve the way we handle this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5536) Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-5536:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Multiple format support (JSON, etc.) for exclude node file in NM graceful 
> decommission with timeout
> ---
>
> Key: YARN-5536
> URL: https://issues.apache.org/jira/browse/YARN-5536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Priority: Major
>
> Per discussion in YARN-4676, we agree that multiple formats (other than XML) 
> should be supported to decommission nodes with timeout values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-1426:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> YARN Components need to unregister their beans upon shutdown
> 
>
> Key: YARN-1426
> URL: https://issues.apache.org/jira/browse/YARN-1426
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.3.0, 3.0.0-alpha1
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-1426.2.patch, YARN-1426.patch, YARN-1426.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4953:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Delete completed container log folder when rolling log aggregation is enabled
> -
>
> Key: YARN-4953
> URL: https://issues.apache.org/jira/browse/YARN-4953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
>
> There would be a potential bottleneck when a cluster is running with a very 
> large number of containers on the same NodeManager for a single application. 
> Linux limits the subfolder count to 32K. If the number of containers for an 
> application is greater than 32K, container launches will fail, and at that 
> point no more containers can be launched on this node.
> Currently log folders are deleted after the app is finished. Rolling log 
> aggregation aggregates logs to HDFS periodically. 
> I think that once aggregation is completed for finished containers, clean-up 
> can be done, i.e. deleting the log folders of finished containers, as sketched 
> below.
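> A minimal sketch of the cleanup step (the "fully uploaded" bookkeeping is 
> assumed; paths are placeholders):
> {code:java}
> import java.io.File;
> import java.util.Set;
> import org.apache.hadoop.fs.FileUtil;
> 
> class FinishedContainerLogCleaner {
>   // After a rolling-aggregation cycle, delete the local log dirs of
>   // containers whose logs were fully uploaded, freeing subfolder slots.
>   static void cleanup(File appLogDir, Set<String> uploadedContainerIds) {
>     for (String containerId : uploadedContainerIds) {
>       FileUtil.fullyDelete(new File(appLogDir, containerId));
>     }
>   }
> }
> {code}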



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9883) Reshape SchedulerHealth class

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-9883:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Reshape SchedulerHealth class
> -
>
> Key: YARN-9883
> URL: https://issues.apache.org/jira/browse/YARN-9883
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Kinga Marton
>Priority: Minor
>
> The {{SchedulerHealth}} class has some flaws, for example:
> - It has no javadoc at all
> - All its members are package-private: they should be private
> - The internal maps should be (concurrent) EnumMaps instead of HashMaps: they 
> are more efficient for storing enum keys (see the sketch below)
> - schedulerHealthDetails only stores the last operation; its name should 
> reflect that (just like lastSchedulerRunDetails)
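> For the EnumMap point, a small sketch (the enum and field names are 
> assumptions):
> {code:java}
> import java.util.Collections;
> import java.util.EnumMap;
> import java.util.Map;
> 
> class SchedulerHealthSketch {
>   enum Operation { ALLOCATION, RELEASE, PREEMPTION, RESERVATION }
> 
>   // EnumMap stores values in an array indexed by ordinal, so it is both
>   // faster and more compact than a HashMap keyed by the enum.
>   private final Map<Operation, Long> lastRunCounts =
>       Collections.synchronizedMap(new EnumMap<>(Operation.class));
> }
> {code}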



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-2024:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>Priority: Major
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided (a fix is sketched 
> below). Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.
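> The first issue has a one-line fix shape: pass the exception to the logger 
> so the stacktrace is kept (a sketch with placeholder names):
> {code:java}
> import java.io.IOException;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> class LogUploadSketch {
>   private static final Logger LOG =
>       LoggerFactory.getLogger(LogUploadSketch.class);
> 
>   void append(String containerId) throws IOException { /* TFile append */ }
> 
>   void uploadLogsForContainer(String containerId) {
>     try {
>       append(containerId);
>     } catch (IOException e) {
>       // Pass the exception itself so the stacktrace is preserved.
>       LOG.error("Couldn't upload logs for {}. Skipping this container.",
>           containerId, e);
>     }
>   }
> }
> {code}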



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4969) Fix more loggings in CapacityScheduler

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4969:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Fix more loggings in CapacityScheduler
> --
>
> Key: YARN-4969
> URL: https://issues.apache.org/jira/browse/YARN-4969
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-4969.1.patch
>
>
> YARN-3966 did logging cleanup for the Capacity Scheduler before; however, 
> there are some log messages we still need to improve:
> Container allocation / complete / reservation / un-reserve messages for every 
> hierarchy level (app/leaf/parent-queue) should be printed at INFO level.
> I'm debugging an issue where the root queue's resource usage could become 
> negative. It is very hard to reproduce, so we cannot enable debug logging from 
> RM start; that much log cannot fit on a single disk.
> The existing CS prints INFO messages when a container cannot be allocated, such 
> as on re-reservation / node heartbeat, etc.; we should avoid printing such 
> messages at INFO level (see the sketch below).
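> The direction, sketched: container lifecycle events stay at INFO, while 
> per-heartbeat "cannot allocate" chatter moves to DEBUG (names illustrative):
> {code:java}
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> class CsLoggingSketch {
>   private static final Logger LOG =
>       LoggerFactory.getLogger(CsLoggingSketch.class);
> 
>   void onAllocate(String containerId, String nodeId, String queue) {
>     // Allocate/complete/reserve/unreserve: INFO at each hierarchy level.
>     LOG.info("Allocated {} on {} (queue={})", containerId, nodeId, queue);
>   }
> 
>   void onSkip(String appId, String nodeId) {
>     // Per-heartbeat noise such as re-reservation goes to DEBUG.
>     LOG.debug("Cannot allocate for {} on {}: re-reservation", appId, nodeId);
>   }
> }
> {code}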



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10032) Implement regex querying of logs

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10032:

Target Version/s: 3.4.0

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Implement regex querying of logs
> 
>
> Key: YARN-10032
> URL: https://issues.apache.org/jira/browse/YARN-10032
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
>
> After YARN-10031, we have query parameters in the log servlet's GET endpoint.
> To demonstrate the new capabilities of the log servlet, and how easy it will 
> be to add a functionality to all log servlets at the same time, let's add the 
> ability to search the aggregated logs with a given regex, as sketched below.
> A conceptual use case:
> Users run several MR jobs daily, but some of them fail to localize a 
> particular resource at first. We want to search the logs of these YARN 
> applications and extract some data from them.
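> A minimal sketch of the server-side filtering (the query-parameter plumbing 
> is omitted):
> {code:java}
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.regex.Pattern;
> 
> class LogGrepSketch {
>   // Return only the aggregated-log lines matching the supplied regex.
>   static List<String> grep(BufferedReader logReader, String regex)
>       throws IOException {
>     Pattern p = Pattern.compile(regex);
>     List<String> hits = new ArrayList<>();
>     String line;
>     while ((line = logReader.readLine()) != null) {
>       if (p.matcher(line).find()) {
>         hits.add(line);
>       }
>     }
>     return hits;
>   }
> }
> {code}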



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2684) FairScheduler: When failing an application due to changes in queue config or placement policy, indicate the cause.

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-2684:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> FairScheduler: When failing an application due to changes in queue config or 
> placement policy, indicate the cause.
> --
>
> Key: YARN-2684
> URL: https://issues.apache.org/jira/browse/YARN-2684
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Priority: Major
> Attachments: 0001-YARN-2684.patch, 0002-YARN-2684.patch
>
>
> YARN-2308 fixes this issue for CS, this JIRA is to fix it for FS. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1946) need Public interface for WebAppUtils.getProxyHostAndPort

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-1946:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> need Public interface for WebAppUtils.getProxyHostAndPort
> -
>
> Key: YARN-1946
> URL: https://issues.apache.org/jira/browse/YARN-1946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, webapp
>Affects Versions: 2.4.0
>Reporter: Thomas Graves
>Priority: Major
>
> ApplicationMasters are supposed to go through the ResourceManager web app 
> proxy if they have web UIs, so they are properly secured. There is currently 
> no public interface for ApplicationMasters to conveniently get the proxy 
> host and port. There is a function in WebAppUtils, but that class is 
> private.  
> We should provide this as a utility, since any properly written AM will need 
> to do this; see the sketch below.
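> A hedged sketch of what such a public utility could look like (the fallback 
> chain here is an assumption; WebAppUtils holds the real logic):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.yarn.conf.YarnConfiguration;
> 
> public final class ProxyAddressUtil {
>   private ProxyAddressUtil() {}
> 
>   // Where the RM web app proxy listens; falls back to the RM web app
>   // address when no standalone proxy is configured (assumed behavior).
>   public static String getProxyHostAndPort(Configuration conf) {
>     String addr = conf.get(YarnConfiguration.PROXY_ADDRESS);
>     if (addr == null || addr.isEmpty()) {
>       addr = conf.get(YarnConfiguration.RM_WEBAPP_ADDRESS);
>     }
>     return addr;
>   }
> }
> {code}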



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4638) Node whitelist support for AM launching

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4638:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Node whitelist support for AM launching 
> 
>
> Key: YARN-4638
> URL: https://issues.apache.org/jira/browse/YARN-4638
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4636) Make blacklist tracking policy pluggable for more extensions.

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4636:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Make blacklist tracking policy pluggable for more extensions.
> -
>
> Key: YARN-4636
> URL: https://issues.apache.org/jira/browse/YARN-4636
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4971:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-4971.1.patch
>
>
> If the RM has {{yarn.resourcemanager.bind-host}} set to 0.0.0.0, then the 
> first time the service becomes active, binding to the wildcard works as 
> expected. If the service has transitioned from active to standby and then 
> becomes active again after failovers, the service only binds to one of the IP 
> addresses.
> There is a difference between the services inside the RM: this only seems to 
> happen for the services listening on ports 8030 and 8032.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4808) SchedulerNode can use a few more cosmetic changes

2020-04-09 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-4808:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> SchedulerNode can use a few more cosmetic changes
> -
>
> Key: YARN-4808
> URL: https://issues.apache.org/jira/browse/YARN-4808
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Major
> Attachments: yarn-4808-1.patch, yarn-4808-2.patch
>
>
> We have made some cosmetic changes to SchedulerNode recently. While working 
> on YARN-4511, realized we could improve it a little more:
> # Remove volatile variables - don't see the need for them being volatile
> # Some methods end up doing very similar things, so consolidating them
> # Renaming totalResource to capacity. YARN-4511 plans to add inflatedCapacity 
> to include the un-utilized resources, and having two totals can be a little 
> confusing. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



  1   2   3   4   5   6   7   >