[jira] [Created] (YARN-4299) Distcp fails even if ignoreFailures option is set
Prabhu Joseph created YARN-4299: --- Summary: Distcp fails even if ignoreFailures option is set Key: YARN-4299 URL: https://issues.apache.org/jira/browse/YARN-4299 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Prabhu Joseph hadoop distcp fails even if the ignoreFailures (-i) option is set. When an IOException is thrown from RetriableFileCopyCommand, the handleFailures method in CopyMapper does not honor ignoreFailures. In the check if (ignoreFailures && exception.getCause() instanceof RetriableFileCopyCommand.CopyReadException) an OR should be used instead of the AND. And there is one more bug: when I wrap the IOException with a CopyReadException, exception.getCause() is still the IOException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
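The AND-vs-OR distinction in the reported check can be illustrated with a small standalone sketch. The classes below are hypothetical stand-ins for CopyMapper's failure handling and RetriableFileCopyCommand.CopyReadException, not the actual distcp code:

```java
import java.io.IOException;

// Hypothetical stand-in for RetriableFileCopyCommand.CopyReadException.
class CopyReadException extends IOException {
    CopyReadException(Throwable cause) { super(cause); }
}

// Sketch of the two variants of the failure check described in the report.
class FailureCheck {
    // Reported (buggy) behavior: both conditions are required, so a plain
    // IOException is never ignored even when -i (ignoreFailures) is set.
    static boolean ignoredWithAnd(boolean ignoreFailures, IOException e) {
        return ignoreFailures && e.getCause() instanceof CopyReadException;
    }

    // Proposed behavior: OR, so ignoreFailures alone suppresses the failure.
    static boolean ignoredWithOr(boolean ignoreFailures, IOException e) {
        return ignoreFailures || e.getCause() instanceof CopyReadException;
    }
}
```

With the AND variant, a plain IOException fails the copy even under -i; with the OR variant it is ignored, which is the behavior the report asks for.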
[jira] [Commented] (YARN-4256) YARN fair scheduler vcores with decimal values
[ https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960049#comment-14960049 ] Prabhu Joseph commented on YARN-4256: - Thanks Jun Gong. > YARN fair scheduler vcores with decimal values > -- > > Key: YARN-4256 > URL: https://issues.apache.org/jira/browse/YARN-4256 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Prabhu Joseph >Assignee: Jun Gong >Priority: Minor > Fix For: 2.7.2 > > Attachments: YARN-4256.001.patch > > > When the queue with vcores is in decimal value, the value after the decimal > point is taken as vcores by FairScheduler. > For the below queue, > 2 mb,20 vcores,20.25 disks > 3 mb,40.2 vcores,30.25 disks > When many applications submitted parallely into queue, all were in PENDING > state as the vcores is taken as 2 skipping the value 40. > The code FairSchedulerConfiguration.java to Pattern match the vcores has to > be improved in such a way either throw > AllocationConfigurationException("Missing resource") or consider the value > before decimal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4256) YARN fair scheduler vcores with decimal values
Prabhu Joseph created YARN-4256: --- Summary: YARN fair scheduler vcores with decimal values Key: YARN-4256 URL: https://issues.apache.org/jira/browse/YARN-4256 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Reporter: Prabhu Joseph Priority: Critical Fix For: 2.7.2 When a queue's vcores are configured with a decimal value, FairScheduler takes the value after the decimal point as the vcores. For the queues below, 2 mb,20 vcores,20.25 disks 3 mb,40.2 vcores,30.25 disks when many applications were submitted in parallel into the queue, all stayed in PENDING state because the vcores was taken as 2, skipping the 40. The pattern matching of vcores in FairSchedulerConfiguration.java has to be improved to either throw AllocationConfigurationException("Missing resource") or take the value before the decimal point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
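A sketch of the stricter parsing the description asks for; the pattern and helper below are hypothetical, not the actual FairSchedulerConfiguration code. It accepts only integer vcore values and rejects a decimal outright instead of silently matching its fractional part:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser illustrating the proposed behavior: "20 vcores"
// parses as 20, while "40.2 vcores" is rejected rather than read as 2.
class VcoresParser {
    // A naive (\d+)\s*vcores would match the "2" of "40.2 vcores"; the
    // lookbehind forbids a digit or '.' immediately before the number.
    private static final Pattern VCORES =
        Pattern.compile("(?<![\\d.])(\\d+)\\s*vcores");

    static int parse(String allocation) {
        Matcher m = VCORES.matcher(allocation);
        if (!m.find()) {
            // Mirrors throwing AllocationConfigurationException("Missing resource").
            throw new IllegalArgumentException("Missing resource: vcores");
        }
        return Integer.parseInt(m.group(1));
    }
}
```

This implements the "throw on missing/invalid resource" alternative from the report; the other alternative would be to match the digits before the decimal point and take 40 instead.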
[jira] [Updated] (YARN-4256) YARN fair scheduler vcores with decimal values
[ https://issues.apache.org/jira/browse/YARN-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-4256: Priority: Minor (was: Critical) > YARN fair scheduler vcores with decimal values > -- > > Key: YARN-4256 > URL: https://issues.apache.org/jira/browse/YARN-4256 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Prabhu Joseph >Priority: Minor > Fix For: 2.7.2 > > > When the queue with vcores is in decimal value, the value after the decimal > point is taken as vcores by FairScheduler. > For the below queue, > 2 mb,20 vcores,20.25 disks > 3 mb,40.2 vcores,30.25 disks > When many applications submitted parallely into queue, all were in PENDING > state as the vcores is taken as 2 skipping the value 40. > The code FairSchedulerConfiguration.java to Pattern match the vcores has to > be improved in such a way either throw > AllocationConfigurationException("Missing resource") or consider the value > before decimal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4437) JobEndNotification info logs are missing in AM container syslog
Prabhu Joseph created YARN-4437: --- Summary: JobEndNotification info logs are missing in AM container syslog Key: YARN-4437 URL: https://issues.apache.org/jira/browse/YARN-4437 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.7.0 Reporter: Prabhu Joseph Priority: Minor JobEndNotification logs are not written by the MRAppMaster and JobEndNotifier classes even though Log.info calls are present. The reason is that MRAppMaster.this.stop() is called before the job-end notification, and somewhere during the stop the log appenders are also set to null. The AM container syslog is therefore missing the following logs from JobEndNotifier: "Job end notification trying " + urlToNotify "Job end notification to " + urlToNotify + " succeeded" / "failed" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
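The failure mode can be reproduced in miniature: once the log appenders are torn down (as happens during MRAppMaster.this.stop()), later info() calls produce no output. This is a hypothetical demo using java.util.logging, not the actual MRAppMaster/JobEndNotifier classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Minimal reproduction: messages logged after the handlers are removed
// (simulating the appender teardown inside stop()) are silently lost.
class ShutdownLogDemo {
    static final List<String> captured = new ArrayList<>();
    static final Logger LOG = Logger.getLogger("JobEndNotifierDemo");

    static void attachAppender() {
        LOG.setUseParentHandlers(false);
        LOG.addHandler(new Handler() {
            @Override public void publish(LogRecord r) { captured.add(r.getMessage()); }
            @Override public void flush() {}
            @Override public void close() {}
        });
    }

    // Simulates the appender teardown that happens inside stop().
    static void detachAppenders() {
        for (Handler h : LOG.getHandlers()) {
            LOG.removeHandler(h);
        }
    }
}
```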
[jira] [Commented] (YARN-4469) yarn application -status should not show a stack trace for an unknown application ID
[ https://issues.apache.org/jira/browse/YARN-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066359#comment-15066359 ] Prabhu Joseph commented on YARN-4469: - [~templedf] The issue is already corrected in 2.7.0 as part of YARN-2356 > yarn application -status should not show a stack trace for an unknown > application ID > > > Key: YARN-4469 > URL: https://issues.apache.org/jira/browse/YARN-4469 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > > For example: > {noformat} > # yarn application -status application_1234567890_12345 > Exception in thread "main" > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1234567890_12345' doesn't exist in RM. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:324) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:190) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:399) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:429) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:154) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:77) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): > Application with id 'application_1234567890_12345' doesn't exist in RM. 
> at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:324) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) > at
[jira] [Comment Edited] (YARN-5295) YARN queue-mappings to check Queue is present before submitting job
[ https://issues.apache.org/jira/browse/YARN-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351347#comment-15351347 ] Prabhu Joseph edited comment on YARN-5295 at 6/27/16 4:23 PM: -- Hi [~sunilg], Yes, if the test queue is present, an application submitted by the test user is placed into the test queue. But if the test queue is not present, or it is not a leaf queue, or the test user has neither the Submit_Applications nor the Administer_Queue ACL, then the application is rejected. Instead, getMappedQueue in CapacityScheduler can do the three sanity checks and return a valid queue, that is platform instead of test (assuming the test user passes the sanity checks on the platform queue). Currently the sanity checks are done separately after the queue has been decided; instead they can be included in the getMappedQueue logic: once a queue mapping is chosen from the list, the sanity checks are done, and if they fail, move on to the next queue mapping in the list. was (Author: prabhu joseph): [~sunilg] Yes, if test queue is present, the application submitted by test user placed into test queue. But if test queue is not present or if test queue is not a leaf queue or if test user does not have either Submit_Applications or Administer_Queue ACL, then the application is rejected. Instead, the getMappedQueue in CapacityScheduler can do the three sanity checks and return a valid queue that is platform instead of test. (Assuming test user passes the sanity checks on platform Queue) Currently the sanity checks are done separately after deciding the queue to be placed, instead sanity checks can be included in getMappedQueue logic, where once queue mapping is chosen from the list, the sanity checks can be done and if it fails, then move to the next queue mapping in the list. 
> YARN queue-mappings to check Queue is present before submitting job > --- > > Key: YARN-5295 > URL: https://issues.apache.org/jira/browse/YARN-5295 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: Prabhu Joseph > > In yarn Queue-Mappings, Yarn should check if the queue is present before > submitting the job. If not present it should go to next mapping available. > For example if we have > yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform > and I submit job with user "test" and if there is no "test" queue then it > should check the second mapping (g:edw:platform) in the list and if test is > part of edw group it should submit job in platform queue. > Below Sanity checks has to be done for the mapped queue in the list and if it > fails then the the next queue mapping has to be chosen, when there is no > queue mapping passing the sanity check, only then the application has to be > Rejected. > 1. is queue present > 2. is queue not a leaf queue > 3. is user either have ACL Submit_Applications or Administer_Queue of the > queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5295) YARN queue-mappings to check Queue is present before submitting job
[ https://issues.apache.org/jira/browse/YARN-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351450#comment-15351450 ] Prabhu Joseph commented on YARN-5295: - Yes, doing sanity checks 1 and 2 up front in getMappedQueue is sufficient to help administrators configure a default queue for any user or group when there is no valid queue mapping. For example, with this fix, administrators can allow any newly added user who does not have a queue of the same name to still be placed into the default queue through the list of queue mappings u:%user:%user,u:%user:default > YARN queue-mappings to check Queue is present before submitting job > --- > > Key: YARN-5295 > URL: https://issues.apache.org/jira/browse/YARN-5295 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: Prabhu Joseph > > In yarn Queue-Mappings, Yarn should check if the queue is present before > submitting the job. If not present it should go to next mapping available. > For example if we have > yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform > and I submit job with user "test" and if there is no "test" queue then it > should check the second mapping (g:edw:platform) in the list and if test is > part of edw group it should submit job in platform queue. > Below Sanity checks has to be done for the mapped queue in the list and if it > fails then the the next queue mapping has to be chosen, when there is no > queue mapping passing the sanity check, only then the application has to be > Rejected. > 1. is queue present > 2. is queue not a leaf queue > 3. is user either have ACL Submit_Applications or Administer_Queue of the > queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5295) YARN queue-mappings to check Queue is present before submitting job
[ https://issues.apache.org/jira/browse/YARN-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351347#comment-15351347 ] Prabhu Joseph commented on YARN-5295: - [~sunilg] Yes, if the test queue is present, an application submitted by the test user is placed into the test queue. But if the test queue is not present, or it is not a leaf queue, or the test user has neither the Submit_Applications nor the Administer_Queue ACL, then the application is rejected. Instead, getMappedQueue in CapacityScheduler can do the three sanity checks and return a valid queue, that is platform instead of test (assuming the test user passes the sanity checks on the platform queue). Currently the sanity checks are done separately after the queue has been decided; instead they can be included in the getMappedQueue logic: once a queue mapping is chosen from the list, the sanity checks are done, and if they fail, move on to the next queue mapping in the list. > YARN queue-mappings to check Queue is present before submitting job > --- > > Key: YARN-5295 > URL: https://issues.apache.org/jira/browse/YARN-5295 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: Prabhu Joseph > > In yarn Queue-Mappings, Yarn should check if the queue is present before > submitting the job. If not present it should go to next mapping available. > For example if we have > yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform > and I submit job with user "test" and if there is no "test" queue then it > should check the second mapping (g:edw:platform) in the list and if test is > part of edw group it should submit job in platform queue. > Below Sanity checks has to be done for the mapped queue in the list and if it > fails then the the next queue mapping has to be chosen, when there is no > queue mapping passing the sanity check, only then the application has to be > Rejected. > 1. is queue present > 2. 
is queue not a leaf queue > 3. is user either have ACL Submit_Applications or Administer_Queue of the > queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5295) YARN queue-mappings to check Queue is present before submitting job
[ https://issues.apache.org/jira/browse/YARN-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351347#comment-15351347 ] Prabhu Joseph edited comment on YARN-5295 at 6/27/16 4:25 PM: -- Hi [~sunilg], Yes, if the test queue is present, an application submitted by the test user is placed into the test queue. But if the test queue is not present, or it is not a leaf queue, or the test user has neither the Submit_Applications nor the Administer_Queue ACL, then the application is rejected. Instead, getMappedQueue in CapacityScheduler can do the three sanity checks up front and return a valid queue, that is platform instead of test (assuming the test user passes the sanity checks on the platform queue). Currently the sanity checks are done separately after the queue has been decided; instead they can be included in the getMappedQueue logic: once a queue mapping is chosen from the list, the sanity checks are done, and if they fail, move on to the next queue mapping in the list. was (Author: prabhu joseph): Hi [~sunilg], Yes, if test queue is present, the application submitted by test user placed into test queue. But if test queue is not present or if test queue is not a leaf queue or if test user does not have either Submit_Applications or Administer_Queue ACL, then the application is rejected. Instead, the getMappedQueue in CapacityScheduler can do the three sanity checks and return a valid queue that is platform instead of test. (Assuming test user passes the sanity checks on platform Queue) Currently the sanity checks are done separately after deciding the queue to be placed, instead sanity checks can be included in getMappedQueue logic, where once queue mapping is chosen from the list, the sanity checks can be done and if it fails, then move to the next queue mapping in the list. 
> YARN queue-mappings to check Queue is present before submitting job > --- > > Key: YARN-5295 > URL: https://issues.apache.org/jira/browse/YARN-5295 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: Prabhu Joseph > > In yarn Queue-Mappings, Yarn should check if the queue is present before > submitting the job. If not present it should go to next mapping available. > For example if we have > yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform > and I submit job with user "test" and if there is no "test" queue then it > should check the second mapping (g:edw:platform) in the list and if test is > part of edw group it should submit job in platform queue. > Below Sanity checks has to be done for the mapped queue in the list and if it > fails then the the next queue mapping has to be chosen, when there is no > queue mapping passing the sanity check, only then the application has to be > Rejected. > 1. is queue present > 2. is queue not a leaf queue > 3. is user either have ACL Submit_Applications or Administer_Queue of the > queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5295) YARN queue-mappings to check Queue is present before submitting job
Prabhu Joseph created YARN-5295: --- Summary: YARN queue-mappings to check Queue is present before submitting job Key: YARN-5295 URL: https://issues.apache.org/jira/browse/YARN-5295 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Affects Versions: 2.7.2 Reporter: Prabhu Joseph In YARN queue-mappings, YARN should check whether the queue is present before submitting the job; if it is not present, it should fall back to the next available mapping. For example, with yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform if I submit a job as user "test" and there is no "test" queue, YARN should check the second mapping (g:edw:platform) in the list, and if test is part of the edw group the job should be submitted to the platform queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5295) YARN queue-mappings to check Queue is present before submitting job
[ https://issues.apache.org/jira/browse/YARN-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-5295: Description: In YARN queue-mappings, YARN should check whether the queue is present before submitting the job; if it is not present, it should fall back to the next available mapping. For example, with yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform if I submit a job as user "test" and there is no "test" queue, YARN should check the second mapping (g:edw:platform) in the list, and if test is part of the edw group the job should be submitted to the platform queue. The following sanity checks have to be done for the mapped queue in the list; if one fails, the next queue mapping has to be chosen, and only when no queue mapping passes the sanity checks should the application be rejected. 1. Is the queue present? 2. Is the queue a leaf queue? 3. Does the user have either the Submit_Applications or the Administer_Queue ACL for the queue? was: In yarn Queue-Mappings, Yarn should check if the queue is present before submitting the job. If not present it should go to next mapping available. For example if we have yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform and I submit job with user "test" and if there is no "test" queue then it should check the second mapping (g:edw:platform) in the list and if test is part of edw group it should submit job in platform queue. Below Sanity Checks has to be done for the mapped queue in the list and if it fails then the the next queue mapping has to be chosen, when there is no queue mapping passing the sanity check, only then the application has to be Rejected. 1. is queue present 2. is queue not a leaf queue 3. is user either have ACL Submit_Applications or Administer_Queue of the queue. 
> YARN queue-mappings to check Queue is present before submitting job > --- > > Key: YARN-5295 > URL: https://issues.apache.org/jira/browse/YARN-5295 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: Prabhu Joseph > > In yarn Queue-Mappings, Yarn should check if the queue is present before > submitting the job. If not present it should go to next mapping available. > For example if we have > yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform > and I submit job with user "test" and if there is no "test" queue then it > should check the second mapping (g:edw:platform) in the list and if test is > part of edw group it should submit job in platform queue. > Below Sanity checks has to be done for the mapped queue in the list and if it > fails then the the next queue mapping has to be chosen, when there is no > queue mapping passing the sanity check, only then the application has to be > Rejected. > 1. is queue present > 2. is queue not a leaf queue > 3. is user either have ACL Submit_Applications or Administer_Queue of the > queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5295) YARN queue-mappings to check Queue is present before submitting job
[ https://issues.apache.org/jira/browse/YARN-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-5295: Description: In yarn Queue-Mappings, Yarn should check if the queue is present before submitting the job. If not present it should go to next mapping available. For example if we have yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform and I submit job with user "test" and if there is no "test" queue then it should check the second mapping (g:edw:platform) in the list and if test is part of edw group it should submit job in platform queue. Below Sanity Checks has to be done for the mapped queue in the list and if it fails then the the next queue mapping has to be chosen, when there is no queue mapping passing the sanity check, only then the application has to be Rejected. 1. is queue present 2. is queue not a leaf queue 3. is user either have ACL Submit_Applications or Administer_Queue of the queue. was: In yarn Queue-Mappings, Yarn should check if the queue is present before submitting the job. If not present it should go to next mapping available. For example if we have yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform and I submit job with user "test" and if there is no "test" queue then it should check the second mapping (g:edw:platform) in the list and if test is part of edw group it should submit job in platform queue. > YARN queue-mappings to check Queue is present before submitting job > --- > > Key: YARN-5295 > URL: https://issues.apache.org/jira/browse/YARN-5295 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: Prabhu Joseph > > In yarn Queue-Mappings, Yarn should check if the queue is present before > submitting the job. If not present it should go to next mapping available. 
> For example if we have > yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform > and I submit job with user "test" and if there is no "test" queue then it > should check the second mapping (g:edw:platform) in the list and if test is > part of edw group it should submit job in platform queue. > Below Sanity Checks has to be done for the mapped queue in the list and if it > fails then the the next queue mapping has to be chosen, when there is no > queue mapping passing the sanity check, only then the application has to be > Rejected. > 1. is queue present > 2. is queue not a leaf queue > 3. is user either have ACL Submit_Applications or Administer_Queue of the > queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
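The fallback behavior proposed in this issue can be sketched as follows. The resolver below is a hypothetical illustration of the suggested getMappedQueue logic (names and interfaces are invented for the sketch, not the actual CapacityScheduler code): walk the mapping list, run the three sanity checks on each candidate, and reject only when no mapping passes.

```java
import java.util.List;

// Hypothetical sketch of the proposed getMappedQueue fallback.
class QueueMappingResolver {
    interface Queue {
        boolean isLeaf();
        boolean userHasAcl(String user); // Submit_Applications or Administer_Queue
    }
    interface QueueLookup {
        Queue find(String name); // null if the queue is not present
    }

    static String resolve(List<String> candidateQueues, String user, QueueLookup lookup) {
        for (String name : candidateQueues) {
            Queue q = lookup.find(name);
            if (q == null) continue;           // 1. queue not present
            if (!q.isLeaf()) continue;         // 2. not a leaf queue
            if (!q.userHasAcl(user)) continue; // 3. no submit/administer ACL
            return name;                       // first mapping passing all checks
        }
        return null;                           // caller rejects the application
    }
}
```

With the example mapping list u:%user:%user,g:edw:platform and no "test" queue, the resolver skips the first candidate and lands on platform; only an empty result leads to rejection.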
[jira] [Commented] (YARN-4682) AMRM client to log when AMRM token updated
[ https://issues.apache.org/jira/browse/YARN-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141162#comment-15141162 ] Prabhu Joseph commented on YARN-4682: - [~ste...@apache.org] Steve, do I need to check out branch-2? The "No AMRMToken" issue happened on hadoop-2.4.1, so as you mentioned, the fixes of YARN-3103 and YARN-2212 are missing there. I am testing with the YARN-3103 fix: after every yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-secs, the AMRMToken gets updated. How can I decrease the lifetime of a token? I am trying to simulate the issue again. > AMRM client to log when AMRM token updated > -- > > Key: YARN-4682 > URL: https://issues.apache.org/jira/browse/YARN-4682 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.7.2 >Reporter: Steve Loughran > Attachments: YARN-4682.patch > > Original Estimate: 0.25h > Remaining Estimate: 0.25h > > There's no information right now as to when the AMRM token gets updated; if > something has gone wrong with the update, you can't tell when it last when > through. > fix: add a log statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4682) AMRM client to log when AMRM token updated
[ https://issues.apache.org/jira/browse/YARN-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-4682: Attachment: YARN-4682.patch.1 > AMRM client to log when AMRM token updated > -- > > Key: YARN-4682 > URL: https://issues.apache.org/jira/browse/YARN-4682 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.7.2 >Reporter: Steve Loughran > Attachments: YARN-4682.patch, YARN-4682.patch.1 > > Original Estimate: 0.25h > Remaining Estimate: 0.25h > > There's no information right now as to when the AMRM token gets updated; if > something has gone wrong with the update, you can't tell when it last when > through. > fix: add a log statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4682) AMRM client to log when AMRM token updated
[ https://issues.apache.org/jira/browse/YARN-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142301#comment-15142301 ] Prabhu Joseph commented on YARN-4682: - git checkout branch-2 git diff > YARN-4682.patch.1 But I am not seeing any difference from the previous patch. > AMRM client to log when AMRM token updated > -- > > Key: YARN-4682 > URL: https://issues.apache.org/jira/browse/YARN-4682 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.7.2 >Reporter: Steve Loughran > Attachments: YARN-4682.patch, YARN-4682.patch.1 > > Original Estimate: 0.25h > Remaining Estimate: 0.25h > > There's no information right now as to when the AMRM token gets updated; if > something has gone wrong with the update, you can't tell when it last when > through. > fix: add a log statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4682) AMRM client to log when AMRM token updated
[ https://issues.apache.org/jira/browse/YARN-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144490#comment-15144490 ] Prabhu Joseph commented on YARN-4682: - Thanks Steve > AMRM client to log when AMRM token updated > -- > > Key: YARN-4682 > URL: https://issues.apache.org/jira/browse/YARN-4682 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.7.2 >Reporter: Steve Loughran > Attachments: YARN-4682-002.patch, YARN-4682.patch, YARN-4682.patch.1 > > Original Estimate: 0.25h > Remaining Estimate: 0.25h > > There's no information right now as to when the AMRM token gets updated; if > something has gone wrong with the update, you can't tell when it last when > through. > fix: add a log statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4682) AMRM client to log when AMRM token updated
[ https://issues.apache.org/jira/browse/YARN-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-4682: Attachment: YARN-4682.patch > AMRM client to log when AMRM token updated > -- > > Key: YARN-4682 > URL: https://issues.apache.org/jira/browse/YARN-4682 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.7.2 >Reporter: Steve Loughran > Attachments: YARN-4682.patch > > Original Estimate: 0.25h > Remaining Estimate: 0.25h > > There's no information right now as to when the AMRM token gets updated; if > something has gone wrong with the update, you can't tell when it last when > through. > fix: add a log statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4682) AMRM client to log when AMRM token updated
[ https://issues.apache.org/jira/browse/YARN-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140720#comment-15140720 ] Prabhu Joseph commented on YARN-4682: - [~ste...@apache.org] Added a info log. > AMRM client to log when AMRM token updated > -- > > Key: YARN-4682 > URL: https://issues.apache.org/jira/browse/YARN-4682 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.7.2 >Reporter: Steve Loughran > Attachments: YARN-4682.patch > > Original Estimate: 0.25h > Remaining Estimate: 0.25h > > There's no information right now as to when the AMRM token gets updated; if > something has gone wrong with the update, you can't tell when it last when > through. > fix: add a log statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4730) YARN preemption based on instantaneous fair share
Prabhu Joseph created YARN-4730: --- Summary: YARN preemption based on instantaneous fair share Key: YARN-4730 URL: https://issues.apache.org/jira/browse/YARN-4730 Project: Hadoop YARN Issue Type: Bug Reporter: Prabhu Joseph Consider a big cluster with a total cluster resource of 10TB and 3000 cores, a Fair Scheduler with 230 queues, and a total of 6 jobs run a day [all 230 queues are very critical, hence the minResource is the same for all]. In this case, when a Spark job runs on queue A, occupies the entire cluster resource and does not release any of it, and another job is submitted into queue B, preemption reclaims only the fair share, which is <10TB, 3000> / 230 = <45 GB, 13 cores>: a very small fair share for a queue shared by many applications. Preemption should instead reclaim the instantaneous fair share, that is <10TB, 3000> / 2 (active queues) = 5TB and 1500 cores, so that the first job won't hog the entire cluster resource and the subsequent jobs run fine. This issue arises only when the number of queues is very high. With a small number of queues, preempting the fair share would suffice, as the fair share will be high. But with very many queues, preemption should try to reclaim the instantaneous fair share. Note: Configuring optimal maxResources for 230 queues is difficult, and constraining the queues with maxResources will leave cluster resources idle most of the time. There are thousands of Spark jobs, so asking each user to restrict the number of executors is also difficult. Preempting the instantaneous fair share will help overcome the above issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
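The arithmetic behind the steady versus instantaneous fair share in this report can be sketched as follows. This is a simplified illustration, not the FairScheduler implementation; the method names are made up for the example.

```java
// Sketch (illustrative only, not FairScheduler code) comparing the configured
// fair share across ALL queues with the instantaneous fair share across only
// the queues that currently have active applications.
public class FairShareSketch {

    // Steady fair share: total resource divided among all configured queues.
    static long steadyShareMb(long totalMb, int totalQueues) {
        return totalMb / totalQueues;
    }

    // Instantaneous fair share: total resource divided among active queues only.
    static long instantaneousShareMb(long totalMb, int activeQueues) {
        return totalMb / activeQueues;
    }

    public static void main(String[] args) {
        long totalMb = 10L * 1024 * 1024; // 10 TB expressed in MB
        // With 230 queues the preemption target is only ~45 GB per queue...
        System.out.println(steadyShareMb(totalMb, 230));
        // ...but with 2 active queues the instantaneous share is 5 TB each.
        System.out.println(instantaneousShareMb(totalMb, 2));
    }
}
```

With the report's numbers (10 TB, 230 queues, 2 active queues), this reproduces the <45 GB vs 5 TB gap described above.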
[jira] [Resolved] (YARN-4730) YARN preemption based on instantaneous fair share
[ https://issues.apache.org/jira/browse/YARN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph resolved YARN-4730. - Resolution: Duplicate YARN-2026 > YARN preemption based on instantaneous fair share > - > > Key: YARN-4730 > URL: https://issues.apache.org/jira/browse/YARN-4730 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Prabhu Joseph > > On a big cluster with Total Cluster Resource of 10TB, 3000 cores and Fair > Sheduler having 230 queues and total 6 jobs run a day. [ all 230 queues > are very critical and hence the minResource is same for all]. On this case, > when a Spark Job is run on queue A and which occupies the entire cluster > resource and does not release any resource, another job submitted into queue > B and preemption is getting only the Fair Share which is <10TB , 3000> / 230 > = <45 GB , 13 cores> which is very less fair share for a queue.shared by many > applications. > The Preemption should get the instantaneous fair Share, that is <10TB, 3000> > / 2 (active queues) = 5TB and 1500 cores, so that the first job won't hog the > entire cluster resource and also the subsequent jobs run fine. > This issue is only when the number of queues are very high. In case of less > number of queues, Preemption getting Fair Share would be suffice as the fair > share will be high. But in case of too many number of queues, Preemption > should try to get the instantaneous Fair Share. > Note: Configuring optimal maxResources to 230 queues is difficult and also > putting constraint for the queues using maxResource will leave cluster > resource idle most of the time. > There are 1000s of Spark Jobs, so asking each user to restrict the > number of executors is also difficult. > Preempting Instantaneous Fair Share will help to overcome the above issues. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-5295) YARN queue-mappings to check Queue is present before submitting job
[ https://issues.apache.org/jira/browse/YARN-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358899#comment-15358899 ] Prabhu Joseph commented on YARN-5295: - Hi [~sunilg], In addition, we also need to include the below code snippet in UserGroupMappingPlacementRule#getMappedQueue before returning the mapped queue, so that it returns a valid queue, i.e. an existing leaf queue.
{code}
for (QueueMapping mapping : mappings) {
  // resolve the queue for this mapping (lookup shown schematically)
  CSQueue queue = getQueue(mapping.getQueue());
  // skip mappings whose queue is missing or not a leaf queue
  if (queue == null || !(queue instanceof LeafQueue)) {
    continue;
  }
  return mapping.getQueue();
}
{code}
> YARN queue-mappings to check Queue is present before submitting job > --- > > Key: YARN-5295 > URL: https://issues.apache.org/jira/browse/YARN-5295 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: Prabhu Joseph > > In yarn Queue-Mappings, Yarn should check if the queue is present before > submitting the job. If not present it should go to the next mapping available. > For example if we have > yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform > and I submit a job with user "test" and there is no "test" queue, then it > should check the second mapping (g:edw:platform) in the list and, if test is > part of the edw group, submit the job in the platform queue. > The below sanity checks have to be done for the mapped queue in the list; if one > fails then the next queue mapping has to be chosen, and only when there is no > queue mapping passing the sanity checks should the application be > rejected. > 1. is the queue present > 2. is the queue a leaf queue > 3. does the user have either the Submit_Applications or Administer_Queue ACL on the > queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
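The fall-through behavior requested in YARN-5295 can be modeled with a self-contained sketch. All class and method names here are hypothetical; the real mapping resolution lives inside the CapacityScheduler and uses its own queue and ACL types.

```java
import java.util.*;

// Hypothetical model of queue-mapping fall-through: try each mapped queue in
// order and accept the first one that passes all three sanity checks.
public class QueueMappingSketch {

    static class Queue {
        final String name;
        final boolean leaf;
        final Set<String> allowedUsers; // stands in for SUBMIT/ADMINISTER ACLs

        Queue(String name, boolean leaf, Set<String> allowedUsers) {
            this.name = name;
            this.leaf = leaf;
            this.allowedUsers = allowedUsers;
        }
    }

    /** Returns the first mapped queue that exists, is a leaf, and admits the user; null if none. */
    static String resolve(List<String> mappedQueues, Map<String, Queue> queues, String user) {
        for (String name : mappedQueues) {
            Queue q = queues.get(name);
            if (q == null) continue;                      // check 1: queue present
            if (!q.leaf) continue;                        // check 2: leaf queue
            if (!q.allowedUsers.contains(user)) continue; // check 3: user has ACL
            return name;
        }
        return null; // no mapping passed: only now reject the application
    }

    public static void main(String[] args) {
        Map<String, Queue> queues = new HashMap<>();
        queues.put("platform", new Queue("platform", true,
                new HashSet<>(Arrays.asList("test"))));
        // No "test" queue exists, so the second mapping ("platform") is chosen.
        System.out.println(resolve(Arrays.asList("test", "platform"), queues, "test"));
    }
}
```

This mirrors the u:%user:%user,g:edw:platform example: the per-user mapping fails the presence check, so the group mapping is used instead.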
[jira] [Created] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM
Prabhu Joseph created YARN-5933: --- Summary: ATS stale entries in active directory causes ApplicationNotFoundException in RM Key: YARN-5933 URL: https://issues.apache.org/jira/browse/YARN-5933 Project: Hadoop YARN Issue Type: Bug Components: ATSv2 Affects Versions: 2.7.3 Reporter: Prabhu Joseph On Secure cluster where ATS is down, Tez job submitted will fail while getting TIMELINE_DELEGATION_TOKEN with below exception {code} 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from alltypesorc group by csmallint; INFO : Session is already open INFO : Dag name: select csmallint from alltypesor...csmallint(Stage-1) INFO : Tez session was closed. Reopening... ERROR : Failed to execute tez graph. java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250) at org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72) at org.apache.tez.client.TezClient.start(TezClient.java:409) at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196) at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311) at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180) 
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} Tez YarnClient has received an applicationID from RM. On Restarting ATS now, ATS tries to get the application report from RM and so RM will throw ApplicationNotFoundException. ATS will keep on requesting and which floods RM. 
{code} RM logs: 2016-11-23 13:53:57,345 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 5 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8050, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 172.26.71.120:37699 Call#26 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1479897867169_0005' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:328) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417) at
[jira] [Assigned] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM
[ https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-5933: --- Assignee: Prabhu Joseph > ATS stale entries in active directory causes ApplicationNotFoundException in > RM > --- > > Key: YARN-5933 > URL: https://issues.apache.org/jira/browse/YARN-5933 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > On Secure cluster where ATS is down, Tez job submitted will fail while > getting TIMELINE_DELEGATION_TOKEN with below exception > {code} > 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from > alltypesorc group by csmallint; > INFO : Session is already open > INFO : Dag name: select csmallint from alltypesor...csmallint(Stage-1) > INFO : Tez session was closed. Reopening... > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Failed to connect to timeline server. Connection > retries limit exceeded. 
The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250) > at > org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72) > at org.apache.tez.client.TezClient.start(TezClient.java:409) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) > at > org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) > at > org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Tez YarnClient has received an applicationID from RM. On Restarting ATS now, > ATS tries to get the application report from RM and so RM will throw > ApplicationNotFoundException. ATS will keep on requesting and which floods RM. > {code} > RM logs: > 2016-11-23 13:53:57,345 INFO > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new > applicationId: 5 > 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 9 on 8050, call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport > from 172.26.71.120:37699 Call#26 Retry#0 > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1479897867169_0005' doesn't exist in RM. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:328) > at >
[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM
[ https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15698124#comment-15698124 ] Prabhu Joseph commented on YARN-5933: - Hi [~sunilg] [~gtCarrera9], Below are some of the ways to fix this issue assuming an application which is not found in RM at first getApplicationReport call will never be one of APP_FINAL_STATES at subsequent getApplicationReport call. 1. Once the AppState is Unknown, the appDir can be removed from ActivePath immediately. Not sure why there is a wait of unknownActiveMillis and then app marked as completed. If we choose removal of appDir immediately, then there won't be any need for unknownActiveMillis handling code. 2. If there is a need to move unknown state app also to done directory, then the appDir can be moved immediately before waiting for unknownActiveMillis Please share your comments. > ATS stale entries in active directory causes ApplicationNotFoundException in > RM > --- > > Key: YARN-5933 > URL: https://issues.apache.org/jira/browse/YARN-5933 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > On Secure cluster where ATS is down, Tez job submitted will fail while > getting TIMELINE_DELEGATION_TOKEN with below exception > {code} > 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from > alltypesorc group by csmallint; > INFO : Session is already open > INFO : Dag name: select csmallint from alltypesor...csmallint(Stage-1) > INFO : Tez session was closed. Reopening... > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Failed to connect to timeline server. Connection > retries limit exceeded. 
The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250) > at > org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72) > at org.apache.tez.client.TezClient.start(TezClient.java:409) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) > at > org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) > at > org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Tez YarnClient has received an applicationID from RM. On Restarting ATS now, > ATS tries to get the application report from RM and so RM will throw > ApplicationNotFoundException. ATS will keep on requesting and which floods
[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM
[ https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704980#comment-15704980 ] Prabhu Joseph commented on YARN-5933: - Thanks [~gtCarrera9], looks not a simple one to directly remove unknown appDir. Assume there are 10 tez jobs failed when ATS is down, then there will be 10 * unknownActiveSecs / scanIntervalSecs = 14400 ApplicationNotFoundException stacktrace will be in RM throughout that entire day logs. If there is no impact other than flooding of RM logs, is it better to change the ApplicationNotFoundException stacktrace into a single WARN message. > ATS stale entries in active directory causes ApplicationNotFoundException in > RM > --- > > Key: YARN-5933 > URL: https://issues.apache.org/jira/browse/YARN-5933 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > On Secure cluster where ATS is down, Tez job submitted will fail while > getting TIMELINE_DELEGATION_TOKEN with below exception > {code} > 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from > alltypesorc group by csmallint; > INFO : Session is already open > INFO : Dag name: select csmallint from alltypesor...csmallint(Stage-1) > INFO : Tez session was closed. Reopening... > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Failed to connect to timeline server. Connection > retries limit exceeded. 
The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250) > at > org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72) > at org.apache.tez.client.TezClient.start(TezClient.java:409) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) > at > org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) > at > org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Tez YarnClient has received an applicationID from RM. On Restarting ATS now, > ATS tries to get the application report from RM and so RM will throw > ApplicationNotFoundException. ATS will keep on requesting and which floods RM. > {code} > RM logs: > 2016-11-23 13:53:57,345 INFO > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new > applicationId: 5 > 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 9 on 8050, call >
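The flood estimate quoted in this thread (10 stale apps producing 14400 ApplicationNotFoundException traces in a day) can be reproduced with a small sketch. The 86400 s unknown-active window and 60 s scan interval are assumed values chosen to match the quoted figure, not verified timeline-store defaults.

```java
// Sketch of the log-flood arithmetic: every scan re-issues getApplicationReport
// for each app that is still in the unknown window, so the RM logs one
// ApplicationNotFoundException per stale app per scan.
public class FloodEstimate {

    static long exceptionsLogged(int staleApps, long unknownActiveSecs, long scanIntervalSecs) {
        // scans per app while it stays "unknown", times the number of stale apps
        return staleApps * (unknownActiveSecs / scanIntervalSecs);
    }

    public static void main(String[] args) {
        // 10 failed Tez jobs, 1-day unknown window, 60 s scan interval (assumed)
        System.out.println(exceptionsLogged(10, 86400, 60)); // 14400, as in the comment
    }
}
```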
[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM
[ https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708237#comment-15708237 ] Prabhu Joseph commented on YARN-5933: - Hi [~gtCarrera9] Okay, I think AppLogs#parseSummaryLogs() can skip subsequent getAppState for Unknown apps and move them to complete after unknownActiveSecs. > ATS stale entries in active directory causes ApplicationNotFoundException in > RM > --- > > Key: YARN-5933 > URL: https://issues.apache.org/jira/browse/YARN-5933 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > On Secure cluster where ATS is down, Tez job submitted will fail while > getting TIMELINE_DELEGATION_TOKEN with below exception > {code} > 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from > alltypesorc group by csmallint; > INFO : Session is already open > INFO : Dag name: select csmallint from alltypesor...csmallint(Stage-1) > INFO : Tez session was closed. Reopening... > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Failed to connect to timeline server. Connection > retries limit exceeded. 
The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250) > at > org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72) > at org.apache.tez.client.TezClient.start(TezClient.java:409) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) > at > org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) > at > org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Tez YarnClient has received an applicationID from RM. On Restarting ATS now, > ATS tries to get the application report from RM and so RM will throw > ApplicationNotFoundException. ATS will keep on requesting and which floods RM. > {code} > RM logs: > 2016-11-23 13:53:57,345 INFO > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new > applicationId: 5 > 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 9 on 8050, call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport > from 172.26.71.120:37699 Call#26 Retry#0 > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1479897867169_0005' doesn't exist in RM. > at >
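The fix proposed in the last comment (skip further getAppState calls once an app is Unknown, then move it to the done directory after unknownActiveSecs) can be modeled as a tiny state machine. All names here are illustrative sketches, not the actual timeline-store fields.

```java
// Hypothetical model of the proposed scan behavior: one RM lookup marks an app
// UNKNOWN, after which scans only age the app directory out, never calling the
// RM again (so the ApplicationNotFoundException is logged at most once per app).
public class UnknownAppScanSketch {

    enum State { ACTIVE, UNKNOWN, COMPLETED }

    static final long UNKNOWN_ACTIVE_MILLIS = 24L * 60 * 60 * 1000; // assumed window

    static class AppDir {
        State state = State.ACTIVE;
        long unknownSince = -1;
    }

    /** One scan pass; foundInRm simulates the result of getApplicationReport. */
    static void scan(AppDir app, boolean foundInRm, long now) {
        if (app.state == State.UNKNOWN) {
            // proposed: skip the RM lookup entirely, only check the age-out timer
            if (now - app.unknownSince >= UNKNOWN_ACTIVE_MILLIS) {
                app.state = State.COMPLETED; // move appDir from active to done
            }
            return;
        }
        if (!foundInRm) { // ApplicationNotFoundException on the first lookup only
            app.state = State.UNKNOWN;
            app.unknownSince = now;
        }
    }

    public static void main(String[] args) {
        AppDir app = new AppDir();
        scan(app, false, 0);                     // first scan: RM throws, mark UNKNOWN
        scan(app, false, 60_000);                // later scans: no RM call is made
        scan(app, false, UNKNOWN_ACTIVE_MILLIS); // window elapsed: moved to done
        System.out.println(app.state);           // COMPLETED
    }
}
```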
[jira] [Updated] (YARN-6052) Yarn RM UI % of Queue at application level is wrong
[ https://issues.apache.org/jira/browse/YARN-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6052: Attachment: RM_UI.png > Yarn RM UI % of Queue at application level is wrong > --- > > Key: YARN-6052 > URL: https://issues.apache.org/jira/browse/YARN-6052 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Priority: Minor > Attachments: RM_UI.png > > > Test Case: > yarn.scheduler.capacity.root.capacity=100 > yarn.scheduler.capacity.root.queues=default,dummy > yarn.scheduler.capacity.root.default.capacity=20 > yarn.scheduler.capacity.root.dummy.capacity=80 > yarn.scheduler.capacity.root.dummy.child.capacity=50 > yarn.scheduler.capacity.root.dummy.child2.capacity=50 > Memory Total is 20GB, the default queue share is 4GB and the dummy queue share is > 16GB. The child and child2 queues get an 8GB share each. > A MapReduce job is submitted to the child2 queue, which asks for 2 containers of 512 > MB. Now cluster Memory Used is 1GB. > Root queue usage = 100 / (total memory / used memory) = 100 / (20 / 1) = 5% > Dummy queue usage = 100 / (16 / 1) = 6.3% > Dummy.Child2 queue usage = 100 / (8 / 1) = 12.5% > At application level, % of queue is calculated as 100 / (50% of root queue > capacity) = 100 / (50% of 20GB) = 10.0 instead of > 100 / (50% of dummy queue capacity) = 100 / (50% of 16GB) = 100 / 8 = 12.5 > where 50% is the dummy.child2 capacity. > Attached RM UI screenshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
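The percentages in the YARN-6052 description can be checked with a short sketch. This simply reproduces the arithmetic above, including the buggy figure (dividing by 50% of root capacity) and the expected one (dividing by 50% of the parent dummy queue's share).

```java
// Sketch of the "% of Queue" arithmetic from the test case: usage is the used
// resource as a percentage of a queue's share. The bug is which share the RM UI
// divides by at the application level.
public class QueueUsageSketch {

    static double usedPercent(double usedGb, double shareGb) {
        return 100.0 * usedGb / shareGb;
    }

    public static void main(String[] args) {
        double totalGb = 20, usedGb = 1;
        double dummyGb  = totalGb * 0.80; // dummy queue share: 16 GB
        double child2Gb = dummyGb * 0.50; // child2 queue share:  8 GB

        System.out.println(usedPercent(usedGb, totalGb));        // root:   5.0 %
        System.out.println(usedPercent(usedGb, dummyGb));        // dummy:  6.25 %
        System.out.println(usedPercent(usedGb, child2Gb));       // child2: 12.5 % (expected)
        System.out.println(usedPercent(usedGb, totalGb * 0.50)); // buggy:  10.0 % (50% of root)
    }
}
```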
[jira] [Assigned] (YARN-6052) Yarn RM UI % of Queue at application level is wrong
[ https://issues.apache.org/jira/browse/YARN-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-6052: --- Assignee: Prabhu Joseph > Yarn RM UI % of Queue at application level is wrong > --- > > Key: YARN-6052 > URL: https://issues.apache.org/jira/browse/YARN-6052 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: RM_UI.png > > > Test Case: > yarn.scheduler.capacity.root.capacity=100 > yarn.scheduler.capacity.root.queues=default,dummy > yarn.scheduler.capacity.root.default.capacity=20 > yarn.scheduler.capacity.root.dummy.capacity=80 > yarn.scheduler.capacity.root.dummy.child.capacity=50 > yarn.scheduler.capacity.root.dummy.child2.capacity=50 > Memory Total is 20GB, default queue share is 4GB and dummy queue share is > 16GB. Child and Child1 queue gets 8GB share each. > A map reduce job is submitted to child2 queue which asks 2 containers of 512 > MB. Now cluster Memory Used is 1GB. > Root queue usage = 100 / (total memory / used memory) = 100 / (20 / 1) = 5% > Dummy queue usage = 100 / (16 /1) = 6.3% > Dummy.Child2 queue usage = 100 / (8/1) = 12.5% > At application level, % of queue is calculated as 100 / (50% of root queue > capacity) = 100 / (50% of 20GB) = 10.0 instead of > 100 / (50% of dummy queue capacity) = 100 / (50% of 16GB) = 100 / 8 = 12.5 > Where 50% is dummy.child2 capacity > Attached RM UI screenshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6053) RM Web Service shows startedTime, finishedTime as zero when RM is kerberized and ACL is set up
Prabhu Joseph created YARN-6053:
-----------------------------------
Summary: RM Web Service shows startedTime, finishedTime as zero when RM is kerberized and ACL is set up
Key: YARN-6053
URL: https://issues.apache.org/jira/browse/YARN-6053
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Priority: Minor

When the RM UI is Kerberized and ACLs are set up, a user pjoseph who has logged into the RM UI is able to see another user prabhu's job startTime and finishTime, but is not able to read the attempts of the application, which is expected as the ACLs are set up. But when using the RM Web Services, http://kerberos-3.openstacklocal:8088/ws/v1/cluster/apps/application_1482325548661_0002, the startedTime, finishedTime and elapsedTime are 0 [AppInfo.java sets these to zero if the user does not have access]. We can display the correct values, as the RM UI shows them anyway.
Attached output of RM UI and RM WebService.
[jira] [Assigned] (YARN-6053) RM Web Service shows startedTime, finishedTime as zero when RM is kerberized and ACL is set up
[ https://issues.apache.org/jira/browse/YARN-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-6053:
Assignee: Prabhu Joseph
[jira] [Updated] (YARN-6053) RM Web Service shows startedTime, finishedTime as zero when RM is kerberized and ACL is set up
[ https://issues.apache.org/jira/browse/YARN-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6053:
Attachment: RM_UI_ACL.png
RM_WEB_SERVICE_start_stop.png
RM_UI_start_stop.png
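The masking behavior described in YARN-6053 can be sketched as follows; `app_info_fields` and its dictionary shape are hypothetical illustrations, not the actual AppInfo.java API:

```python
# Hypothetical sketch (not the real AppInfo.java code) of the behavior in
# YARN-6053: the RM web service zeroes the timing fields when the caller
# fails the ACL check, while the RM UI still displays the same values.
def app_info_fields(app, has_access):
    """Return the timing fields the web service would expose."""
    if has_access:
        return {"startedTime": app["startedTime"],
                "finishedTime": app["finishedTime"]}
    # Current behavior: times are masked even though the UI shows them.
    return {"startedTime": 0, "finishedTime": 0}

app = {"startedTime": 1482325548661, "finishedTime": 1482325600000}
masked = app_info_fields(app, has_access=False)
```

The proposed fix would return the real timestamps in both branches, since the RM UI already exposes them to users without ACL access.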
[jira] [Created] (YARN-6052) Yarn RM UI % of Queue at application level is wrong
Prabhu Joseph created YARN-6052:
-----------------------------------
Summary: Yarn RM UI % of Queue at application level is wrong
Key: YARN-6052
URL: https://issues.apache.org/jira/browse/YARN-6052
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Priority: Minor
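The arithmetic in the report above can be checked with a small sketch; the helper below is illustrative only, with the capacity values taken from the test case in the description:

```python
def pct_of_queue(used_gb, parent_abs_capacity_gb, queue_capacity_pct):
    """Application-level '% of Queue': used memory relative to the queue's
    share of its *parent* queue's absolute capacity."""
    queue_share_gb = parent_abs_capacity_gb * queue_capacity_pct / 100.0
    return 100.0 * used_gb / queue_share_gb

# 1 GB used in dummy.child2 (capacity 50%) on a 20 GB cluster:
buggy = pct_of_queue(1, 20, 50)  # UI wrongly applies 50% to root's 20 GB
fixed = pct_of_queue(1, 16, 50)  # correct: 50% of dummy's 16 GB share
```

This reproduces the two figures in the description: 10.0 from the buggy calculation against root capacity, and 12.5 against the parent dummy queue's 16 GB.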
[jira] [Updated] (YARN-6075) Yarn top for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6075:
Attachment: Yarn_Top_FairScheduler.png

> Yarn top for FairScheduler
> --------------------------
>
> Key: YARN-6075
> URL: https://issues.apache.org/jira/browse/YARN-6075
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler, resourcemanager
> Reporter: Prabhu Joseph
> Attachments: Yarn_Top_FairScheduler.png
>
> The yarn top output for FairScheduler shows empty values (attached output). We need to handle yarn top with FairScheduler.
[jira] [Created] (YARN-6075) Yarn top for FairScheduler
Prabhu Joseph created YARN-6075:
-----------------------------------
Summary: Yarn top for FairScheduler
Key: YARN-6075
URL: https://issues.apache.org/jira/browse/YARN-6075
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler, resourcemanager
Reporter: Prabhu Joseph

The yarn top output for FairScheduler shows empty values (attached output). We need to handle yarn top with FairScheduler.
[jira] [Commented] (YARN-6052) Yarn RM UI % of Queue at application level is wrong
[ https://issues.apache.org/jira/browse/YARN-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803853#comment-15803853 ] Prabhu Joseph commented on YARN-6052:
Sorry for the spam, the issue is already fixed by YARN-. Closing this as a Duplicate.
[jira] [Resolved] (YARN-6052) Yarn RM UI % of Queue at application level is wrong
[ https://issues.apache.org/jira/browse/YARN-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph resolved YARN-6052.
Resolution: Duplicate
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112071#comment-16112071 ] Prabhu Joseph commented on YARN-6929: - Date can be retrieved from the timestamp present in the application id while creating date subdirectory. So while scanning we will know which date subdirectory to check directly. The URL can remain the same. > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.nodemanager.log.retain-second of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. 
This can be > improved with adding date as a subdirectory like > //logs// > {code} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at >
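The date-subdirectory idea from the comment above can be sketched as follows; `date_subdir` is an illustrative helper (not actual YARN code), relying on the application id format `application_<clusterTimestampMs>_<seq>`:

```python
from datetime import datetime, timezone

def date_subdir(app_id):
    """Illustrative: derive a date bucket from the cluster timestamp embedded
    in the application id, so a reader of aggregated logs knows which date
    subdirectory to check directly, without listing the whole log tree."""
    cluster_ts_ms = int(app_id.split("_")[1])
    return datetime.fromtimestamp(cluster_ts_ms / 1000.0,
                                  tz=timezone.utc).strftime("%Y-%m-%d")

# Layout sketch: <remote-app-log-dir>/<user>/logs/<date>/<app_id>
bucket = date_subdir("application_1482325548661_0002")  # -> "2016-12-21"
```

Note that the first numeric field of an application id is the RM cluster start timestamp, so this keys the bucket to the cluster timestamp rather than the submission date, which is what the comment relies on for lookup without scanning.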
[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6929:
Description: The current directory structure for yarn.nodemanager.remote-app-log-dir is not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). With retention yarn.log-aggregation.retain-seconds of 7days, there are more chances LogAggregationService fails to create a new directory with FSLimitException$MaxDirectoryItemsExceededException. The current structure is //logs/. This can be improved with adding date as a subdirectory like //logs//
{code}
WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: Application failed to init aggregation
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 items=1048576
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
...
{code}
Thanks to Robert Mancuso for finding this issue.

was: The current directory structure for yarn.nodemanager.remote-app-log-dir is not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). With retention yarn.nodemanager.log.retain-second of 7days, there are more chances LogAggregationService fails to create a new directory with FSLimitException$MaxDirectoryItemsExceededException. The current structure is //logs/. This can be improved with adding date as a subdirectory like //logs//
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113232#comment-16113232 ] Prabhu Joseph commented on YARN-6929:
Yes, got it. I think the max bucket size can be derived from yarn.log-aggregation.retain-seconds (in days), say yarn.log-aggregation.retain-seconds (in days) * 24, so it will scale with any configured retention period. Otherwise a max bucket size that is optimal for a 7 day retention won't be optimal for 30 days. And why do we need the two sub directories (app_id/bucket_size) and (app_id%bucket_size)? I think the below itself should solve it.
{code}
aggregation_log_root / user / cluster_timestamp / (app_id % bucket_size)
where bucket_size is determined from yarn.log-aggregation.retain-seconds (in days) * 24
{code}
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113176#comment-16113176 ] Prabhu Joseph commented on YARN-6929:
Thanks, missed it. A hash can be generated from ApplicationID#getId() with yarn.log-aggregation.retain-seconds * 24 buckets (hoping one hour will have fewer than a million apps). This way random read and write of the appDir is possible. The Deletion Service will traverse these hashDirs for every userDir.
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113858#comment-16113858 ] Prabhu Joseph commented on YARN-6929: - Yes clear now. {code} aggregation_log_root / user / cluster_timestamp / (app_id/ bucket_size) where bucket_size = DFS_NAMENODE_MAX_DIRECTORY_ITEMS_KEY {code} > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. 
This can be > improved by adding the date as a subdirectory, like > <remote-app-log-dir>/<user>/logs/<date>/<app_id>
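The bucketing scheme sketched in the comment above can be illustrated with a short Java snippet. This is a hypothetical sketch, not code from any patch: the class and method names are mine, and the bucket divisor simply reuses the default dfs.namenode.fs-limits.max-directory-items value so no directory ever holds more children than the NameNode allows.

```java
// Hypothetical illustration of the proposed layout:
// aggregation_log_root / user / cluster_timestamp / (app_id / bucket_size)
class LogDirBucketing {
    // default dfs.namenode.fs-limits.max-directory-items (HDFS-6102)
    static final int BUCKET_SIZE = 1048576;

    // Build the bucketed aggregated-log path for one application.
    static String aggregatedLogDir(String root, String user,
                                   long clusterTimestamp, int appId) {
        int bucket = appId / BUCKET_SIZE;   // integer division groups app ids
        return root + "/" + user + "/" + clusterTimestamp + "/" + bucket
            + "/application_" + clusterTimestamp + "_" + appId;
    }

    public static void main(String[] args) {
        System.out.println(aggregatedLogDir("/app-logs", "yarn", 1500000000000L, 1234567));
        // prints /app-logs/yarn/1500000000000/1/application_1500000000000_1234567
    }
}
```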
[jira] [Created] (YARN-6810) YARN localizer has to validate the mapreduce.tar.gz present in cache before using it
Prabhu Joseph created YARN-6810: --- Summary: YARN localizer has to validate the mapreduce.tar.gz present in cache before using it Key: YARN-6810 URL: https://issues.apache.org/jira/browse/YARN-6810 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.3 Reporter: Prabhu Joseph When a localized mapreduce.tar.gz is corrupt and zero bytes, all MapReduce jobs fail on the cluster with "Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster", as they use the corrupt mapreduce.tar.gz. The YARN localizer has to check that the existing mapreduce.tar.gz is a valid file before using it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
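A minimal sketch of the validation proposed above (the names are mine, not the localizer's): a cached mapreduce.tar.gz is only trusted when it is non-empty and starts with the gzip magic bytes, so the zero-byte case reported here would trigger re-localization instead of failing every job.

```java
// Illustrative check, not the actual NodeManager code: decide from the
// observable properties of the cached file whether it can be trusted.
class CachedArchiveCheck {
    // length: size of the cached file in bytes
    // firstByte/secondByte: the first two bytes read from the file
    static boolean looksValidGzip(long length, int firstByte, int secondByte) {
        if (length == 0) {
            return false;                          // the zero-byte failure mode in this issue
        }
        return firstByte == 0x1f && secondByte == 0x8b;  // gzip magic number
    }
}
```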
[jira] [Commented] (YARN-6810) YARN localizer has to validate the mapreduce.tar.gz present in cache before using it
[ https://issues.apache.org/jira/browse/YARN-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084191#comment-16084191 ] Prabhu Joseph commented on YARN-6810: - [~jlowe] Missed it while searching for existing jira. Will close this as a Duplicate. > YARN localizer has to validate the mapreduce.tar.gz present in cache before > using it > > > Key: YARN-6810 > URL: https://issues.apache.org/jira/browse/YARN-6810 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > > When a localized mapreduce.tar.gz is corrupt and zero bytes, all MapReduce > jobs fails on the cluster with "Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster " as it uses corrupt > mapreduce.tar.gz. YARN Localizer has to check if the existing > mapreduce.tar.gz is a valid file before using it. > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
Prabhu Joseph created YARN-6929: --- Summary: yarn.nodemanager.remote-app-log-dir structure is not scalable Key: YARN-6929 URL: https://issues.apache.org/jira/browse/YARN-6929 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.7.3 Reporter: Prabhu Joseph Assignee: Prabhu Joseph The current directory structure for yarn.nodemanager.remote-app-log-dir is not scalable. The maximum subdirectory limit by default is 1048576 (HDFS-6102). With a retention yarn.log-aggregation.retain-seconds of 7 days, there are more chances that LogAggregationService fails to create a new directory with FSLimitException$MaxDirectoryItemsExceededException. The current structure is <remote-app-log-dir>/<user>/logs/<app_id>. This can be improved by adding the date as a subdirectory, like <remote-app-log-dir>/<user>/logs/<date>/<app_id> {code} WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: Application failed to init aggregation org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 items=1048576 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 items=1048576 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) {code} Thanks to Robert Mancuso for finding this issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6616) YARN AHS shows submitTime for jobs same as startTime
Prabhu Joseph created YARN-6616: --- Summary: YARN AHS shows submitTime for jobs same as startTime Key: YARN-6616 URL: https://issues.apache.org/jira/browse/YARN-6616 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.3 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Priority: Minor YARN AHS returns the startTime value for both submitTime and startTime for jobs. It looks like the code sets submitTime to the startTime value. https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java#L80 {code} curl --negotiate -u: http://prabhuzeppelin3.openstacklocal:8188/ws/v1/applicationhistory/apps 1495015537574 1495015537574 1495016384084 {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
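The bug pattern described above can be sketched in a few lines of hedged Java. The field and parameter names below are simplified stand-ins, not the exact AppInfo source: copying the report's start time into both fields is what makes the AHS report submitTime == startTime for every application.

```java
// Simplified illustration of the suspected mapping bug in AppInfo-like code.
class AppTimes {
    long submittedTime;
    long startedTime;

    // buggy mapping: both fields receive the report's started time
    void copyBuggy(long reportSubmitTime, long reportStartTime) {
        submittedTime = reportStartTime;   // should be reportSubmitTime
        startedTime = reportStartTime;
    }

    // corrected mapping: each field receives its own value
    void copyFixed(long reportSubmitTime, long reportStartTime) {
        submittedTime = reportSubmitTime;
        startedTime = reportStartTime;
    }
}
```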
[jira] [Resolved] (YARN-6557) YARN ContainerLocalizer logs are missing
[ https://issues.apache.org/jira/browse/YARN-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph resolved YARN-6557. - Resolution: Fixed Duplicate of YARN-5422 > YARN ContainerLocalizer logs are missing > > > Key: YARN-6557 > URL: https://issues.apache.org/jira/browse/YARN-6557 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Prabhu Joseph > > YARN LCE ContainerLocalizer runs as a separate process and the logs / error > messages are not captured. We need to redirect them to a stdout or separate > log file which helps to debug Localization issues. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6557) YARN ContainerLocalizer logs are missing
[ https://issues.apache.org/jira/browse/YARN-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996267#comment-15996267 ] Prabhu Joseph commented on YARN-6557: - [~Naganarasimha] Yes, missed it. Will close this one as duplicate. > YARN ContainerLocalizer logs are missing > > > Key: YARN-6557 > URL: https://issues.apache.org/jira/browse/YARN-6557 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Prabhu Joseph > > YARN LCE ContainerLocalizer runs as a separate process and the logs / error > messages are not captured. We need to redirect them to a stdout or separate > log file which helps to debug Localization issues. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6557) YARN ContainerLocalizer logs are missing
Prabhu Joseph created YARN-6557: --- Summary: YARN ContainerLocalizer logs are missing Key: YARN-6557 URL: https://issues.apache.org/jira/browse/YARN-6557 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.1 Reporter: Prabhu Joseph YARN LCE ContainerLocalizer runs as a separate process and the logs / error messages are not captured. We need to redirect them to stdout or a separate log file, which helps to debug Localization issues. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7235) RMWebServices SSL renegotiate denied
Prabhu Joseph created YARN-7235: --- Summary: RMWebServices SSL renegotiate denied Key: YARN-7235 URL: https://issues.apache.org/jira/browse/YARN-7235 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.3 Reporter: Prabhu Joseph We see a lot of SSL renegotiate denied WARN messages in the RM logs {code} 2017-08-29 08:14:15,821 WARN mortbay.log (Slf4jLog.java:warn(76)) - SSL renegotiate denied: java.nio.channels.SocketChannel[connected local=/10.136.19.134:8078 remote=/10.136.19.103:59994] {code} It looks like we need a fix similar to YARN-6797 for RMWebServices. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7288) ContainerLocalizer with multiple JVM Options
Prabhu Joseph created YARN-7288: --- Summary: ContainerLocalizer with multiple JVM Options Key: YARN-7288 URL: https://issues.apache.org/jira/browse/YARN-7288 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.3 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Currently ContainerLocalizer can be configured with a single JVM option through yarn.nodemanager.container-localizer.java.opts. There are cases where we need more than one, such as adding -Dlog4j.debug or -verbose to debug issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7288) ContainerLocalizer with multiple JVM Options
[ https://issues.apache.org/jira/browse/YARN-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191240#comment-16191240 ] Prabhu Joseph commented on YARN-7288: - It works fine now; I had configured it wrongly with double quotes, which it does not expect. Thanks [~jlowe] > ContainerLocalizer with multiple JVM Options > > > Key: YARN-7288 > URL: https://issues.apache.org/jira/browse/YARN-7288 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > Currently ContainerLocalizer can be configured with a single JVM option > through yarn.nodemanager.container-localizer.java.opts. There are cases where > we need more than one, such as adding -Dlog4j.debug / -verbose to debug issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
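For reference, a yarn-site.xml fragment matching the working configuration described in the comment above might look like the following. The exact option values are illustrative assumptions; the key point, per the comment, is that multiple options are space-separated with no surrounding double quotes:

```xml
<!-- Assumed yarn-site.xml fragment: several JVM options in one value,
     space-separated, with no enclosing double quotes (quoting broke it). -->
<property>
  <name>yarn.nodemanager.container-localizer.java.opts</name>
  <value>-Xmx256m -Dlog4j.debug -verbose</value>
</property>
```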
[jira] [Commented] (YARN-7111) ApplicationHistoryServer webpage startTime and state are not readable
[ https://issues.apache.org/jira/browse/YARN-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146923#comment-16146923 ] Prabhu Joseph commented on YARN-7111: - It looks like the problem does not exist in 2.7.4 (attached image). Closing this as Not a Problem. > ApplicationHistoryServer webpage startTime and state are not readable > - > > Key: YARN-7111 > URL: https://issues.apache.org/jira/browse/YARN-7111 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > Attachments: Screen Shot 2017-08-28 at 5.24.01 PM.png > > > ApplicationHistoryServer webpage FINISHED applications displays startTime and > state in not readable format. (attached image) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7111) ApplicationHistoryServer webpage startTime and state are not readable
[ https://issues.apache.org/jira/browse/YARN-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-7111: Attachment: working.png > ApplicationHistoryServer webpage startTime and state are not readable > - > > Key: YARN-7111 > URL: https://issues.apache.org/jira/browse/YARN-7111 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > Attachments: Screen Shot 2017-08-28 at 5.24.01 PM.png, working.png > > > ApplicationHistoryServer webpage FINISHED applications displays startTime and > state in not readable format. (attached image) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-7111) ApplicationHistoryServer webpage startTime and state are not readable
[ https://issues.apache.org/jira/browse/YARN-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph resolved YARN-7111. - Resolution: Not A Problem > ApplicationHistoryServer webpage startTime and state are not readable > - > > Key: YARN-7111 > URL: https://issues.apache.org/jira/browse/YARN-7111 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > Attachments: Screen Shot 2017-08-28 at 5.24.01 PM.png, working.png > > > ApplicationHistoryServer webpage FINISHED applications displays startTime and > state in not readable format. (attached image) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7118) AHS REST API can return NullPointerException
[ https://issues.apache.org/jira/browse/YARN-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-7118: --- Assignee: Prabhu Joseph > AHS REST API can return NullPointerException > > > Key: YARN-7118 > URL: https://issues.apache.org/jira/browse/YARN-7118 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > ApplicationHistoryService REST Api returns NullPointerException > {code} > [prabhu@prabhu2 root]$ curl --negotiate -u: 'http:// IP>:8188/ws/v1/applicationhistory/apps?queue=test' > {"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"} > {code} > TimelineServer logs shows below. > {code} > 2017-08-17 17:54:54,128 WARN webapp.GenericExceptionHandler > (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.webapp.WebServices.getApps(WebServices.java:191) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices.getApps(AHSWebServices.java:96) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > at > com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7284) NodeManager crashes with OOM when Debug log enabled for ContainerLocalizer
[ https://issues.apache.org/jira/browse/YARN-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-7284: Attachment: Screen Shot 2017-10-03 at 1.29.35 PM.png Screen Shot 2017-10-03 at 1.29.48 PM.png > NodeManager crashes with OOM when Debug log enabled for ContainerLocalizer > --- > > Key: YARN-7284 > URL: https://issues.apache.org/jira/browse/YARN-7284 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > Attachments: Screen Shot 2017-10-03 at 1.29.35 PM.png, Screen Shot > 2017-10-03 at 1.29.48 PM.png > > > NodeManager crashes with OOM when DEBUG log enabled for ContainerLocalizer. > {code} > 2017-10-03 07:25:20,066 FATAL yarn.YarnUncaughtExceptionHandler > (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread > Thread[Thread-2114,5,main] threw an Error. Shutting down now... > java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3332) > at > java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) > at > java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) > at > java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) > at java.lang.StringBuffer.append(StringBuffer.java:272) > at org.apache.hadoop.util.Shell$1.run(Shell.java:900) > {code} > errThread part of Hadoop Common Shell reads all the DEBUG log lines and > appends to StringBuffer errMsg. As per the heap dump, the errMsg stores more > than 1GB of contents. (attached image) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7284) NodeManager crashes with OOM when Debug log enabled for ContainerLocalizer
Prabhu Joseph created YARN-7284: --- Summary: NodeManager crashes with OOM when Debug log enabled for ContainerLocalizer Key: YARN-7284 URL: https://issues.apache.org/jira/browse/YARN-7284 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.3 Reporter: Prabhu Joseph NodeManager crashes with OOM when DEBUG log enabled for ContainerLocalizer. {code} 2017-10-03 07:25:20,066 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[Thread-2114,5,main] threw an Error. Shutting down now... java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) at java.lang.StringBuffer.append(StringBuffer.java:272) at org.apache.hadoop.util.Shell$1.run(Shell.java:900) {code} errThread part of Hadoop Common Shell reads all the DEBUG log lines and appends to StringBuffer errMsg. As per the heap dump, the errMsg stores more than 1GB of contents. (attached image) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
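One possible mitigation for the unbounded errMsg growth described above can be sketched as a bounded stderr buffer. This is an illustrative sketch, not the actual Shell.java fix: the class name and limit are mine, and the idea is simply to cap how much of the child's stderr is retained so a DEBUG-level flood cannot exhaust the heap.

```java
// Hypothetical bounded alternative to an ever-growing StringBuffer errMsg:
// stop retaining stderr output once a configured limit is reached.
class BoundedErrBuffer {
    private final StringBuilder buf = new StringBuilder();
    private final int limit;   // maximum characters to retain

    BoundedErrBuffer(int limit) { this.limit = limit; }

    void append(String line) {
        if (buf.length() < limit) {
            // keep only as many characters as still fit under the limit
            buf.append(line, 0, Math.min(line.length(), limit - buf.length()));
        }
        // anything beyond the limit is dropped instead of buffered
    }

    int size() { return buf.length(); }
}
```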
[jira] [Created] (YARN-7108) Refreshing Default Node Label Expression of a queue does not reflect for running apps
Prabhu Joseph created YARN-7108: --- Summary: Refreshing Default Node Label Expression of a queue does not reflect for running apps Key: YARN-7108 URL: https://issues.apache.org/jira/browse/YARN-7108 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Affects Versions: 2.7.3 Reporter: Prabhu Joseph Refreshing a queue's default node label expression does not take effect for the running applications. Repro Steps: A 4-node cluster with two node labels, label1 and label2. label1 is an Exclusive Partition with Node1 and Node2; label2 is an Exclusive Partition with Node3 and Node4. A default queue whose default node label expression is label1. 1. Shutdown the NodeManagers on label1 nodes Node1 and Node2 2. Submit a sample MapReduce job on the default queue, which will stay in ACCEPTED state 3. Change the default node label expression for the default queue to label2 in capacity-scheduler.xml and run yarn rmadmin -refreshQueues. The queue's config gets reflected as label2 on the RM UI queue section, but the job still stays in ACCEPTED state 4. Submitting a new job into the default queue moves into RUNNING state -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7108) Refreshing Default Node Label Expression of a queue does not reflect for running apps
[ https://issues.apache.org/jira/browse/YARN-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143458#comment-16143458 ] Prabhu Joseph commented on YARN-7108: - Have submitted a mapreduce job with default queue and no label settings. {code} hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /Input /Output {code} > Refreshing Default Node Label Expression of a queue does not reflect for > running apps > - > > Key: YARN-7108 > URL: https://issues.apache.org/jira/browse/YARN-7108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > > Refreshing a queue's default node label expression does not reflect for the > running applications. > Repro Steps: > 4 node cluster, two node labels label1 and label2. label1 is Exclusive > Partition with Node1 and Node2, label2 is Exclusive Partition with Node3 and > Node4. A default queue whose default node label expression is label1. > 1.Shutdown NodeManagers on label1 nodes Node1 and Node2 > 2.Submit a sample mapreduce on default queue which will stay in ACCEPTED > state > 3.Change default node label expression for default queue to label2 in > capacity-scheduler.xml > yarn rmadmin -refreshQueues > queue's config gets reflected to label2 as shown on RM UI queue section but > job still stays at ACCEPTED state > 4. Submitting a new job into default queue moves into RUNNING state -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7111) ApplicationHistoryServer webpage startTime and state are not readable
Prabhu Joseph created YARN-7111: --- Summary: ApplicationHistoryServer webpage startTime and state are not readable Key: YARN-7111 URL: https://issues.apache.org/jira/browse/YARN-7111 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.3 Reporter: Prabhu Joseph The ApplicationHistoryServer webpage displays startTime and state for FINISHED applications in an unreadable format. (attached image) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7111) ApplicationHistoryServer webpage startTime and state are not readable
[ https://issues.apache.org/jira/browse/YARN-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-7111: Attachment: Screen Shot 2017-08-28 at 5.24.01 PM.png > ApplicationHistoryServer webpage startTime and state are not readable > - > > Key: YARN-7111 > URL: https://issues.apache.org/jira/browse/YARN-7111 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > Attachments: Screen Shot 2017-08-28 at 5.24.01 PM.png > > > ApplicationHistoryServer webpage FINISHED applications displays startTime and > state in not readable format. (attached image) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7108) Refreshing Default Node Label Expression of a queue does not reflect for running apps
[ https://issues.apache.org/jira/browse/YARN-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143812#comment-16143812 ] Prabhu Joseph commented on YARN-7108: - I thought the application would implicitly run on the queue's configured default node label. It moves into RUNNING state when the default node label has running nodes; if not, it stays in ACCEPTED state, which is expected. But refreshing the queue's default label to a new label that has running nodes does not refresh the app state. Is this the expected behavior? > Refreshing Default Node Label Expression of a queue does not reflect for > running apps > - > > Key: YARN-7108 > URL: https://issues.apache.org/jira/browse/YARN-7108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > > Refreshing a queue's default node label expression does not reflect for the > running applications. > Repro Steps: > 4 node cluster, two node labels label1 and label2. label1 is Exclusive > Partition with Node1 and Node2, label2 is Exclusive Partition with Node3 and > Node4. A default queue whose default node label expression is label1. > 1.Shutdown NodeManagers on label1 nodes Node1 and Node2 > 2.Submit a sample mapreduce on default queue which will stay in ACCEPTED > state > 3.Change default node label expression for default queue to label2 in > capacity-scheduler.xml > yarn rmadmin -refreshQueues > queue's config gets reflected to label2 as shown on RM UI queue section but > job still stays at ACCEPTED state > 4. Submitting a new job into default queue moves into RUNNING state -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7118) AHS REST API can return NullPointerException
Prabhu Joseph created YARN-7118: --- Summary: AHS REST API can return NullPointerException Key: YARN-7118 URL: https://issues.apache.org/jira/browse/YARN-7118 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Prabhu Joseph ApplicationHistoryService REST Api returns NullPointerException {code} [prabhu@prabhu2 root]$ curl --negotiate -u: 'http://:8188/ws/v1/applicationhistory/apps?queue=test' {"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"} {code} TimelineServer logs shows below. {code} 2017-08-17 17:54:54,128 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.webapp.WebServices.getApps(WebServices.java:191) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices.getApps(AHSWebServices.java:96) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
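A guard for the failure mode above can be sketched as follows. This is a hypothetical simplification, not the actual WebServices.getApps code: when filtering applications by queue, a report whose queue is null should be skipped rather than dereferenced, which is the kind of NullPointerException shown in the trace.

```java
// Illustrative null-safe queue filter (names are mine, not Hadoop's).
class QueueFilter {
    // reportQueue: the queue recorded for an application (may be null in AHS)
    // queueFilter: the ?queue= request parameter (null/empty means no filter)
    static boolean matchesQueue(String reportQueue, String queueFilter) {
        if (queueFilter == null || queueFilter.isEmpty()) {
            return true;   // no filter requested: every application matches
        }
        // a null report queue simply fails to match instead of throwing NPE
        return reportQueue != null && reportQueue.equals(queueFilter);
    }
}
```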
[jira] [Updated] (YARN-7284) NodeManager crashes with OOM when Debug log enabled for ContainerLocalizer
[ https://issues.apache.org/jira/browse/YARN-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-7284: Component/s: nodemanager > NodeManager crashes with OOM when Debug log enabled for ContainerLocalizer > --- > > Key: YARN-7284 > URL: https://issues.apache.org/jira/browse/YARN-7284 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > Attachments: Screen Shot 2017-10-03 at 1.29.35 PM.png, Screen Shot > 2017-10-03 at 1.29.48 PM.png > > > NodeManager crashes with OOM when DEBUG log enabled for ContainerLocalizer. > {code} > 2017-10-03 07:25:20,066 FATAL yarn.YarnUncaughtExceptionHandler > (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread > Thread[Thread-2114,5,main] threw an Error. Shutting down now... > java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3332) > at > java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) > at > java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) > at > java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) > at java.lang.StringBuffer.append(StringBuffer.java:272) > at org.apache.hadoop.util.Shell$1.run(Shell.java:900) > {code} > errThread part of Hadoop Common Shell reads all the DEBUG log lines and > appends to StringBuffer errMsg. As per the heap dump, the errMsg stores more > than 1GB of contents. (attached image) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7463) Using getLocalPathForWrite for Container related debug information
[ https://issues.apache.org/jira/browse/YARN-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-7463: Attachment: YARN-7463.1.patch > Using getLocalPathForWrite for Container related debug information > -- > > Key: YARN-7463 > URL: https://issues.apache.org/jira/browse/YARN-7463 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-7463.1.patch > > > The container debug files launch_container.sh and directory.info are > always logged into the first directory of NM_LOG_DIRS instead of the log > directory returned from getLogPathForWrite. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7500) LogAggregation DeletionService should consider completedTime for long running jobs
[ https://issues.apache.org/jira/browse/YARN-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254905#comment-16254905 ] Prabhu Joseph commented on YARN-7500: - [~jlowe] Oh yes, missed it. The issue is our customer has a long-running custom Yarn application which started before yarn.log-aggregation.retain-seconds and was running yesterday as per the RM UI, yet today we didn't see any logs under app-logs. The logs were there while the job was running. The only possibility seems to be that the custom app has not updated the logs for many days. Will check the RM logs and hdfs-audit logs to validate and give more information. > LogAggregation DeletionService should consider completedTime for long running > jobs > -- > > Key: YARN-7500 > URL: https://issues.apache.org/jira/browse/YARN-7500 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > Currently LogAggregation deletes the application logs based on the start time > of the job. For long-running jobs (started before > yarn.log-aggregation.retain-seconds), if one failed yesterday for some > reason, we won't have the job logs today for debugging. > Better to consider the completedTime of the job as part of the deletion > condition. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7500) LogAggregation DeletionService should consider completedTime for long running jobs
Prabhu Joseph created YARN-7500: --- Summary: LogAggregation DeletionService should consider completedTime for long running jobs Key: YARN-7500 URL: https://issues.apache.org/jira/browse/YARN-7500 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.7.3 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Currently LogAggregation deletes the application logs based on the start time of the job. For long-running jobs (started before yarn.log-aggregation.retain-seconds), if one failed yesterday for some reason, we won't have the job logs today for debugging. Better to consider the completedTime of the job as part of the deletion condition. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
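[Editor's note] The proposed condition can be sketched as follows (a hypothetical helper, not the actual deletion-service code): logs become eligible for deletion only when the later of start time and completion time has aged past the retention window.

```java
// Sketch: log-deletion eligibility that considers completion time.
// An app still running (finishTime == 0) or recently finished is kept.
public class LogRetention {
    public static boolean eligibleForDeletion(long startTime, long finishTime,
                                              long retainMillis, long now) {
        if (finishTime <= 0) {
            return false;  // still running: never delete its logs
        }
        long newest = Math.max(startTime, finishTime);
        return now - newest > retainMillis;
    }
}
```

Under this rule a long-running job that failed yesterday keeps its logs for a full retention window after the failure, regardless of how long ago it started.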
[jira] [Updated] (YARN-7428) Localizer Failed does not log containerId
[ https://issues.apache.org/jira/browse/YARN-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-7428: Attachment: YARN-7428.1.patch > Localizer Failed does not log containerId > > > Key: YARN-7428 > URL: https://issues.apache.org/jira/browse/YARN-7428 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-7428.1.patch > > > When a Localizer fails for some reason, the error message does not have the > containerId to correlate. > {code} > 2017-10-31 00:03:11,046 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > IOException executing command: > java.io.InterruptedIOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:947) > at org.apache.hadoop.util.Shell.run(Shell.java:848) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114) > Caused by: java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at java.lang.UNIXProcess.waitFor(UNIXProcess.java:396) > at org.apache.hadoop.util.Shell.runCommand(Shell.java:937) > ...
5 more > 2017-10-31 00:03:11,047 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed > java.lang.NullPointerException > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7426) Add a finite shell command timeout to ContainerLocalizer
Prabhu Joseph created YARN-7426: --- Summary: Add a finite shell command timeout to ContainerLocalizer Key: YARN-7426 URL: https://issues.apache.org/jira/browse/YARN-7426 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.3 Reporter: Prabhu Joseph Priority: Critical When the NodeManager is overloaded and ContainerLocalizer processes are hanging, the containers will time out and be cleaned up. The LocalizerRunner thread will be interrupted during cleanup but the interrupt does not work when it is reading from a FileInputStream. LocalizerRunner threads and ContainerLocalizer processes keep accumulating, which makes the node completely unresponsive. We can add a timeout for the shell command to avoid this, similar to HADOOP-13817. The timeout value can be set by the AM, same as the container timeout. ContainerLocalizer JVM stacktrace: {code} "main" #1 prio=5 os_prio=0 tid=0x7fd8ec019000 nid=0xc295 runnable [0x7fd8f3956000] java.lang.Thread.State: RUNNABLE at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.(ZipFile.java:219) at java.util.zip.ZipFile.(ZipFile.java:149) at java.util.jar.JarFile.(JarFile.java:166) at java.util.jar.JarFile.(JarFile.java:103) at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893) at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756) at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838) at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831) at java.security.AccessController.doPrivileged(Native Method) at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830) at sun.misc.URLClassPath$JarLoader.(URLClassPath.java:803) at sun.misc.URLClassPath$3.run(URLClassPath.java:530) at sun.misc.URLClassPath$3.run(URLClassPath.java:520) at java.security.AccessController.doPrivileged(Native Method) at sun.misc.URLClassPath.getLoader(URLClassPath.java:519) at sun.misc.URLClassPath.getLoader(URLClassPath.java:492) - locked <0x00076ac75058> (a sun.misc.URLClassPath) at
sun.misc.URLClassPath.getNextLoader(URLClassPath.java:457) - locked <0x00076ac75058> (a sun.misc.URLClassPath) at sun.misc.URLClassPath.getResource(URLClassPath.java:211) at java.net.URLClassLoader$1.run(URLClassLoader.java:365) at java.net.URLClassLoader$1.run(URLClassLoader.java:362) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:361) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) - locked <0x00076ac7f960> (a java.lang.Object) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495) {code} NodeManager LocalizerRunner thread which is not interrupted: {code} "LocalizerRunner for container_e746_1508665985104_601806_01_05" #3932753 prio=5 os_prio=0 tid=0x7fb258d5f800 nid=0x11091 runnable [0x7fb153946000] java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:255) at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) - locked <0x000718502b80> (a java.lang.UNIXProcess$ProcessPipeInputStream) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) - locked <0x000718502bd8> (a java.io.InputStreamReader) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.read1(BufferedReader.java:212) at java.io.BufferedReader.read(BufferedReader.java:286) - locked <0x000718502bd8> (a java.io.InputStreamReader) at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1155) at org.apache.hadoop.util.Shell.runCommand(Shell.java:930) at 
org.apache.hadoop.util.Shell.run(Shell.java:848) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264) at
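[Editor's note] A shell-command timeout along the lines of HADOOP-13817 can be sketched with the JDK's Process.waitFor(long, TimeUnit). This is illustrative only; the real change would go through Hadoop's Shell/ShellCommandExecutor.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Sketch: run a command with a hard wall-clock timeout, destroying the
// process if it does not finish in time, so the caller thread cannot be
// stuck forever in a blocking read of the process's output.
public class TimedShell {
    public static int runWithTimeout(long timeoutMs, String... cmd)
            throws Exception {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        if (!p.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
            p.destroyForcibly();  // reclaim the hung localizer process
            throw new IOException("command timed out after " + timeoutMs + " ms");
        }
        return p.exitValue();
    }
}
```

With such a bound, a hung ContainerLocalizer would be killed and its LocalizerRunner thread released instead of both accumulating until the node is unresponsive.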
[jira] [Commented] (YARN-7428) Localizer Failed does not log containerId
[ https://issues.apache.org/jira/browse/YARN-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1624#comment-1624 ] Prabhu Joseph commented on YARN-7428: - Thanks [~bibinchundatt] for the review. > Localizer Failed does not log containerId > > > Key: YARN-7428 > URL: https://issues.apache.org/jira/browse/YARN-7428 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: YARN-7428.1.patch > > > When a Localizer fails for some reason, the error message does not have the > containerId to correlate. > {code} > 2017-10-31 00:03:11,046 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > IOException executing command: > java.io.InterruptedIOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:947) > at org.apache.hadoop.util.Shell.run(Shell.java:848) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114) > Caused by: java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at java.lang.UNIXProcess.waitFor(UNIXProcess.java:396) > at org.apache.hadoop.util.Shell.runCommand(Shell.java:937) > ...
5 more > 2017-10-31 00:03:11,047 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed > java.lang.NullPointerException > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7426) Interrupt does not work when LocalizerRunner is reading from InputStream
[ https://issues.apache.org/jira/browse/YARN-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-7426: Summary: Interrupt does not work when LocalizerRunner is reading from InputStream (was: Add a finite shell command timeout to ContainerLocalizer) > Interrupt does not work when LocalizerRunner is reading from InputStream > > > Key: YARN-7426 > URL: https://issues.apache.org/jira/browse/YARN-7426 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Priority: Critical > > When the NodeManager is overloaded and ContainerLocalizer processes are > hanging, the containers will time out and be cleaned up. The LocalizerRunner > thread will be interrupted during cleanup but the interrupt does not work > when it is reading from a FileInputStream. LocalizerRunner threads and > ContainerLocalizer processes keep accumulating, which makes the node > completely unresponsive. We can add a timeout for the shell command to avoid > this, similar to HADOOP-13817. > The timeout value can be set by the AM, same as the container timeout. 
> ContainerLocalizer JVM stacktrace: > {code} > "main" #1 prio=5 os_prio=0 tid=0x7fd8ec019000 nid=0xc295 runnable > [0x7fd8f3956000] >java.lang.Thread.State: RUNNABLE > at java.util.zip.ZipFile.open(Native Method) > at java.util.zip.ZipFile.(ZipFile.java:219) > at java.util.zip.ZipFile.(ZipFile.java:149) > at java.util.jar.JarFile.(JarFile.java:166) > at java.util.jar.JarFile.(JarFile.java:103) > at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893) > at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756) > at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838) > at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831) > at java.security.AccessController.doPrivileged(Native Method) > at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830) > at sun.misc.URLClassPath$JarLoader.(URLClassPath.java:803) > at sun.misc.URLClassPath$3.run(URLClassPath.java:530) > at sun.misc.URLClassPath$3.run(URLClassPath.java:520) > at java.security.AccessController.doPrivileged(Native Method) > at sun.misc.URLClassPath.getLoader(URLClassPath.java:519) > at sun.misc.URLClassPath.getLoader(URLClassPath.java:492) > - locked <0x00076ac75058> (a sun.misc.URLClassPath) > at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:457) > - locked <0x00076ac75058> (a sun.misc.URLClassPath) > at sun.misc.URLClassPath.getResource(URLClassPath.java:211) > at java.net.URLClassLoader$1.run(URLClassLoader.java:365) > at java.net.URLClassLoader$1.run(URLClassLoader.java:362) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:361) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > - locked <0x00076ac7f960> (a java.lang.Object) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495) > {code} > NodeManager LocalizerRunner 
thread which is not interrupted: > {code} > "LocalizerRunner for container_e746_1508665985104_601806_01_05" #3932753 > prio=5 os_prio=0 tid=0x7fb258d5f800 nid=0x11091 runnable > [0x7fb153946000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x000718502b80> (a > java.lang.UNIXProcess$ProcessPipeInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x000718502bd8> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.read1(BufferedReader.java:212) > at java.io.BufferedReader.read(BufferedReader.java:286) > - locked <0x000718502bd8> (a java.io.InputStreamReader) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1155) > at org.apache.hadoop.util.Shell.runCommand(Shell.java:930) > at
[jira] [Created] (YARN-7428) Localizer Failed does not log containerId
Prabhu Joseph created YARN-7428: --- Summary: Localizer Failed does not log containerId Key: YARN-7428 URL: https://issues.apache.org/jira/browse/YARN-7428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.3 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Priority: Major When a Localizer fails for some reason, the error message does not have the containerId to correlate. {code} 2017-10-31 00:03:11,046 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: IOException executing command: java.io.InterruptedIOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:947) at org.apache.hadoop.util.Shell.run(Shell.java:848) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at java.lang.UNIXProcess.waitFor(UNIXProcess.java:396) at org.apache.hadoop.util.Shell.runCommand(Shell.java:937) ... 5 more 2017-10-31 00:03:11,047 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed java.lang.NullPointerException {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
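[Editor's note] The fix amounts to carrying the container id into the failure message. A minimal sketch (hypothetical helper; the actual patch touches ResourceLocalizationService):

```java
// Sketch: build the localizer failure message with the container id so
// the WARN/INFO lines can be correlated with a specific container.
public class LocalizerLog {
    public static String failureMessage(String containerId, Throwable cause) {
        return "Localizer failed for " + containerId + ": " + cause;
    }
}
```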
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235308#comment-16235308 ] Prabhu Joseph commented on YARN-6078: - We have hit this issue recently. Below is the analysis. When the NodeManager is overloaded and ContainerLocalizer processes are hanging, the containers will time out and be cleaned up. The LocalizerRunner thread will be interrupted during cleanup but the interrupt does not work when it is reading from a FileInputStream. LocalizerRunner threads and ContainerLocalizer processes keep accumulating, which makes the node completely unresponsive. The options below would help to avoid this: 1. ShellCommandExecutor's parseExecResult currently uses a blocking read(), which can be changed as below to use a non-blocking available() plus a short sleep. {code}
while (running) {
  if (in.available() > 0) {
    n = in.read(buffer);
    // do stuff with the buffer
  } else {
    Thread.sleep(500);
  }
}
{code} 2. Add a timeout for the shell command similar to HADOOP-13817; the timeout value can be set by the AM, same as the container timeout. 
ContainerLocalizer JVM stacktrace: {code} "main" #1 prio=5 os_prio=0 tid=0x7fd8ec019000 nid=0xc295 runnable [0x7fd8f3956000] java.lang.Thread.State: RUNNABLE at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.(ZipFile.java:219) at java.util.zip.ZipFile.(ZipFile.java:149) at java.util.jar.JarFile.(JarFile.java:166) at java.util.jar.JarFile.(JarFile.java:103) at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893) at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756) at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838) at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831) at java.security.AccessController.doPrivileged(Native Method) at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830) at sun.misc.URLClassPath$JarLoader.(URLClassPath.java:803) at sun.misc.URLClassPath$3.run(URLClassPath.java:530) at sun.misc.URLClassPath$3.run(URLClassPath.java:520) at java.security.AccessController.doPrivileged(Native Method) at sun.misc.URLClassPath.getLoader(URLClassPath.java:519) at sun.misc.URLClassPath.getLoader(URLClassPath.java:492) - locked <0x00076ac75058> (a sun.misc.URLClassPath) at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:457) - locked <0x00076ac75058> (a sun.misc.URLClassPath) at sun.misc.URLClassPath.getResource(URLClassPath.java:211) at java.net.URLClassLoader$1.run(URLClassLoader.java:365) at java.net.URLClassLoader$1.run(URLClassLoader.java:362) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:361) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) - locked <0x00076ac7f960> (a java.lang.Object) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495) {code} NodeManager LocalizerRunner thread which is not interrupted: {code} "LocalizerRunner for 
container_e746_1508665985104_601806_01_05" #3932753 prio=5 os_prio=0 tid=0x7fb258d5f800 nid=0x11091 runnable [0x7fb153946000] java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:255) at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) - locked <0x000718502b80> (a java.lang.UNIXProcess$ProcessPipeInputStream) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) - locked <0x000718502bd8> (a java.io.InputStreamReader) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.read1(BufferedReader.java:212) at java.io.BufferedReader.read(BufferedReader.java:286) - locked <0x000718502bd8> (a java.io.InputStreamReader) at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1155) at org.apache.hadoop.util.Shell.runCommand(Shell.java:930) at org.apache.hadoop.util.Shell.run(Shell.java:848) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142) at
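[Editor's note] Option 1 in the comment above can be made concrete as below. This is a sketch under stated assumptions: a quiet-period cutoff stands in for real end-of-stream handling, which an actual Shell.java patch would still need.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch: read only when bytes are known to be available, sleeping
// otherwise, so the loop can notice Thread.interrupt() (Thread.sleep
// throws InterruptedException) instead of blocking inside read().
public class PollingReader {
    public static String readUntilQuiet(InputStream in, long pollMs,
            long quietLimitMs) throws IOException, InterruptedException {
        StringBuilder out = new StringBuilder();
        byte[] buffer = new byte[4096];
        long quietSince = System.currentTimeMillis();
        while (System.currentTimeMillis() - quietSince < quietLimitMs) {
            if (in.available() > 0) {
                int n = in.read(buffer, 0, buffer.length);
                if (n > 0) {
                    out.append(new String(buffer, 0, n));
                }
                quietSince = System.currentTimeMillis();
            } else {
                Thread.sleep(pollMs);  // interruptible wait
            }
        }
        return out.toString();
    }
}
```

Note that available() only reports bytes that can be read without blocking; it cannot distinguish "no data yet" from end-of-stream on a process pipe, which is why the real fix needs explicit EOF handling or the timeout approach of option 2.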
[jira] [Created] (YARN-7429) Auxiliary Service status on NodeManager UI / CLI
Prabhu Joseph created YARN-7429: --- Summary: Auxiliary Service status on NodeManager UI / CLI Key: YARN-7429 URL: https://issues.apache.org/jira/browse/YARN-7429 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.3 Reporter: Prabhu Joseph Priority: Major When auxiliary services like the Spark Shuffle or MapReduce Shuffle Service fail for some reason, running jobs will have issues when remote containers try to fetch data from the node where the service failed to initialize. The reason the shuffle service failed to start will be in the NodeManager logs from startup, and will likely be lost after a few days, by the time we notice the jobs failing. It would be useful if the NodeManager UI / CLI showed the list of auxiliary services and their status, and captured any error message if one has failed. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
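[Editor's note] What the UI / CLI would surface can be sketched as a per-service status registry. The API below is entirely hypothetical, for illustration of the idea only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch (hypothetical API): remember each auxiliary service's init
// outcome so a UI/CLI endpoint can report it long after the startup
// logs have rotated away.
public class AuxServiceStatus {
    private final Map<String, String> status = new LinkedHashMap<>();

    public void recordInit(String serviceName, Throwable failure) {
        status.put(serviceName,
            failure == null ? "INITED" : "FAILED: " + failure.getMessage());
    }

    public Map<String, String> snapshot() {
        return new LinkedHashMap<>(status);
    }
}
```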
[jira] [Created] (YARN-7463) Using getLocalPathForWrite for Container related debug information
Prabhu Joseph created YARN-7463: --- Summary: Using getLocalPathForWrite for Container related debug information Key: YARN-7463 URL: https://issues.apache.org/jira/browse/YARN-7463 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.3 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Priority: Minor The container debug files launch_container.sh and directory.info are always logged into the first directory of NM_LOG_DIRS instead of the log directory returned from getLogPathForWrite. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6929: Attachment: YARN-6929.1.patch > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: YARN-6929.1.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. The maximum subdirectory limit by default is 1048576 (HDFS-6102). > With a yarn.log-aggregation.retain-seconds retention of 7 days, there is a > high chance that LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > <remote-app-log-dir>/<user>/logs/<appid>. This can be > improved by adding the date as a subdirectory, like > <remote-app-log-dir>/<user>/logs/<date>/<appid> > {code} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at >
[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6929: Attachment: YARN-6929.2.patch > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. The maximum subdirectory limit by default is 1048576 (HDFS-6102). > With a yarn.log-aggregation.retain-seconds retention of 7 days, there is a > high chance that LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > <remote-app-log-dir>/<user>/logs/<appid>. This can be > improved by adding the date as a subdirectory, like > <remote-app-log-dir>/<user>/logs/<date>/<appid> > {code} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at >
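The date-partitioned layout proposed in YARN-6929 can be sketched in plain Java. The helper `appLogDir` and its argument names are illustrative, not the actual patch code:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class LogDirLayout {
    // Current layout: <remote-app-log-dir>/<user>/<suffix>/<appId>
    // Proposed layout inserts a date component so that no single directory
    // collects more than one day's applications, keeping each well under the
    // default dfs.namenode.fs-limits.max-directory-items of 1048576.
    static String appLogDir(String remoteRoot, String user,
                            String suffix, String appId) {
        String date = LocalDate.now().format(DateTimeFormatter.BASIC_ISO_DATE);
        return String.join("/", remoteRoot, user, suffix, date, appId);
    }

    public static void main(String[] args) {
        System.out.println(appLogDir("/app-logs", "yarn", "logs",
                "application_1524687599273_2110"));
    }
}
```

A side benefit of daily partitioning is that retention can drop an entire dated subdirectory once it ages past yarn.log-aggregation.retain-seconds, instead of scanning every per-application directory.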
[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6929: Attachment: YARN-6929.patch > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. This can be > improved with adding date as a subdirectory like > //logs// > {code} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at 
java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at >
[jira] [Assigned] (YARN-5295) YARN queue-mappings to check Queue is present before submitting job
[ https://issues.apache.org/jira/browse/YARN-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-5295: --- Assignee: Prabhu Joseph > YARN queue-mappings to check Queue is present before submitting job > --- > > Key: YARN-5295 > URL: https://issues.apache.org/jira/browse/YARN-5295 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > In YARN queue-mappings, YARN should check if the queue is present before > submitting the job. If it is not present, it should fall back to the next available mapping. > For example, if we have > yarn.scheduler.capacity.queue-mappings=u:%user:%user,g:edw:platform > and I submit a job with user "test" and there is no "test" queue, then it > should check the second mapping (g:edw:platform) in the list, and if "test" is > part of the edw group it should submit the job to the platform queue. > The sanity checks below have to be done for the mapped queue in the list; if one > fails, the next queue mapping has to be chosen, and only when no > queue mapping passes the sanity checks should the application be > rejected. > 1. is the queue present > 2. is the queue a leaf queue > 3. does the user have either the Submit_Applications or Administer_Queue ACL on the > queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
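The fallback logic described above can be sketched as follows; `QueueInfo` and its boolean fields are hypothetical stand-ins for CapacityScheduler internals (queue lookup, leaf test, ACL check):

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class QueueMappingFallback {
    // Hypothetical stand-in for the scheduler's view of a queue.
    static final class QueueInfo {
        final boolean leaf;    // only leaf queues accept applications
        final boolean aclOk;   // user holds Submit_Applications or Administer_Queue
        QueueInfo(boolean leaf, boolean aclOk) { this.leaf = leaf; this.aclOk = aclOk; }
    }

    // Walk the mapped queues in order and return the first that passes all
    // sanity checks; an empty result means the application is rejected.
    static Optional<String> resolveQueue(List<String> mappedQueues,
                                         Map<String, QueueInfo> queues) {
        for (String name : mappedQueues) {
            QueueInfo q = queues.get(name);        // check 1: queue present
            if (q != null && q.leaf && q.aclOk) {  // checks 2 and 3
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // User "test" has no "test" queue, so the group mapping's queue wins.
        Map<String, QueueInfo> queues =
                Map.of("platform", new QueueInfo(true, true));
        System.out.println(
                resolveQueue(List.of("test", "platform"), queues).orElse("REJECTED"));
        // prints: platform
    }
}
```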
[jira] [Created] (YARN-8291) RMRegistryOperationService doesn't have a limit on AsyncPurge threads
Prabhu Joseph created YARN-8291: --- Summary: RMRegistryOperationService don't have limit on AsyncPurge threads Key: YARN-8291 URL: https://issues.apache.org/jira/browse/YARN-8291 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.3 Reporter: Prabhu Joseph When there are more than 1+ containers finished - RMRegistryOperationService will create 1+ threads for performing AsyncPurge which can slowdown the ResourceManager process. There should be a limit on the number of threads. {code} "RegistryAdminService 554485" #824351 prio=5 os_prio=0 tid=0x7fe4b2bc9800 nid=0xf8ed in Object.wait() [0x7fe31a5e4000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1386) - locked <0x0007902ec7d8> (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040) at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:158) at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkStat(CuratorService.java:455) at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.stat(RegistryOperationsService.java:137) at org.apache.hadoop.registry.client.binding.RegistryUtils.statChildren(RegistryUtils.java:210) at org.apache.hadoop.registry.server.services.RegistryAdminService.purge(RegistryAdminService.java:450) at 
org.apache.hadoop.registry.server.services.RegistryAdminService.purge(RegistryAdminService.java:520) at org.apache.hadoop.registry.server.services.RegistryAdminService$AsyncPurge.call(RegistryAdminService.java:570) at org.apache.hadoop.registry.server.services.RegistryAdminService$AsyncPurge.call(RegistryAdminService.java:543) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
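The direction of the fix is to run purge tasks on a bounded pool, so a burst of finished containers queues up instead of each spawning a thread. A minimal sketch, with an arbitrary illustrative pool size of 4 (not taken from any patch) and a made-up registry path:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedPurgePool {
    public static void main(String[] args) throws Exception {
        // At most 4 purge threads run concurrently; further tasks wait in the
        // executor's queue rather than each getting its own thread.
        ExecutorService purgePool = Executors.newFixedThreadPool(4);
        Future<String> result =
                purgePool.submit(() -> "purged /registry/users/yarn");  // stand-in for AsyncPurge
        System.out.println(result.get());  // prints: purged /registry/users/yarn
        purgePool.shutdown();
    }
}
```

With a fixed pool, slow ZooKeeper round trips (as in the `Object.wait` frames above) throttle only the purge backlog instead of multiplying RM threads.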
[jira] [Commented] (YARN-8254) dynamically change log levels for YARN Jobs
[ https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466853#comment-16466853 ] Prabhu Joseph commented on YARN-8254: - YarnClient can request setLogLevel for an application using a new api "yarn application -setLogLevel " to RM. ResourceManager will pass it to ApplicationMaster through AllocateResponse. ApplicationMaster will process the logLevel and pass it to all the task containers as part of the response to statusUpdate. Needs change in each application to support this or can simply ignore. > dynamically change log levels for YARN Jobs > --- > > Key: YARN-8254 > URL: https://issues.apache.org/jira/browse/YARN-8254 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Priority: Major > Labels: supportability > > Currently the Log Levels for Daemons can be dynamically changed. It will be > easier while debugging to have same for YARN Jobs. Client can setLogLevel to > ApplicationMaster which can set it for all the containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
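The last hop — a container applying a relayed log level — can be illustrated with the JDK's own logging API. YARN components actually use log4j; java.util.logging is used here only to keep the sketch dependency-free, and the logger name is hypothetical:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DynamicLogLevel {
    // What a task container might do when the AM relays a setLogLevel
    // request it received from the RM in an AllocateResponse.
    static void applyLogLevel(String loggerName, String level) {
        Logger.getLogger(loggerName).setLevel(Level.parse(level));
    }

    public static void main(String[] args) {
        Logger taskLog = Logger.getLogger("org.example.TaskRunner");
        applyLogLevel("org.example.TaskRunner", "FINE");
        System.out.println(taskLog.getLevel());  // prints: FINE
    }
}
```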
[jira] [Updated] (YARN-8254) dynamically change log levels for YARN Jobs
[ https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-8254: Component/s: yarn > dynamically change log levels for YARN Jobs > --- > > Key: YARN-8254 > URL: https://issues.apache.org/jira/browse/YARN-8254 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Priority: Major > Labels: supportability > > Currently the Log Levels for Daemons can be dynamically changed. It will be > easier while debugging to have same for YARN Jobs. Client can setLogLevel to > ApplicationMaster which can set it for all the containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8254) dynamically change log levels for YARN Jobs
[ https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466767#comment-16466767 ] Prabhu Joseph commented on YARN-8254: - [~Naganarasimha] Just realized this is Application specific. AM has to provide support to change log level to client. JobClient can request setLogLevel to AM. AM will internally setLogLevel for all running containers. Will move this Jira to MapReduce. > dynamically change log levels for YARN Jobs > --- > > Key: YARN-8254 > URL: https://issues.apache.org/jira/browse/YARN-8254 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Priority: Major > Labels: supportability > > Currently the Log Levels for Daemons can be dynamically changed. It will be > easier while debugging to have same for YARN Jobs. Client can setLogLevel to > ApplicationMaster which can set it for all the containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8254) dynamically change log levels for YARN Jobs
[ https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-8254: Component/s: (was: yarn) > dynamically change log levels for YARN Jobs > --- > > Key: YARN-8254 > URL: https://issues.apache.org/jira/browse/YARN-8254 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Priority: Major > Labels: supportability > > Currently the Log Levels for Daemons can be dynamically changed. It will be > easier while debugging to have same for YARN Jobs. Client can setLogLevel to > ApplicationMaster which can set it for all the containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8224) LogAggregation status TIME_OUT for absent container misleading
Prabhu Joseph created YARN-8224: --- Summary: LogAggregation status TIME_OUT for absent container misleading Key: YARN-8224 URL: https://issues.apache.org/jira/browse/YARN-8224 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.7.3 Reporter: Prabhu Joseph When a container is not launched on NM and it is absent, RM still tries to get the Log Aggregation Status and reports the status as TIME_OUT. (attached screenshot) {code} 2018-04-26 12:47:38,403 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1070)) - Event EventType: KILL_CONTAINER sent to absent container container_e361_1524687599273_2110_01_000770 2018-04-26 12:49:31,743 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1086)) - Event EventType: FINISH_APPLICATION sent to absent application application_1524687599273_2110 {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8224) LogAggregation status TIME_OUT for absent container misleading
[ https://issues.apache.org/jira/browse/YARN-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-8224: Description: When a container is not launched on NM and it is absent, RM still tries to get the Log Aggregation Status and reports the status as TIME_OUT. {code} 2018-04-26 12:47:38,403 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1070)) - Event EventType: KILL_CONTAINER sent to absent container container_e361_1524687599273_2110_01_000770 2018-04-26 12:49:31,743 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1086)) - Event EventType: FINISH_APPLICATION sent to absent application application_1524687599273_2110 {code} was: When a container is not launched on NM and it is absent, RM still tries to get the Log Aggregation Status and reports the status as TIME_OUT. (attached screenshot) {code} 2018-04-26 12:47:38,403 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1070)) - Event EventType: KILL_CONTAINER sent to absent container container_e361_1524687599273_2110_01_000770 2018-04-26 12:49:31,743 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1086)) - Event EventType: FINISH_APPLICATION sent to absent application application_1524687599273_2110 {code} > LogAggregation status TIME_OUT for absent container misleading > -- > > Key: YARN-8224 > URL: https://issues.apache.org/jira/browse/YARN-8224 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Priority: Major > > When a container is not launched on NM and it is absent, RM still tries to > get the Log Aggregation Status and reports the status as TIME_OUT. 
> {code} > 2018-04-26 12:47:38,403 WARN containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1070)) - Event EventType: KILL_CONTAINER > sent to absent container container_e361_1524687599273_2110_01_000770 > 2018-04-26 12:49:31,743 WARN containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1086)) - Event EventType: > FINISH_APPLICATION sent to absent application application_1524687599273_2110 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8224) LogAggregation status TIME_OUT for absent container misleading
[ https://issues.apache.org/jira/browse/YARN-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-8224: Description: When a container is not launched on NM and it is absent, RM still tries to get the Log Aggregation Status and reports the status as TIME_OUT in RM UI. {code} 2018-04-26 12:47:38,403 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1070)) - Event EventType: KILL_CONTAINER sent to absent container container_e361_1524687599273_2110_01_000770 2018-04-26 12:49:31,743 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1086)) - Event EventType: FINISH_APPLICATION sent to absent application application_1524687599273_2110 {code} was: When a container is not launched on NM and it is absent, RM still tries to get the Log Aggregation Status and reports the status as TIME_OUT. {code} 2018-04-26 12:47:38,403 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1070)) - Event EventType: KILL_CONTAINER sent to absent container container_e361_1524687599273_2110_01_000770 2018-04-26 12:49:31,743 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1086)) - Event EventType: FINISH_APPLICATION sent to absent application application_1524687599273_2110 {code} > LogAggregation status TIME_OUT for absent container misleading > -- > > Key: YARN-8224 > URL: https://issues.apache.org/jira/browse/YARN-8224 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Priority: Major > > When a container is not launched on NM and it is absent, RM still tries to > get the Log Aggregation Status and reports the status as TIME_OUT in RM UI. 
> {code} > 2018-04-26 12:47:38,403 WARN containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1070)) - Event EventType: KILL_CONTAINER > sent to absent container container_e361_1524687599273_2110_01_000770 > 2018-04-26 12:49:31,743 WARN containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1086)) - Event EventType: > FINISH_APPLICATION sent to absent application application_1524687599273_2110 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8279) AggregationLogDeletionService does not honor yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix
Prabhu Joseph created YARN-8279: --- Summary: AggregationLogDeletionService does not honor yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix Key: YARN-8279 URL: https://issues.apache.org/jira/browse/YARN-8279 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.7.3 Reporter: Prabhu Joseph AggregationLogDeletionService does not honor yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix. AggregationLogService writes the logs into /app-logs//logs-ifile whereas AggregationLogDeletion tries to delete from /app-logs//logs. The workaround is to set yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix and yarn.nodemanager.remote-app-log-dir-suffix to the same value "logs-ifile". AggregationLogDeletionService has to check the format and choose the suffix based on it. Currently it only checks the older suffix yarn.nodemanager.remote-app-log-dir-suffix. AggregatedLogDeletionService tries to delete the older suffix directory. {code} 2018-05-11 08:48:19,989 ERROR logaggregation.AggregatedLogDeletionService (AggregatedLogDeletionService.java:logIOException(182)) - Could not read the contents of hdfs://prabhucluster:8020/app-logs/hive/logs java.io.FileNotFoundException: File hdfs://prabhucluster:8020/app-logs/hive/logs does not exist. 
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:923) at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114) at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985) at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:992) at org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.deleteOldLogDirsFrom(AggregatedLogDeletionService.java:98) at org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:85) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
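The suggested fix — derive the deletion suffix from the active format instead of always reading the legacy key — can be sketched as below. The two config keys are quoted from the issue; `deletionSuffix` and the fallback default are illustrative:

```java
import java.util.Map;

public class LogDirSuffix {
    static final String IFILE_SUFFIX_KEY =
            "yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix";
    static final String LEGACY_SUFFIX_KEY =
            "yarn.nodemanager.remote-app-log-dir-suffix";

    // Prefer the IndexedFormat suffix when it is configured, so deletion
    // scans the same directory the aggregation writer used; otherwise fall
    // back to the legacy suffix ("logs" is assumed as default in this sketch).
    static String deletionSuffix(Map<String, String> conf) {
        String ifile = conf.get(IFILE_SUFFIX_KEY);
        return (ifile != null && !ifile.isEmpty())
                ? ifile
                : conf.getOrDefault(LEGACY_SUFFIX_KEY, "logs");
    }

    public static void main(String[] args) {
        System.out.println(deletionSuffix(Map.of(IFILE_SUFFIX_KEY, "logs-ifile")));  // prints: logs-ifile
        System.out.println(deletionSuffix(Map.of()));                                // prints: logs
    }
}
```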
[jira] [Assigned] (YARN-8279) AggregationLogDeletionService does not honor yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix
[ https://issues.apache.org/jira/browse/YARN-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-8279: --- Assignee: Tarun Parimi > AggregationLogDeletionService does not honor > yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix > - > > Key: YARN-8279 > URL: https://issues.apache.org/jira/browse/YARN-8279 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Tarun Parimi >Priority: Major > > AggregationLogDeletionService does not honor > yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix. > AggregationLogService writes the logs into /app-logs//logs-ifile > where as AggregationLogDeletion tries to delete from > /app-logs//logs. > Workaround is to set > yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix and > yarn.nodemanager.remote-app-log-dir-suffix to same value "logs-ifile" > AggregationLogDeletionService has to check the format and based upon that > choose the suffix. Currently it only checks the older suffix > yarn.nodemanager.remote-app-log-dir-suffix. > AggregatedLogDeletionService tries to delete older suffix directory. > {code} > 2018-05-11 08:48:19,989 ERROR logaggregation.AggregatedLogDeletionService > (AggregatedLogDeletionService.java:logIOException(182)) - Could not read the > contents of hdfs://prabhucluster:8020/app-logs/hive/logs > java.io.FileNotFoundException: File > hdfs://prabhucluster:8020/app-logs/hive/logs does not exist. 
> at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:923) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114) > at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985) > at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:992) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.deleteOldLogDirsFrom(AggregatedLogDeletionService.java:98) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:85) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8254) dynamically change log levels for YARN Jobs
Prabhu Joseph created YARN-8254: --- Summary: dynamically change log levels for YARN Jobs Key: YARN-8254 URL: https://issues.apache.org/jira/browse/YARN-8254 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 2.7.3 Reporter: Prabhu Joseph Currently the Log Levels for Daemons can be dynamically changed. It will be easier while debugging to have same for YARN Jobs. Client can setLogLevel to ApplicationMaster which can set it for all the containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8254) dynamically change log levels for YARN Jobs
[ https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-8254: Labels: supportability (was: ) > dynamically change log levels for YARN Jobs > --- > > Key: YARN-8254 > URL: https://issues.apache.org/jira/browse/YARN-8254 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Priority: Major > Labels: supportability > > Currently the Log Levels for Daemons can be dynamically changed. It will be > easier while debugging to have same for YARN Jobs. Client can setLogLevel to > ApplicationMaster which can set it for all the containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8291) RMRegistryOperationService doesn't have a limit on AsyncPurge threads
[ https://issues.apache.org/jira/browse/YARN-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-8291: Affects Version/s: (was: 2.7.3) 3.0.0 > RMRegistryOperationService don't have limit on AsyncPurge threads > - > > Key: YARN-8291 > URL: https://issues.apache.org/jira/browse/YARN-8291 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Prabhu Joseph >Priority: Major > > When there are more than 1+ containers finished - > RMRegistryOperationService will create 1+ threads for performing > AsyncPurge which can slowdown the ResourceManager process. There should be a > limit on the number of threads. > {code} > "RegistryAdminService 554485" #824351 prio=5 os_prio=0 tid=0x7fe4b2bc9800 > nid=0xf8ed in Object.wait() [0x7fe31a5e4000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1386) > - locked <0x0007902ec7d8> (a > org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040) > at > org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) > at > org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) > at > org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:158) > at > org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) > at > org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) > at > org.apache.hadoop.registry.client.impl.zk.CuratorService.zkStat(CuratorService.java:455) > at > org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.stat(RegistryOperationsService.java:137) > at > 
org.apache.hadoop.registry.client.binding.RegistryUtils.statChildren(RegistryUtils.java:210) > at > org.apache.hadoop.registry.server.services.RegistryAdminService.purge(RegistryAdminService.java:450) > at > org.apache.hadoop.registry.server.services.RegistryAdminService.purge(RegistryAdminService.java:520) > at > org.apache.hadoop.registry.server.services.RegistryAdminService$AsyncPurge.call(RegistryAdminService.java:570) > at > org.apache.hadoop.registry.server.services.RegistryAdminService$AsyncPurge.call(RegistryAdminService.java:543) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8291) RMRegistryOperationService does not have a limit on AsyncPurge threads
[ https://issues.apache.org/jira/browse/YARN-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480424#comment-16480424 ] Prabhu Joseph commented on YARN-8291: - The trunk code has this issue as well.
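The fix suggested above, bounding the number of AsyncPurge threads, can be sketched with a fixed-size executor. This is not the actual RegistryAdminService code; the class name, the limit of 4, and the simulated purge work are all assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedPurgeExecutor {
    // Hypothetical cap; a real fix would read this from YARN configuration.
    static final int MAX_PURGE_THREADS = 4;

    // Fixed-size pool: excess purge requests wait in the pool's queue
    // instead of each spawning a fresh thread.
    private final ExecutorService pool =
            Executors.newFixedThreadPool(MAX_PURGE_THREADS);

    // Stand-in for AsyncPurge: the real task would stat and purge a
    // ZooKeeper registry path; here it just reports one purged record.
    Future<Integer> submitPurge(String registryPath) {
        return pool.submit(() -> 1);
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        BoundedPurgeExecutor svc = new BoundedPurgeExecutor();
        List<Future<Integer>> results = new ArrayList<>();
        // 100 finished containers -> 100 purge requests, at most 4 threads.
        for (int i = 0; i < 100; i++) {
            results.add(svc.submitPurge("/registry/users/app" + i));
        }
        int purged = 0;
        for (Future<Integer> f : results) {
            purged += f.get();
        }
        svc.shutdown();
        System.out.println("purged=" + purged);
    }
}
```

A fixed pool queues a burst of purge requests rather than rejecting them, so many containers finishing at once degrades to queuing delay instead of a thread explosion in the ResourceManager.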
[jira] [Updated] (YARN-8279) AggregationLogDeletionService does not honor yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix
[ https://issues.apache.org/jira/browse/YARN-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-8279: Affects Version/s: (was: 2.7.3) 2.9.1 > AggregationLogDeletionService does not honor > yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix > - > > Key: YARN-8279 > URL: https://issues.apache.org/jira/browse/YARN-8279 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.9.1 >Reporter: Prabhu Joseph >Assignee: Tarun Parimi >Priority: Major > > AggregationLogDeletionService does not honor > yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix. > AggregationLogService writes the logs into /app-logs//logs-ifile, > whereas AggregationLogDeletion tries to delete from > /app-logs//logs. > The workaround is to set > yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix and > yarn.nodemanager.remote-app-log-dir-suffix to the same value "logs-ifile". > AggregationLogDeletionService has to check the log format and choose the > suffix accordingly. Currently it only checks the older suffix > yarn.nodemanager.remote-app-log-dir-suffix, so AggregatedLogDeletionService > tries to delete the older-suffix directory. > {code} > 2018-05-11 08:48:19,989 ERROR logaggregation.AggregatedLogDeletionService > (AggregatedLogDeletionService.java:logIOException(182)) - Could not read the > contents of hdfs://prabhucluster:8020/app-logs/hive/logs > java.io.FileNotFoundException: File > hdfs://prabhucluster:8020/app-logs/hive/logs does not exist. 
> at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:923) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114) > at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985) > at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:992) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.deleteOldLogDirsFrom(AggregatedLogDeletionService.java:98) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:85) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > {code}
[jira] [Commented] (YARN-8279) AggregationLogDeletionService does not honor yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix
[ https://issues.apache.org/jira/browse/YARN-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489071#comment-16489071 ] Prabhu Joseph commented on YARN-8279: - [~jlowe] We faced this on the HDP distribution with Hadoop 2.7.3, which carries most of the latest Apache code. The issue will also occur in trunk.
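The proposed fix, choosing the deletion suffix from the configured log-aggregation format instead of always using the legacy suffix, can be sketched as below. The two suffix property names come from the issue; the format property name, the default values, and the use of plain java.util.Properties (rather than Hadoop's Configuration class) are assumptions for this sketch:

```java
import java.util.Properties;

public class LogDirSuffixResolver {
    // Suffix property names are from the issue; the format property name
    // and the defaults below are assumptions for this sketch.
    static final String FILE_FORMATS = "yarn.log-aggregation.file-formats";
    static final String IFILE_SUFFIX =
            "yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix";
    static final String LEGACY_SUFFIX =
            "yarn.nodemanager.remote-app-log-dir-suffix";

    // Pick the suffix the deletion service should scan, based on the
    // configured aggregation format, instead of always taking the legacy one.
    static String deletionSuffix(Properties conf) {
        String format = conf.getProperty(FILE_FORMATS, "TFile");
        if (format.contains("IndexedFormat")) {
            return conf.getProperty(IFILE_SUFFIX, "logs-ifile");
        }
        return conf.getProperty(LEGACY_SUFFIX, "logs");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(IFILE_SUFFIX, "logs-ifile");
        conf.setProperty(FILE_FORMATS, "IndexedFormat");
        System.out.println(deletionSuffix(conf));
        conf.setProperty(FILE_FORMATS, "TFile");
        System.out.println(deletionSuffix(conf));
    }
}
```

With this, writer (AggregationLogService) and deleter (AggregatedLogDeletionService) derive the suffix from the same format setting, so the deleter no longer looks for /app-logs//logs while the writer produces /app-logs//logs-ifile.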
[jira] [Updated] (YARN-8279) AggregationLogDeletionService does not honor yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix
[ https://issues.apache.org/jira/browse/YARN-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-8279: Description: AggregationLogDeletionService does not honor yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix. AggregationLogService writes the logs into /app-logs//logs-ifile, whereas AggregationLogDeletion tries to delete from /app-logs//logs. The workaround is to set yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix and yarn.nodemanager.remote-app-log-dir-suffix to the same value "logs-ifile" and restart the HistoryServer, which runs AggregationLogDeletionService. AggregationLogDeletionService has to check the log format and choose the suffix accordingly. Currently it only checks the older suffix yarn.nodemanager.remote-app-log-dir-suffix, so AggregatedLogDeletionService tries to delete the older-suffix directory.
[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6929: Attachment: YARN-6929.2.patch > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, > YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. The maximum subdirectory limit is 1048576 by default (HDFS-6102). > With a retention of yarn.log-aggregation.retain-seconds of 7 days, > LogAggregationService is likely to fail to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. This can be > improved by adding the date as a subdirectory, like > //logs// > {code} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at >
[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6929: Attachment: YARN-6929.3.patch
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222056#comment-16222056 ] Prabhu Joseph commented on YARN-6929: - [~jlowe] [~rohithsharma] Need your help reviewing this patch. The failing test case is an existing one, YARN-7299. I have done functional testing with the test cases below: {code} 1. New application logs get written into the correct folder structure inside yarn.nodemanager.remote-app-log-dir. 2. The yarn logs CLI works fine. 3. Accessing logs from the RM UI / HistoryServer UI works fine while the job is running and after it completes. {code}
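The date-partitioned layout proposed for YARN-6929 can be sketched with a small path helper. The directory pattern follows the issue description (a date subdirectory under each user's logs directory); the helper name, the yyyy/MM/dd split, and the sample user and application id are illustrative assumptions:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DatePartitionedLogDir {
    // Hypothetical helper illustrating the proposed layout:
    //   <remote-log-dir>/<user>/logs/<yyyy/MM/dd>/<applicationId>
    // instead of putting every application directly under .../logs/,
    // which can hit HDFS's default 1,048,576 child-entry limit.
    static String appLogDir(String remoteLogDir, String user,
                            String appId, LocalDate date) {
        String datePart = date.format(DateTimeFormatter.ofPattern("yyyy/MM/dd"));
        return remoteLogDir + "/" + user + "/logs/" + datePart + "/" + appId;
    }

    public static void main(String[] args) {
        System.out.println(appLogDir("/app-logs", "yarn",
                "application_1500000000000_0001", LocalDate.of(2017, 8, 1)));
    }
}
```

Splitting by date caps the per-directory child count at roughly one day's applications, so the deletion service can also drop whole date directories once they age past yarn.log-aggregation.retain-seconds.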