[jira] [Commented] (SLIDER-646) AgentLaunchFailureIT test failing at times due to randomness of initial delay in Chaos Monkey
[ https://issues.apache.org/jira/browse/SLIDER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209560#comment-14209560 ]

Steve Loughran commented on SLIDER-646:
---
The launch failure test is designed to fail immediately on launch, with no delay. Hence the name of the test. If the test is failing then it's the test not picking up the failure, not the test.

AgentLaunchFailureIT test failing at times due to randomness of initial delay in Chaos Monkey
-
Key: SLIDER-646
URL: https://issues.apache.org/jira/browse/SLIDER-646
Project: Slider
Issue Type: Bug
Reporter: Gour Saha

Chaos Monkey initial delay should be deterministic. It is currently set to 60 seconds. The subsequent interval is also set to 60 seconds. However, AgentLaunchFailureIT fails at times because the AM does not get sufficient time to start up. In one failure scenario it has been seen to fail within 300 ms of Chaos Monkey setup. This test fails about once in every 10 attempts.

Here is the test output -
{code}
--
Test set: org.apache.slider.funtest.lifecycle.AgentLaunchFailureIT
---
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 43.503 sec FAILURE! - in org.apache.slider.funtest.lifecycle.AgentLaunchFailureIT
testAgentLaunchFailure(org.apache.slider.funtest.lifecycle.AgentLaunchFailureIT) Time elapsed: 40.903 sec FAILURE!
java.lang.AssertionError: Application Launch Failure, exit code 68 Chaos monkey triggered launch failure
	at org.junit.Assert.fail(Assert.java:88)
	at org.apache.slider.funtest.framework.CommandTestBase.createTemplatedSliderApplication(CommandTestBase.groovy:676)
	at org.apache.slider.funtest.lifecycle.AgentLaunchFailureIT.testAgentLaunchFailure(AgentLaunchFailureIT.groovy:71)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

Here is the AM log snippet -
{code}
2014-11-12 09:29:34,989 [main] INFO appmaster.SliderAppMaster (SliderAppMaster.java:createAndRunCluster(764)) - Token YARN_AM_RM_TOKEN
2014-11-12 09:29:34,990 [main] INFO agent.AgentUtils (AgentUtils.java:getApplicationMetainfo(43)) - Reading metainfo at .slider/package/CMD_LOGGER/apache-slider-command-logger.zip
2014-11-12 09:29:35,014 [main] INFO tools.SliderUtils (SliderUtils.java:getApplicationResourceInputStream(1692)) - Reading metainfo.xml of size 1995
2014-11-12 09:29:35,096 [main] INFO agent.AgentUtils (AgentUtils.java:getDefaultConfig(64)) - Reading default config file configuration/cl-site.xml at .slider/package/CMD_LOGGER/apache-slider-command-logger.zip
2014-11-12 09:29:35,102 [main] INFO tools.SliderUtils (SliderUtils.java:getApplicationResourceInputStream(1692)) - Reading configuration/cl-site.xml of size 1270
2014-11-12 09:29:35,106 [main] INFO agent.HeartbeatMonitor (HeartbeatMonitor.java:start(46)) - Starting heartbeat monitor with interval 6
2014-11-12 09:29:35,107 [Thread-36] DEBUG agent.HeartbeatMonitor (HeartbeatMonitor.java:run(65)) - Putting monitor to sleep for 6 milliseconds
2014-11-12 09:29:35,181 [main] INFO state.AppState (AppState.java:buildInstance(502)) - Adding role COMMAND_LOGGER
2014-11-12 09:29:35,181 [main] INFO state.AppState (AppState.java:createDynamicProviderRole(585)) - Role COMMAND_LOGGER assigned priority 1
2014-11-12 09:29:35,181 [main] INFO state.AppState (AppState.java:buildRoleRequirementsFromResources(687)) - Role COMMAND_LOGGER has 0 instances specified
2014-11-12 09:29:35,253 [main] DEBUG state.RoleHistory (RoleHistory.java:onBootstrap(370)) - Role history bootstrapped
2014-11-12 09:29:35,268 [main] INFO appmaster.SliderAppMaster (SliderAppMaster.java:maybeStartMonkey(2183)) - Adding Chaos Monkey scheduled every 60 seconds (0
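
The description asks for a deterministic initial delay followed by a fixed interval. A minimal sketch of that scheduling pattern follows, assuming a plain java.util.concurrent scheduler rather than Slider's actual Chaos Monkey wiring. The class and method names are hypothetical, and the delays are in milliseconds only so the example finishes quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration; not Slider's Chaos Monkey implementation. */
final class FixedDelayMonkey {
    private FixedDelayMonkey() {}

    /**
     * Schedules a task with a fixed initial delay and a fixed period, then
     * returns how long after scheduling (in nanoseconds) each of the first
     * `runs` executions started.
     */
    static long[] firstRunOffsetsNanos(long initialDelayMs, long periodMs, int runs) {
        final long[] offsets = new long[runs];
        final AtomicInteger next = new AtomicInteger();
        final CountDownLatch done = new CountDownLatch(runs);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        final long start = System.nanoTime();
        // The first execution happens no earlier than initialDelayMs after this call.
        scheduler.scheduleAtFixedRate(() -> {
            int i = next.getAndIncrement();
            if (i < runs) {
                offsets[i] = System.nanoTime() - start;
                done.countDown();
            }
        }, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
        try {
            // Generous upper bound so a stuck scheduler cannot hang the caller.
            done.await(initialDelayMs + periodMs * runs + 5000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return offsets;
    }
}
```

With a fixed initial delay the first action cannot fire before that delay has elapsed, which is the property the description is asking for.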
[jira] [Commented] (SLIDER-646) AgentLaunchFailureIT test failing at times due to randomness of initial delay in Chaos Monkey
[ https://issues.apache.org/jira/browse/SLIDER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209584#comment-14209584 ]

Jonathan Maron commented on SLIDER-646:
---
Perhaps an exit code of 68 (EXIT_YARN_SERVICE_FINISHED_WITH_ERROR) is not necessarily unexpected or considered a failure in this instance. The application fails simply because it's killed prior to reaching a running state. Perhaps the ability to specify an allowed set of exit codes to CommandTestBase.createTemplatedSliderApplication() is a way to go here?
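
The allowed-exit-codes idea can be sketched roughly as follows. This is only an illustration of the check itself: ExitCodeCheck and its methods are hypothetical names, not part of CommandTestBase, and the only value taken from the thread is exit code 68.

```java
import java.util.Set;

/** Hypothetical helper; not part of Slider's CommandTestBase. */
final class ExitCodeCheck {
    /** Exit code reported in the failing run quoted in this thread. */
    static final int EXIT_YARN_SERVICE_FINISHED_WITH_ERROR = 68;

    private ExitCodeCheck() {}

    /** A launch outcome is acceptable if it succeeded or the caller opted in to its exit code. */
    static boolean isAcceptable(int exitCode, Set<Integer> allowedExitCodes) {
        return exitCode == 0 || allowedExitCodes.contains(exitCode);
    }

    /** Fails with a message shaped like the one in the test output when the code is not allowed. */
    static void assertLaunchOutcome(int exitCode, String text, Set<Integer> allowedExitCodes) {
        if (!isAcceptable(exitCode, allowedExitCodes)) {
            throw new AssertionError(
                "Application Launch Failure, exit code " + exitCode + " " + text);
        }
    }
}
```

A test that expects the launch to be killed would pass a set containing 68, while every other caller would pass an empty set and keep the current strict behaviour.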
[jira] [Commented] (SLIDER-633) Slider should support invocation via Oozie
[ https://issues.apache.org/jira/browse/SLIDER-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209634#comment-14209634 ]

Jonathan Maron commented on SLIDER-633:
---
OK - that's the context I was looking for :) Seems like the right approach (implementing the token merge with a slider defined token file property).

Slider should support invocation via Oozie
--
Key: SLIDER-633
URL: https://issues.apache.org/jira/browse/SLIDER-633
Project: Slider
Issue Type: Improvement
Affects Versions: Slider 0.50
Reporter: Lee Yang
Attachments: fix_oozie_launch.patch

In a secure Hadoop installation, when attempting to launch a slider application via an Oozie shell-action, I see the following exception:
{noformat}
Stdoutput org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
Stdoutput 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6757)
Stdoutput 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:499)
Stdoutput 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:921)
Stdoutput 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
Stdoutput 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
Stdoutput 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
Stdoutput 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
Stdoutput 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
Stdoutput 	at java.security.AccessController.doPrivileged(Native Method)
Stdoutput 	at javax.security.auth.Subject.doAs(Subject.java:415)
Stdoutput 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
Stdoutput 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Stdoutput
Stdoutput 	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
Stdoutput 	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
Stdoutput 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
Stdoutput 	at com.sun.proxy.$Proxy17.getDelegationToken(Unknown Source)
Stdoutput 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:864)
Stdoutput 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Stdoutput 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
Stdoutput 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Stdoutput 	at java.lang.reflect.Method.invoke(Method.java:601)
Stdoutput 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
Stdoutput 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
Stdoutput 	at com.sun.proxy.$Proxy18.getDelegationToken(Unknown Source)
Stdoutput 	at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:947)
Stdoutput 	at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1305)
Stdoutput 	at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:527)
Stdoutput 	at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:505)
Stdoutput 	at org.apache.slider.core.launch.AppMasterLauncher.addSecurityTokens(AppMasterLauncher.java:209)
Stdoutput 	at org.apache.slider.core.launch.AppMasterLauncher.completeAppMasterLaunch(AppMasterLauncher.java:183)
Stdoutput 	at org.apache.slider.core.launch.AppMasterLauncher.submitApplication(AppMasterLauncher.java:214)
Stdoutput 	at org.apache.slider.client.SliderClient.launchApplication(SliderClient.java:1127)
Stdoutput 	at org.apache.slider.client.SliderClient.startCluster(SliderClient.java:771)
Stdoutput 	at org.apache.slider.client.SliderClient.actionCreate(SliderClient.java:515)
Stdoutput 	at org.apache.slider.client.SliderClient.runService(SliderClient.java:295)
Stdoutput 	at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:186)
Stdoutput 	at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471)
Stdoutput 	at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:401)
Stdoutput 	at
[jira] [Assigned] (SLIDER-646) AgentLaunchFailureIT test failing at times due to randomness of initial delay in Chaos Monkey
[ https://issues.apache.org/jira/browse/SLIDER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran reassigned SLIDER-646:
-
Assignee: Steve Loughran
[jira] [Updated] (SLIDER-646) AgentLaunchFailureIT test failing at times
[ https://issues.apache.org/jira/browse/SLIDER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated SLIDER-646:
--
Summary: AgentLaunchFailureIT test failing at times (was: AgentLaunchFailureIT test failing at times due to randomness of initial delay in Chaos Monkey)
[jira] [Commented] (SLIDER-646) AgentLaunchFailureIT test failing at times
[ https://issues.apache.org/jira/browse/SLIDER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209717#comment-14209717 ]

Steve Loughran commented on SLIDER-646:
---
The fix is to tell this launch not to wait for any state change, but instead just save the app report and return; the test case can then contain all the startup logic and tests. While I'm at it: set the attempt count to 1, so there's no risk of a double restart.
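
Once the launch call returns without waiting, the test has to watch for the failure itself. A rough sketch of that polling step, under stated assumptions: LaunchFailurePoll is a hypothetical name, the application state is modelled as a plain string supplied by the caller, and nothing here is the code from the actual commit.

```java
import java.util.function.Supplier;

/** Hypothetical polling helper; not taken from the Slider test framework. */
final class LaunchFailurePoll {
    private LaunchFailurePoll() {}

    /**
     * Polls the reported state until it equals `expected` or the timeout expires.
     * Returns true if the expected state was seen, false on timeout or interrupt.
     */
    static boolean awaitState(Supplier<String> reportedState, String expected,
                              long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (expected.equals(reportedState.get())) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The test would then assert that the terminal state is the failed one, instead of relying on the launch helper to interpret the exit code.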
[jira] [Commented] (SLIDER-646) AgentLaunchFailureIT test failing at times
[ https://issues.apache.org/jira/browse/SLIDER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209806#comment-14209806 ]

ASF subversion and git services commented on SLIDER-646:

Commit 0cb6eaf76dc94f053ad9d0561b5b78a81c736494 in incubator-slider's branch refs/heads/releases/slider-0.60 from [~ste...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=0cb6eaf ]

SLIDER-646 intermittent AgentLaunchFailureIT test failures
[jira] [Commented] (SLIDER-633) Slider should support invocation via Oozie
[ https://issues.apache.org/jira/browse/SLIDER-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209808#comment-14209808 ]

Jonathan Maron commented on SLIDER-633:
---
Still - it's unclear to me whether the merge of tokens actually needs to be performed by Slider for these Oozie/non-Kerberos authenticated invocations. To sum up the logic in the patch: based on a flag (in this case the mapreduce token file flag, but it appears that it can be generalized to a boolean or authentication type check):

1) The code does not attempt to obtain the delegation tokens from the file system (since there isn't a proper Kerberos identity established).
2) The code then leverages the credentials (tokens) from the established login user (apparently there is one established in an Oozie invocation) for the container launch context (rather than using the empty credentials - see point 1).

Indications are that this works for the scenario in question (any chance of getting a test case we can use?), so I'm not clear on whether the TokenCache merge is actually required...
Slider should support invocation via Oozie -- Key: SLIDER-633 URL: https://issues.apache.org/jira/browse/SLIDER-633 Project: Slider Issue Type: Improvement Affects Versions: Slider 0.50 Reporter: Lee Yang Attachments: fix_oozie_launch.patch In a secure Hadoop installation, when attempting to launch a slider application via an Oozie shell-action, I see the following exception: {noformat} Stdoutput org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication Stdoutput at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6757) Stdoutput at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:499) Stdoutput at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:921) Stdoutput at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) Stdoutput at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) Stdoutput at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) Stdoutput at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) Stdoutput at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) Stdoutput at java.security.AccessController.doPrivileged(Native Method) Stdoutput at javax.security.auth.Subject.doAs(Subject.java:415) Stdoutput at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637) Stdoutput at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) Stdoutput Stdoutput at org.apache.hadoop.ipc.Client.call(Client.java:1411) Stdoutput at org.apache.hadoop.ipc.Client.call(Client.java:1364) Stdoutput at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) Stdoutput at 
com.sun.proxy.$Proxy17.getDelegationToken(Unknown Source) Stdoutput at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:864) Stdoutput at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Stdoutput at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) Stdoutput at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Stdoutput at java.lang.reflect.Method.invoke(Method.java:601) Stdoutput at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) Stdoutput at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) Stdoutput at com.sun.proxy.$Proxy18.getDelegationToken(Unknown Source) Stdoutput at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:947) Stdoutput at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1305) Stdoutput at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:527) Stdoutput at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:505) Stdoutput at org.apache.slider.core.launch.AppMasterLauncher.addSecurityTokens(AppMasterLauncher.java:209) Stdoutput at org.apache.slider.core.launch.AppMasterLauncher.completeAppMasterLaunch(AppMasterLauncher.java:183) Stdoutput at org.apache.slider.core.launch.AppMasterLauncher.submitApplication(AppMasterLauncher.java:214) Stdoutput at
[jira] [Commented] (SLIDER-646) AgentLaunchFailureIT test failing at times
[ https://issues.apache.org/jira/browse/SLIDER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209925#comment-14209925 ] ASF subversion and git services commented on SLIDER-646: Commit e3846d5c720172c9e63b377a70bff690f402b993 in incubator-slider's branch refs/heads/develop from [~ste...@apache.org] [ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=e3846d5 ] SLIDER-646 intermittent AgentLaunchFailureIT test failures AgentLaunchFailureIT test failing at times -- Key: SLIDER-646 URL: https://issues.apache.org/jira/browse/SLIDER-646 Project: Slider Issue Type: Bug Reporter: Gour Saha Assignee: Steve Loughran
[jira] [Updated] (SLIDER-646) AgentLaunchFailureIT test failing at times
[ https://issues.apache.org/jira/browse/SLIDER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SLIDER-646: -- Component/s: test Sprint: Slider November #1 Fix Version/s: Slider 0.60 AgentLaunchFailureIT test failing at times -- Key: SLIDER-646 URL: https://issues.apache.org/jira/browse/SLIDER-646 Project: Slider Issue Type: Bug Components: test Reporter: Gour Saha Assignee: Steve Loughran Fix For: Slider 0.60
[jira] [Resolved] (SLIDER-646) AgentLaunchFailureIT test failing at times
[ https://issues.apache.org/jira/browse/SLIDER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved SLIDER-646. --- Resolution: Fixed AgentLaunchFailureIT test failing at times -- Key: SLIDER-646 URL: https://issues.apache.org/jira/browse/SLIDER-646 Project: Slider Issue Type: Bug Components: test Reporter: Gour Saha Assignee: Steve Loughran Fix For: Slider 0.60 Chaos Monkey initial delay should be deterministic. It is currently set to 60 seconds. Subsequent interval is also set to 60 secs. However AgentLaunchFailureIT fails at times because the AM does not get sufficient time to startup. In one failure scenario it has been seen to fail within 300 ms of Chaos Monkey setup. This test fails about once in every 10 attempts. Here is the test output - {code} -- Test set: org.apache.slider.funtest.lifecycle.AgentLaunchFailureIT --- Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 43.503 sec FAILURE! - in org.apache.slider.funtest.lifecycle.AgentLaunchFailureIT testAgentLaunchFailure(org.apache.slider.funtest.lifecycle.AgentLaunchFailureIT) Time elapsed: 40.903 sec FAILURE! 
java.lang.AssertionError: Application Launch Failure, exit code 68 Chaos monkey triggered launch failure at org.junit.Assert.fail(Assert.java:88) at org.apache.slider.funtest.framework.CommandTestBase.createTemplatedSliderApplication(CommandTestBase.groovy:676) at org.apache.slider.funtest.lifecycle.AgentLaunchFailureIT.testAgentLaunchFailure(AgentLaunchFailureIT.groovy:71) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} Here is the AM log snippet - {code} 2014-11-12 09:29:34,989 [main] INFO appmaster.SliderAppMaster (SliderAppMaster.java:createAndRunCluster(764)) - Token YARN_AM_RM_TOKEN 2014-11-12 09:29:34,990 [main] INFO agent.AgentUtils (AgentUtils.java:getApplicationMetainfo(43)) - Reading metainfo at .slider/package/CMD_LOGGER/apache-slider-command-logger.zip 2014-11-12 09:29:35,014 [main] INFO tools.SliderUtils (SliderUtils.java:getApplicationResourceInputStream(1692)) - Reading metainfo.xml of size 1995 2014-11-12 09:29:35,096 [main] INFO agent.AgentUtils (AgentUtils.java:getDefaultConfig(64)) - Reading default config file configuration/cl-site.xml at 
.slider/package/CMD_LOGGER/apache-slider-command-logger.zip 2014-11-12 09:29:35,102 [main] INFO tools.SliderUtils (SliderUtils.java:getApplicationResourceInputStream(1692)) - Reading configuration/cl-site.xml of size 1270 2014-11-12 09:29:35,106 [main] INFO agent.HeartbeatMonitor (HeartbeatMonitor.java:start(46)) - Starting heartbeat monitor with interval 6 2014-11-12 09:29:35,107 [Thread-36] DEBUG agent.HeartbeatMonitor (HeartbeatMonitor.java:run(65)) - Putting monitor to sleep for 6 milliseconds 2014-11-12 09:29:35,181 [main] INFO state.AppState (AppState.java:buildInstance(502)) - Adding role COMMAND_LOGGER 2014-11-12 09:29:35,181 [main] INFO state.AppState (AppState.java:createDynamicProviderRole(585)) - Role COMMAND_LOGGER assigned priority 1 2014-11-12 09:29:35,181 [main] INFO state.AppState (AppState.java:buildRoleRequirementsFromResources(687)) - Role COMMAND_LOGGER has 0 instances specified 2014-11-12 09:29:35,253 [main] DEBUG state.RoleHistory (RoleHistory.java:onBootstrap(370)) - Role history bootstrapped 2014-11-12 09:29:35,268 [main] INFO appmaster.SliderAppMaster (SliderAppMaster.java:maybeStartMonkey(2183)) - Adding Chaos Monkey scheduled every 60 seconds (0 hours -delay 60 2014-11-12 09:29:35,269 [main] INFO
[jira] [Commented] (SLIDER-264) Test runners to fail fast if app is already finished
[ https://issues.apache.org/jira/browse/SLIDER-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209939#comment-14209939 ]

Steve Loughran commented on SLIDER-264:
---
With the ability to save app reports, extract app IDs and then use {{slider lookup}} to query by app ID, there's no need for confusion between app instances, so fail-fast can be implemented in the unit tests. It is already done in the funtests.

Test runners to fail fast if app is already finished

Key: SLIDER-264
URL: https://issues.apache.org/jira/browse/SLIDER-264
Project: Slider
Issue Type: Improvement
Components: test
Affects Versions: Slider 0.40
Reporter: Steve Loughran
Priority: Minor
Fix For: Slider 2.0.0

From SLIDER-230:
bq. the test logic might be able to be improved as it was repeatedly waiting for Accumulo to start when it was already in the FINISHED state. We might be able to short-circuit the test logic to fail faster in this case.

The test runner may be spinning until the app has started or the timeout expires, when it should bail out as soon as the app has finished.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
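As a rough illustration of the fail-fast behaviour requested in SLIDER-264, the sketch below polls an application state and gives up immediately when the app has already reached a terminal state, instead of spinning until the timeout. It is written in Python for brevity and is not the Slider test code; `wait_until_running`, its parameters and the choice of terminal states are assumptions made for this example.

```python
import time
from typing import Callable

# State names follow YARN's application states; treating these three as
# terminal is an assumption of this sketch.
TERMINAL_STATES = frozenset({"FINISHED", "FAILED", "KILLED"})


def wait_until_running(
    get_state: Callable[[], str],
    timeout_s: float,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until RUNNING; fail fast if the app is already done."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "RUNNING":
            return state
        if state in TERMINAL_STATES:
            # The app can never become RUNNING again, so stop waiting now.
            raise RuntimeError("app is already %s; not waiting for RUNNING" % state)
        if time.monotonic() >= deadline:
            raise TimeoutError("app still %s after %.1fs" % (state, timeout_s))
        sleep(poll_s)
```

In a test, `get_state` would be whatever lookup returns the state for one specific app ID, so that a finished instance of the same name cannot be mistaken for the one under test.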
[jira] [Created] (SLIDER-647) allocation requests not being satisfied when a cluster goes to labels
Steve Loughran created SLIDER-647:
-
Summary: allocation requests not being satisfied when a cluster goes to labels
Key: SLIDER-647
URL: https://issues.apache.org/jira/browse/SLIDER-647
Project: Slider
Issue Type: Bug
Components: appmaster
Affects Versions: Slider 0.60
Environment: cluster that was upgraded to labels.
Reporter: Steve Loughran
Assignee: Steve Loughran

Reported problem
# app installed on cluster; working
# cluster upgraded to labels
# when app started, container requests remained outstanding
# deleting the role history fixed this.
[jira] [Commented] (SLIDER-647) allocation requests not being satisfied when a cluster goes to labels
[ https://issues.apache.org/jira/browse/SLIDER-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209945#comment-14209945 ]

Steve Loughran commented on SLIDER-647:
---
Need to look at how resource requests are made, and validate that they are for relaxed placement.

allocation requests not being satisfied when a cluster goes to labels
-
Key: SLIDER-647
URL: https://issues.apache.org/jira/browse/SLIDER-647
Project: Slider
Issue Type: Bug
Components: appmaster
Affects Versions: Slider 0.60
Environment: cluster that was upgraded to labels.
Reporter: Steve Loughran
Assignee: Steve Loughran

Reported problem
# app installed on cluster; working
# cluster upgraded to labels
# when app started, container requests remained outstanding
# deleting the role history fixed this.
Slider-develop - Build # 467 - Failure
The Apache Jenkins build system has built Slider-develop (build #467) Status: Failure Check console output at https://builds.apache.org/job/Slider-develop/467/ to view the results.
[jira] [Commented] (SLIDER-633) Slider should support invocation via Oozie
[ https://issues.apache.org/jira/browse/SLIDER-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210015#comment-14210015 ]

Lee Yang commented on SLIDER-633:
-
For Jonathan's initial comment #1 above, I generated the patch from master (incubating-0.50.2), because we're currently testing Slider against a Hadoop 2.5 grid.

As for the logic, I was mostly looking for a simple way to switch between Oozie mode (with credentials/tokens provided by Oozie) and interactive mode (with delegation tokens obtained from the file system). Since I was injecting this new setting where it wasn't used before, it was just a simple way to differentiate between these modes. If there's a better way to identify which mode it's operating in, that would be preferred.

As for a simple test, I can probably cobble together a simple Oozie workflow, with a shell-action that invokes Slider to run a Java process. Unfortunately, I'm not sure how that might be integrated into the build/tests.
[jira] [Commented] (SLIDER-647) allocation requests not being satisfied when a cluster goes to labels
[ https://issues.apache.org/jira/browse/SLIDER-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210126#comment-14210126 ]

Steve Loughran commented on SLIDER-647:
---
propose:
# relax locality by default
# option will be in {{yarn.placement.strict=true}}
# global default: false, components can override
# add IT test.

allocation requests not being satisfied when a cluster goes to labels
-
Key: SLIDER-647
URL: https://issues.apache.org/jira/browse/SLIDER-647
Project: Slider
Issue Type: Bug
Components: appmaster
Affects Versions: Slider 0.60
Environment: cluster that was upgraded to labels.
Reporter: Steve Loughran
Assignee: Steve Loughran

Reported problem
# app installed on cluster; working
# cluster upgraded to labels
# when app started, container requests remained outstanding
# deleting the role history fixed this.
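To make the proposed defaulting rule concrete, here is a small Python sketch of how a {{yarn.placement.strict}} lookup could behave: false unless set globally, with a per-component value taking precedence. Only the option name comes from the comment above; the function and the plain dictionaries are hypothetical, and the proposal itself was not yet implemented when this was written.

```python
PLACEMENT_STRICT = "yarn.placement.strict"  # option name taken from the proposal


def is_placement_strict(global_opts: dict, component_opts: dict) -> bool:
    """Resolve the option: component value, else global value, else false."""
    raw = component_opts.get(PLACEMENT_STRICT, global_opts.get(PLACEMENT_STRICT, "false"))
    # Options are assumed to arrive as strings; anything but "true" means relaxed.
    return str(raw).strip().lower() == "true"
```

A request for a component where this returns false would then be made with relaxed locality, which is the default the proposal asks for.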
[jira] [Updated] (SLIDER-647) allocation requests not being satisfied when a cluster goes to labels
[ https://issues.apache.org/jira/browse/SLIDER-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SLIDER-647: -- Priority: Blocker (was: Major) allocation requests not being satisfied when a cluster goes to labels - Key: SLIDER-647 URL: https://issues.apache.org/jira/browse/SLIDER-647 Project: Slider Issue Type: Bug Components: appmaster Affects Versions: Slider 0.60 Environment: cluster that was upgraded to labels. Reporter: Steve Loughran Assignee: Steve Loughran Priority: Blocker Reported problem # app installed on cluster; working # cluster upgraded to labels # when app started, container requests remaining outstanding # deleting the role history fixed this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Q: How to define component configuration?
https://issues.apache.org/jira/browse/SLIDER-648 created. Thanks!

On Tue, Nov 11, 2014 at 5:32 PM, Sumit Mohanty sumit.moha...@gmail.com wrote:

    "components": {
      "COMPONENT1": {
        "mykey": "myvalue"
      },

This is not wired up in the AgentProviderService to send to the agents. So as a work-around you may have to use something like

    "global": {
      "COMPONENT1.mykey": "myvalue",

Can you file a JIRA to add support for reading component-specific configs from appConfig.json and making them available at the container? This seems to be a good feature to support.

-Sumit

On Tue, Nov 11, 2014 at 2:42 PM, hsy...@gmail.com hsy...@gmail.com wrote:

Thanks Steve, but I logged all the properties in params.py and couldn't find any key named *mykey*:

    config = Script.get_config()
    for key in config.keys():
        print("key: " + key)
    for key in config['global'].keys():
        print("key: " + key)

Best

On Tue, Nov 11, 2014 at 5:04 AM, Steve Loughran ste...@hortonworks.com wrote:

That should be it. What happens is that each component gets the properties of

    component-level union global-level

that is, everything that is global, extended with anything that is at the component level. If a component overrides the global value, that override is picked up.

It's essentially a form of prototype-based programming, except only of properties, not methods: http://en.wikipedia.org/wiki/Prototype-based_programming

On 11 November 2014 01:30, hsy...@gmail.com hsy...@gmail.com wrote:

Thanks Ted, but back to my first question: how can you define a component-level property in appConfig.json? I tried to define it like this:

    {
      "schema": "http://example.org/specification/v2.0.0",
      "metadata": {
      },
      "global": {
        "application.def": "app-package-0.1.zip",
        "java_home": "/usr/lib/jvm/java-7-oracle/",
        "package_list": "files/app.tgz",
        "agent.conf": "/user/siyuan/agent/conf/agent.ini",
        "site.global.app_user": "siyuan",
        "site.global.app_root": "${AGENT_WORK_ROOT}/app/install/kafka_2.10-0.8.1.1",
        "site.global.app_install_dir": "${AGENT_WORK_ROOT}/app/install",
        "site.global.pid_file": "${AGENT_WORK_ROOT}/app/run/app.pid"
      },
      "components": {
        "COMPONENT1": {
          "mykey": "myvalue"
        },
        "slider-appmaster": {
          "jvm.heapsize": "256M"
        }
      }
    }

Does this make COMPONENT1 able to read the value for *mykey*?

Best,
Siyuan

On Mon, Nov 10, 2014 at 4:15 PM, Ted Yu yuzhih...@gmail.com wrote:

To my knowledge, there is no direct support for this. You can create different components, each with corresponding properties.

Cheers

On Mon, Nov 10, 2014 at 4:13 PM, hsy...@gmail.com hsy...@gmail.com wrote:

I want to have several instances of some component, but I want to set some of the properties to different values for different instances. How can I do it?

Thanks!

Best,
Siyuan

On Mon, Nov 10, 2014 at 1:26 PM, hsy...@gmail.com hsy...@gmail.com wrote:

Hi guys,

Is there an example of component configuration? Is there a way to give different values to the same property for different instances?

Siyuan

-- thanks Sumit
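The "component-level union global-level" rule described in the thread can be illustrated with a few lines of Python. This is only a sketch of the described semantics over an appConfig.json-shaped dictionary; `resolve_component_properties` is a hypothetical helper, and, as noted in the thread, these values were not actually being sent to the agents at the time.

```python
def resolve_component_properties(app_config: dict, component: str) -> dict:
    """Return the global properties extended (and overridden) by the component's own."""
    # Start from a copy of everything defined at the global level...
    resolved = dict(app_config.get("global", {}))
    # ...then let component-level entries add to or override the global ones.
    resolved.update(app_config.get("components", {}).get(component, {}))
    return resolved


# Sample data shaped like the appConfig.json in the thread (values shortened).
app_config = {
    "global": {"site.global.app_user": "siyuan", "jvm.heapsize": "128M"},
    "components": {
        "COMPONENT1": {"mykey": "myvalue"},
        "slider-appmaster": {"jvm.heapsize": "256M"},
    },
}
```

Here COMPONENT1 would see both `site.global.app_user` and `mykey`, while slider-appmaster would see its own `jvm.heapsize` in place of the global one.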
[jira] [Created] (SLIDER-648) Support component level properties
Siyuan Hua created SLIDER-648:
-
Summary: Support component level properties
Key: SLIDER-648
URL: https://issues.apache.org/jira/browse/SLIDER-648
Project: Slider
Issue Type: Improvement
Reporter: Siyuan Hua

Currently I can define component properties like this:

    "global": {
      "site.COMPONENT1.mykey": "myvalue"
    }

It would be better to define component properties in the components section, like this:

    "components": {
      "COMPONENT1": {
        "mykey": "myvalue"
      },

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
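Until component sections are supported, a script consuming the configuration would have to pick the prefixed keys out of the global section itself. The Python sketch below shows one way that might look; the helper name and the assumption that keys follow the `site.<COMPONENT>.<key>` pattern from the issue text are illustrative only.

```python
def component_properties_from_global(global_opts: dict, component: str, prefix: str = "site.") -> dict:
    """Collect global entries named <prefix><component>.<key> and strip the prefix."""
    marker = prefix + component + "."
    return {
        key[len(marker):]: value
        for key, value in global_opts.items()
        if key.startswith(marker)
    }
```

For the example in the issue, `component_properties_from_global({"site.COMPONENT1.mykey": "myvalue"}, "COMPONENT1")` yields a dictionary containing only `mykey`.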
[jira] [Commented] (SLIDER-633) Slider should support invocation via Oozie
[ https://issues.apache.org/jira/browse/SLIDER-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210174#comment-14210174 ]

Jonathan Maron commented on SLIDER-633:

Instead of a flag, we may be able to simply check the authentication method (only KERBEROS would allow token retrieval from the FS). If you could cobble something together, it may help with functional testing of the patch and may also provide hints for an appropriate unit/functional testing approach.

Slider should support invocation via Oozie
Key: SLIDER-633
URL: https://issues.apache.org/jira/browse/SLIDER-633
Project: Slider
Issue Type: Improvement
Affects Versions: Slider 0.50
Reporter: Lee Yang
Attachments: fix_oozie_launch.patch

In a secure Hadoop installation, when attempting to launch a slider application via an Oozie shell-action, I see the following exception:
{noformat}
Stdoutput org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
Stdoutput at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6757)
Stdoutput at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:499)
Stdoutput at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:921)
Stdoutput at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
Stdoutput at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
Stdoutput at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
Stdoutput at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
Stdoutput at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
Stdoutput at java.security.AccessController.doPrivileged(Native Method)
Stdoutput at javax.security.auth.Subject.doAs(Subject.java:415)
Stdoutput at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
Stdoutput at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Stdoutput
Stdoutput at org.apache.hadoop.ipc.Client.call(Client.java:1411)
Stdoutput at org.apache.hadoop.ipc.Client.call(Client.java:1364)
Stdoutput at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
Stdoutput at com.sun.proxy.$Proxy17.getDelegationToken(Unknown Source)
Stdoutput at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:864)
Stdoutput at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Stdoutput at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
Stdoutput at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Stdoutput at java.lang.reflect.Method.invoke(Method.java:601)
Stdoutput at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
Stdoutput at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
Stdoutput at com.sun.proxy.$Proxy18.getDelegationToken(Unknown Source)
Stdoutput at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:947)
Stdoutput at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1305)
Stdoutput at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:527)
Stdoutput at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:505)
Stdoutput at org.apache.slider.core.launch.AppMasterLauncher.addSecurityTokens(AppMasterLauncher.java:209)
Stdoutput at org.apache.slider.core.launch.AppMasterLauncher.completeAppMasterLaunch(AppMasterLauncher.java:183)
Stdoutput at org.apache.slider.core.launch.AppMasterLauncher.submitApplication(AppMasterLauncher.java:214)
Stdoutput at org.apache.slider.client.SliderClient.launchApplication(SliderClient.java:1127)
Stdoutput at org.apache.slider.client.SliderClient.startCluster(SliderClient.java:771)
Stdoutput at org.apache.slider.client.SliderClient.actionCreate(SliderClient.java:515)
Stdoutput at org.apache.slider.client.SliderClient.runService(SliderClient.java:295)
Stdoutput at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:186)
Stdoutput at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471)
Stdoutput at
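The authentication-method check suggested in the comment on this issue might look roughly like the sketch below. It is only an illustration, not the patch: the class name, the `canFetchDelegationTokens` helper and the local `AuthMethod` enum are hypothetical stand-ins so that the example runs without Hadoop on the classpath. In real client code the value would presumably come from Hadoop's `UserGroupInformation.getAuthenticationMethod()`.

```java
public class DelegationTokenPolicy {

    /** Hypothetical stand-in for a subset of Hadoop's UserGroupInformation.AuthenticationMethod. */
    enum AuthMethod { SIMPLE, KERBEROS, TOKEN }

    /**
     * Decide whether the client should ask the filesystem for new delegation tokens.
     * Following the comment, only a Kerberos login is treated as able to do so;
     * a caller that is itself token-authenticated (e.g. launched from Oozie)
     * would have to reuse the credentials it was handed instead.
     */
    static boolean canFetchDelegationTokens(AuthMethod method) {
        return method == AuthMethod.KERBEROS;
    }

    public static void main(String[] args) {
        // Print the decision for each modelled authentication method.
        for (AuthMethod m : AuthMethod.values()) {
            System.out.println(m + " -> fetch tokens from FS: " + canFetchDelegationTokens(m));
        }
    }
}
```

The point of checking the authentication method rather than adding a flag is that the decision then needs no extra user configuration.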
Slider-develop - Build # 468 - Fixed
The Apache Jenkins build system has built Slider-develop (build #468) Status: Fixed Check console output at https://builds.apache.org/job/Slider-develop/468/ to view the results.
[jira] [Commented] (SLIDER-633) Slider should support invocation via Oozie
[ https://issues.apache.org/jira/browse/SLIDER-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211227#comment-14211227 ]

Jonathan Maron commented on SLIDER-633:

Lee - are you explicitly setting the mapreduce.job.credentials.binary property, or are you finding that it is available in the Oozie-launched Slider client from the instantiated Configuration? It may be appropriate to continue using it, since it is an indication of the YARN/Oozie-provided credentials for the workflow (and associated mapper).

Slider should support invocation via Oozie
Key: SLIDER-633
URL: https://issues.apache.org/jira/browse/SLIDER-633
Project: Slider
Issue Type: Improvement
Affects Versions: Slider 0.50
Reporter: Lee Yang
Attachments: fix_oozie_launch.patch
[jira] [Commented] (SLIDER-625) Regression: interactively setting passwords in CredentialProvider no longer working
[ https://issues.apache.org/jira/browse/SLIDER-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211244#comment-14211244 ]

ASF subversion and git services commented on SLIDER-625:

Commit 313156e8edcbf6d5634b9b763fbe636c6fb84a31 in incubator-slider's branch refs/heads/develop from [~billie.rina...@gmail.com]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=313156e ]

SLIDER-625 moved password prompt to a variable as per steve's suggestion

Regression: interactively setting passwords in CredentialProvider no longer working
Key: SLIDER-625
URL: https://issues.apache.org/jira/browse/SLIDER-625
Project: Slider
Issue Type: Bug
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
Fix For: Slider 0.70

{noformat}
2014-11-07 09:00:45,642 [main] INFO client.SliderClient (SliderClient.java:checkForCredentials(652)) - Creating credentials for root.initial.password in jceks://hdfs/user/hrt_qa/accumulo-a1.jceks
java.lang.NullPointerException
at org.apache.hadoop.security.alias.CredentialShell$PasswordReader.readPassword(CredentialShell.java:408)
at org.apache.hadoop.security.alias.CredentialShell.promptForCredential(CredentialShell.java:379)
at org.apache.hadoop.security.alias.CredentialShell$CreateCommand.execute(CredentialShell.java:348)
at org.apache.hadoop.security.alias.CredentialShell.run(CredentialShell.java:68)
at org.apache.slider.client.SliderClient.checkForCredentials(SliderClient.java:653)
at org.apache.slider.client.SliderClient.actionCreate(SliderClient.java:615)
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SLIDER-633) Slider should support invocation via Oozie
[ https://issues.apache.org/jira/browse/SLIDER-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211291#comment-14211291 ]

Lee Yang commented on SLIDER-633:

I have to explicitly set it, per my first comment above (currently by creating a conf file on the fly, since that was the simplest way I could find to inject this into Slider dynamically). Oozie provides the location of the tokens to the shell-action in an environment variable called HADOOP_TOKEN_FILE_LOCATION. I then have to set mapreduce.job.credentials.binary to this value so that it gets pulled in by the various Hadoop APIs.

Slider should support invocation via Oozie
Key: SLIDER-633
URL: https://issues.apache.org/jira/browse/SLIDER-633
Project: Slider
Issue Type: Improvement
Affects Versions: Slider 0.50
Reporter: Lee Yang
Attachments: fix_oozie_launch.patch
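The workaround described in this comment (copying the token file location from the environment into mapreduce.job.credentials.binary) could be sketched as below. This is a self-contained illustration under stated assumptions, not the attached patch: `OozieTokenConfig` and `applyTokenFile` are hypothetical names, and `java.util.Properties` stands in for a Hadoop `Configuration` so the example runs without Hadoop on the classpath. Only the environment variable name and the property name come from the thread.

```java
import java.util.Map;
import java.util.Properties;

public class OozieTokenConfig {

    /** Environment variable that Oozie sets for the shell-action (named in the thread). */
    static final String TOKEN_FILE_ENV = "HADOOP_TOKEN_FILE_LOCATION";

    /** Property that must point at the token file (named in the thread). */
    static final String CREDENTIALS_PROPERTY = "mapreduce.job.credentials.binary";

    /**
     * Copy the token file location from the environment into the configuration.
     * Returns true if the property was set, false if the variable is absent or blank.
     * Properties is a hypothetical stand-in for a Hadoop Configuration here.
     */
    static boolean applyTokenFile(Map<String, String> env, Properties conf) {
        String tokenFile = env.get(TOKEN_FILE_ENV);
        if (tokenFile == null || tokenFile.trim().isEmpty()) {
            return false; // not running under Oozie, or no tokens were provided
        }
        conf.setProperty(CREDENTIALS_PROPERTY, tokenFile.trim());
        return true;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        boolean applied = applyTokenFile(System.getenv(), conf);
        System.out.println("token file applied: " + applied);
        System.out.println(CREDENTIALS_PROPERTY + " = " + conf.getProperty(CREDENTIALS_PROPERTY));
    }
}
```

Doing this inside the client would remove the need to generate a conf file on the fly before each launch.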
[jira] [Resolved] (SLIDER-647) allocation requests not being satisfied when a cluster goes to labels
[ https://issues.apache.org/jira/browse/SLIDER-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved SLIDER-647.
Resolution: Fixed
Fix Version/s: Slider 0.60

allocation requests not being satisfied when a cluster goes to labels
Key: SLIDER-647
URL: https://issues.apache.org/jira/browse/SLIDER-647
Project: Slider
Issue Type: Bug
Components: appmaster
Affects Versions: Slider 0.60
Environment: cluster that was upgraded to labels.
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Blocker
Fix For: Slider 0.60

Reported problem
# app installed on cluster; working
# cluster upgraded to labels
# when app started, container requests remained outstanding
# deleting the role history fixed this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SLIDER-647) allocation requests not being satisfied when a cluster goes to labels
[ https://issues.apache.org/jira/browse/SLIDER-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211418#comment-14211418 ]

ASF subversion and git services commented on SLIDER-647:

Commit b86360263c99fc9d1f140460c96cba6afb3f7ae9 in incubator-slider's branch refs/heads/develop from [~ste...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=b863602 ]

Merge branch 'feature/SLIDER-647-placement' into develop

allocation requests not being satisfied when a cluster goes to labels
Key: SLIDER-647
URL: https://issues.apache.org/jira/browse/SLIDER-647
Project: Slider
Issue Type: Bug
Components: appmaster
Affects Versions: Slider 0.60
Environment: cluster that was upgraded to labels.
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Blocker
Fix For: Slider 0.60

Reported problem
# app installed on cluster; working
# cluster upgraded to labels
# when app started, container requests remained outstanding
# deleting the role history fixed this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SLIDER-647) allocation requests not being satisfied when a cluster goes to labels
[ https://issues.apache.org/jira/browse/SLIDER-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211417#comment-14211417 ]

ASF subversion and git services commented on SLIDER-647:

Commit f6abb46c7174a6d4e3d7887938b148734c91dc5e in incubator-slider's branch refs/heads/develop from [~ste...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=f6abb46 ]

SLIDER-647 allocation requests not being satisfied when a cluster goes to labels

allocation requests not being satisfied when a cluster goes to labels
Key: SLIDER-647
URL: https://issues.apache.org/jira/browse/SLIDER-647
Project: Slider
Issue Type: Bug
Components: appmaster
Affects Versions: Slider 0.60
Environment: cluster that was upgraded to labels.
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Blocker
Fix For: Slider 0.60

Reported problem
# app installed on cluster; working
# cluster upgraded to labels
# when app started, container requests remained outstanding
# deleting the role history fixed this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SLIDER-649) TestStandaloneAMRestart fails intermittently on windows
Jonathan Maron created SLIDER-649:

Summary: TestStandaloneAMRestart fails intermittently on windows
Key: SLIDER-649
URL: https://issues.apache.org/jira/browse/SLIDER-649
Project: Slider
Issue Type: Bug
Reporter: Jonathan Maron

Failure is probably due to invalid client interactions during the AM lifecycle. For the time being, will try to address these with longer sleep times.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SLIDER-649) TestStandaloneAMRestart fails intermittently on windows
[ https://issues.apache.org/jira/browse/SLIDER-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211433#comment-14211433 ]

ASF subversion and git services commented on SLIDER-649:

Commit f50adbda7af9347b68530c6fd2b6ebe200c7e1fb in incubator-slider's branch refs/heads/develop from [~jmaron]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=f50adbd ]

SLIDER-649 extends test sleep times

TestStandaloneAMRestart fails intermittently on windows
Key: SLIDER-649
URL: https://issues.apache.org/jira/browse/SLIDER-649
Project: Slider
Issue Type: Bug
Reporter: Jonathan Maron

Failure is probably due to invalid client interactions during the AM lifecycle. For the time being, will try to address these with longer sleep times.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SLIDER-555) AM needs to get log aggregation friendly log4j
[ https://issues.apache.org/jira/browse/SLIDER-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211715#comment-14211715 ]

ASF subversion and git services commented on SLIDER-555:

Commit bb873fdeee82d30d74e0d622ec1d0e1a46cd1aaf in incubator-slider's branch refs/heads/develop from [~sumitmohanty]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=bb873fd ]

SLIDER-555. Default ZK node is not getting deleted when you destroy an application

AM needs to get log aggregation friendly log4j
Key: SLIDER-555
URL: https://issues.apache.org/jira/browse/SLIDER-555
Project: Slider
Issue Type: Bug
Components: appmaster
Reporter: Steve Loughran
Assignee: Gour Saha
Fix For: Slider 0.60

The AM needs to get log4j settings

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SLIDER-647) allocation requests not being satisfied when a cluster goes to labels
[ https://issues.apache.org/jira/browse/SLIDER-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211722#comment-14211722 ]

ASF subversion and git services commented on SLIDER-647:

Commit c7186e13ddd2c1e45e4faf84119b5bb2d04fd00e in incubator-slider's branch refs/heads/releases/slider-0.60 from [~ste...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=c7186e1 ]

SLIDER-647 allocation requests not being satisfied when a cluster goes to labels

allocation requests not being satisfied when a cluster goes to labels
Key: SLIDER-647
URL: https://issues.apache.org/jira/browse/SLIDER-647
Project: Slider
Issue Type: Bug
Components: appmaster
Affects Versions: Slider 0.60
Environment: cluster that was upgraded to labels.
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Blocker
Fix For: Slider 0.60

Reported problem
# app installed on cluster; working
# cluster upgraded to labels
# when app started, container requests remained outstanding
# deleting the role history fixed this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SLIDER-555) AM needs to get log aggregation friendly log4j
[ https://issues.apache.org/jira/browse/SLIDER-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211723#comment-14211723 ]

ASF subversion and git services commented on SLIDER-555:

Commit 2c6cc4973edba44c5aaeea93653742e519c7188c in incubator-slider's branch refs/heads/releases/slider-0.60 from [~sumitmohanty]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=2c6cc49 ]

SLIDER-555. Default ZK node is not getting deleted when you destroy an application

AM needs to get log aggregation friendly log4j
Key: SLIDER-555
URL: https://issues.apache.org/jira/browse/SLIDER-555
Project: Slider
Issue Type: Bug
Components: appmaster
Reporter: Steve Loughran
Assignee: Gour Saha
Fix For: Slider 0.60

The AM needs to get log4j settings

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SLIDER-649) TestStandaloneAMRestart fails intermittently on windows
[ https://issues.apache.org/jira/browse/SLIDER-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211760#comment-14211760 ]

ASF subversion and git services commented on SLIDER-649:

Commit fb25d9356353c79bd2fd89bf73b913aaa508a630 in incubator-slider's branch refs/heads/releases/slider-0.60 from [~jmaron]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=fb25d93 ]

SLIDER-649 extends test sleep times

TestStandaloneAMRestart fails intermittently on windows
Key: SLIDER-649
URL: https://issues.apache.org/jira/browse/SLIDER-649
Project: Slider
Issue Type: Bug
Reporter: Jonathan Maron

Failure is probably due to invalid client interactions during the AM lifecycle. For the time being, will try to address these with longer sleep times.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)