[jira] [Commented] (APEXMALHAR-2566) NPE in FSWindowDataManager
[ https://issues.apache.org/jira/browse/APEXMALHAR-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804597#comment-16804597 ]

Pramod Immaneni commented on APEXMALHAR-2566:
---------------------------------------------

[~vikram25] It looks like the project settings allow only committers to be assigned JIRAs, which shouldn't be the case; contributors should also be allowed. While I figure out how that can be changed, I think it is safe to say you can work on it, as no one else has commented on it.

> NPE in FSWindowDataManager
> --------------------------
>
>                 Key: APEXMALHAR-2566
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2566
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Pramod Immaneni
>            Priority: Major
>
> Running into a null pointer exception during recovery in the FSWindowDataManager
> implementation, which in turn is causing the operator to be stuck in a recovery
> loop.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (APEXMALHAR-2566) NPE in FSWindowDataManager
[ https://issues.apache.org/jira/browse/APEXMALHAR-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16799364#comment-16799364 ]

Pramod Immaneni commented on APEXMALHAR-2566:
---------------------------------------------

java.lang.NullPointerException
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
	at org.apache.apex.malhar.lib.wal.FSWindowDataManager.retrieve(FSWindowDataManager.java:487)
	at org.apache.apex.malhar.lib.wal.FSWindowDataManager.retrieve(FSWindowDataManager.java:448)
	at com.example.myapexapp.HTTPSPostOperator.beginWindow(HTTPSPostOperator.java:183)
	at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:306)
	at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)
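[Editor's note] The trace above shows Preconditions.checkNotNull firing inside FSWindowDataManager.retrieve during recovery, which then repeats on every restart. The following is a hypothetical stand-in, not the Malhar code: it sketches the failure pattern (a strict retrieve that assumes per-window state exists) alongside a defensive variant that treats missing state as "nothing to replay" instead of crashing the recovery loop.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in (NOT the Malhar API) for the failure pattern in the
// trace above: strict retrieve() assumes saved state exists for every window.
public class WindowRecoverySketch {
  private final Map<Long, byte[]> savedWindows = new HashMap<>();

  public void save(long windowId, byte[] artifact) {
    savedWindows.put(windowId, artifact);
  }

  // Mirrors a Preconditions.checkNotNull-style retrieve: fails on missing state,
  // which during recovery turns a missing artifact into a permanent crash loop.
  public byte[] retrieveStrict(long windowId) {
    byte[] artifact = savedWindows.get(windowId);
    if (artifact == null) {
      throw new NullPointerException("no saved state for window " + windowId);
    }
    return artifact;
  }

  // Defensive variant a caller such as beginWindow() could use during recovery:
  // missing state simply means there is nothing to replay for that window.
  public byte[] retrieveOrNull(long windowId) {
    return savedWindows.get(windowId);
  }

  public static void main(String[] args) {
    WindowRecoverySketch mgr = new WindowRecoverySketch();
    mgr.save(1L, new byte[] {42});
    System.out.println(mgr.retrieveOrNull(1L).length); // saved window replays
    System.out.println(mgr.retrieveOrNull(2L));        // missing window: null, no NPE
  }
}
```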
[jira] [Created] (APEXMALHAR-2566) NPE in FSWindowDataManager
Pramod Immaneni created APEXMALHAR-2566:
-------------------------------------------

             Summary: NPE in FSWindowDataManager
                 Key: APEXMALHAR-2566
                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2566
             Project: Apache Apex Malhar
          Issue Type: Bug
            Reporter: Pramod Immaneni


Running into a null pointer exception during recovery in the FSWindowDataManager implementation, which in turn is causing the operator to be stuck in a recovery loop.
[jira] [Commented] (APEXCORE-796) Docker based deployment
[ https://issues.apache.org/jira/browse/APEXCORE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580385#comment-16580385 ]

Pramod Immaneni commented on APEXCORE-796:
------------------------------------------

We had some community discussions on this, and the consensus was to support Kubernetes cluster management and use Docker as the underlying container runtime, as opposed to directly supporting Docker. The design would be flexible enough to theoretically plug in other management systems, like Swarm, in the future. Because of the stateful nature of processing, the strong requirements for correctness of results even in the case of fault recovery, and the stateful dynamic scaling requirements, I suspect the Apex engine would need to participate in a considerable capacity in both the fault-tolerance and scaling aspects, beyond what Kubernetes already provides and what Istio would further augment. In other words, I don't think what Kubernetes provides out of the box will be sufficient. It's good to see interest in this functionality in the community.

> Docker based deployment
> -----------------------
>
>                 Key: APEXCORE-796
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-796
>             Project: Apache Apex Core
>          Issue Type: New Feature
>            Reporter: Thomas Weise
>            Priority: Major
>              Labels: roadmap
>         Attachments: Docker_K8es.jpg
>
> Apex should support deployment using Docker as an alternative to application
> packages. Docker images provide a simple and standard way to package
> dependencies and solve isolation from the host environment. This will be
> particularly helpful when applications depend on native, non-JVM packages
> like Python and R, that otherwise need to be installed separately. Docker
> support will also be a step towards supporting other cluster managers like
> Kubernetes, Mesos and Swarm.
[jira] [Assigned] (APEXCORE-724) Support for Kubernetes
[ https://issues.apache.org/jira/browse/APEXCORE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pramod Immaneni reassigned APEXCORE-724:
----------------------------------------

    Assignee: Pramod Immaneni

> Support for Kubernetes
> ----------------------
>
>                 Key: APEXCORE-724
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-724
>             Project: Apache Apex Core
>          Issue Type: New Feature
>            Reporter: Thomas Weise
>            Assignee: Pramod Immaneni
>            Priority: Major
>              Labels: roadmap
>
> It should be possible to run Apex applications on Kubernetes. This will also
> require that Apex applications can be packaged as containers (Docker or other
> supported container).
[jira] [Commented] (APEXMALHAR-2565) Creation of IMDG for data persistence as of now this is handled by In memory , tied to sticky session
[ https://issues.apache.org/jira/browse/APEXMALHAR-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580363#comment-16580363 ]

Pramod Immaneni commented on APEXMALHAR-2565:
---------------------------------------------

The in-memory state is checkpointed to a persistent store, by default HDFS, so no data should be lost. There are also efficient state-management and storage mechanisms available if there is a lot of data to hold in state. The checkpointing is done periodically as processing moves forward, so you are capturing a fairly recent snapshot of the state to recover from in case of failures. In the documentation and in the public domain you will also find presentations and videos on more advanced topics, like exactly-once processing.

> Creation of IMDG for data persistence as of now this is handled by In memory, tied to sticky session
> ----------------------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2565
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2565
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>          Components: AppData
>            Reporter: Mahesh
>            Priority: Major
>
> Apex Malhar has an in-memory store that is warmed up (data loaded) on startup,
> which means the data is also lost when the node goes down. This should be handled
> with an IMDG cluster, which could increase processing performance and provide
> fault tolerance. With the current in-memory model and a sticky node (session),
> if the node fails it has to be restarted. Please look into whether this
> improvement adds value or not.
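[Editor's note] The checkpoint-and-recover behavior described in the comment above can be sketched in a few lines. This is a toy stand-in, not the Apex checkpointing API: state is snapshotted periodically as processing moves forward, and recovery rolls back to the last snapshot, so only the work done since that snapshot needs replaying.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (NOT the Apex API) of periodic checkpointing of in-memory
// state to durable storage, with recovery from the most recent snapshot.
public class CheckpointSketch {
  private Map<String, Integer> state = new HashMap<>();
  private Map<String, Integer> lastCheckpoint = new HashMap<>();

  public void process(String key) {
    state.merge(key, 1, Integer::sum);
  }

  // Called periodically (e.g. at window boundaries). A real implementation
  // would write the snapshot to HDFS or another persistent store.
  public void checkpoint() {
    lastCheckpoint = new HashMap<>(state);
  }

  // After a failure, state is rebuilt from the last snapshot; only work done
  // since that snapshot is lost and must be replayed.
  public void recover() {
    state = new HashMap<>(lastCheckpoint);
  }

  public int count(String key) {
    return state.getOrDefault(key, 0);
  }

  public static void main(String[] args) {
    CheckpointSketch op = new CheckpointSketch();
    op.process("a");
    op.process("a");
    op.process("b");
    op.checkpoint();                   // snapshot taken
    op.process("a");                   // work after the snapshot...
    op.recover();                      // ...is rolled back on failure
    System.out.println(op.count("a")); // state is back at the snapshot: 2
  }
}
```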
[jira] [Commented] (APEXCORE-796) Docker based deployment
[ https://issues.apache.org/jira/browse/APEXCORE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553684#comment-16553684 ]

Pramod Immaneni commented on APEXCORE-796:
------------------------------------------

Yes, this JIRA does not require YARN independence, but my goal is to support Docker via a Kubernetes port and add an abstraction in Apex to be able to run on both cluster managers, by having an independent layer and cluster-specific implementations; hence the reference to APEXCORE-724 in the comment. Agreed, it is a substantial effort, and I will propose a plan. It is possible to add support for Docker in Apex to run under YARN, by enhancing the Apex CLI etc. and using functionality like you mentioned at runtime, but I am not sure how much usage it would get. I think Docker will be used more with a Kubernetes runtime.
[jira] [Commented] (APEXCORE-724) Support for Kubernetes
[ https://issues.apache.org/jira/browse/APEXCORE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551024#comment-16551024 ]

Pramod Immaneni commented on APEXCORE-724:
------------------------------------------

Please see the comment on APEXCORE-796.
[jira] [Commented] (APEXCORE-796) Docker based deployment
[ https://issues.apache.org/jira/browse/APEXCORE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551023#comment-16551023 ]

Pramod Immaneni commented on APEXCORE-796:
------------------------------------------

I would like to look into this and APEXCORE-724, adding support for the platform to run under Kubernetes via the Docker runtime.
[jira] [Commented] (APEXCORE-817) StramLocalCluster.testDynamicLoading test failing on travis
[ https://issues.apache.org/jira/browse/APEXCORE-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517710#comment-16517710 ]

Pramod Immaneni commented on APEXCORE-817:
------------------------------------------

The test is failing because a class with a higher class-file major version than the running VM supports is being loaded. The test invokes javac to compile a test class and later tries to load it. There is a javac on the PATH with a higher major version than the current VM, producing a class file that cannot be loaded and resulting in the error. The fix is to use the same compiler version as the running VM.

> StramLocalCluster.testDynamicLoading test failing on travis
> -----------------------------------------------------------
>
>                 Key: APEXCORE-817
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-817
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Pramod Immaneni
>            Assignee: Pramod Immaneni
>            Priority: Major
>
> The test is failing with the following logs:
>
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.843 sec <<< FAILURE! - in com.datatorrent.stram.StramLocalClusterTest
> testDynamicLoading(com.datatorrent.stram.StramLocalClusterTest)  Time elapsed: 1.375 sec  <<< ERROR!
> java.lang.UnsupportedClassVersionError: POJO has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0
> 	at com.datatorrent.stram.StramLocalClusterTest.testDynamicLoading(StramLocalClusterTest.java:296)
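[Editor's note] One way to implement the fix described above, sketched here as an assumption about the eventual patch rather than the actual change: compile with the compiler bundled in the running VM (via javax.tools.ToolProvider) instead of whichever javac happens to be on the PATH, so the emitted class-file version always matches the VM that loads it. This requires running on a JDK, since a JRE has no system compiler.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Compile a source file with the compiler of the *running* VM, then load it.
// By construction the class-file version matches the loading VM, avoiding
// UnsupportedClassVersionError regardless of what javac is on the PATH.
public class CompileWithRunningVm {
  public static Class<?> compileAndLoad(Path dir, String className, String source) throws Exception {
    Path src = dir.resolve(className + ".java");
    Files.write(src, source.getBytes(StandardCharsets.UTF_8));

    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // same version as this VM
    if (compiler == null) {
      throw new IllegalStateException("no system compiler; run on a JDK, not a JRE");
    }
    int result = compiler.run(null, null, null, src.toString());
    if (result != 0) {
      throw new IllegalStateException("compilation failed");
    }
    // Load the freshly compiled class from the same directory.
    try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
      return loader.loadClass(className);
    }
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("pojo");
    Class<?> c = compileAndLoad(dir, "POJO", "public class POJO { public int x = 1; }");
    System.out.println(c.getName()); // loads without UnsupportedClassVersionError
  }
}
```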
[jira] [Created] (APEXCORE-817) StramLocalCluster.testDynamicLoading test failing on travis
Pramod Immaneni created APEXCORE-817:
----------------------------------------

             Summary: StramLocalCluster.testDynamicLoading test failing on travis
                 Key: APEXCORE-817
                 URL: https://issues.apache.org/jira/browse/APEXCORE-817
             Project: Apache Apex Core
          Issue Type: Bug
            Reporter: Pramod Immaneni
            Assignee: Pramod Immaneni


The test is failing with the following logs:

Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.843 sec <<< FAILURE! - in com.datatorrent.stram.StramLocalClusterTest
testDynamicLoading(com.datatorrent.stram.StramLocalClusterTest)  Time elapsed: 1.375 sec  <<< ERROR!
java.lang.UnsupportedClassVersionError: POJO has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0
	at com.datatorrent.stram.StramLocalClusterTest.testDynamicLoading(StramLocalClusterTest.java:296)
[jira] [Resolved] (APEXCORE-807) In secure mode containers are failing after one day and the application is failing after seven days
[ https://issues.apache.org/jira/browse/APEXCORE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pramod Immaneni resolved APEXCORE-807.
--------------------------------------
      Resolution: Fixed
   Fix Version/s: 4.0.0

> In secure mode containers are failing after one day and the application is failing after seven days
> ---------------------------------------------------------------------------------------------------
>
>                 Key: APEXCORE-807
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-807
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Pramod Immaneni
>            Assignee: Pramod Immaneni
>            Priority: Major
>             Fix For: 4.0.0
>
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token nn for xx) can't be found in cache
> 	at com.google.common.base.Throwables.propagate(Throwables.java:156)
> 	at com.datatorrent.stram.engine.Node.reportStats(Node.java:489)
> 	at com.datatorrent.stram.engine.GenericNode.reportStats(GenericNode.java:825)
> 	at com.datatorrent.stram.engine.GenericNode.processEndWindow(GenericNode.java:184)
> 	at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:397)
> 	at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1465)
> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token nn for xx) can't be found in cache
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> 	at com.datatorrent.stram.engine.Node.reportStats(Node.java:482)
> 	... 4 more
> Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token nn for xx) can't be found in cache
> 	at com.google.common.base.Throwables.propagate(Throwables.java:156)
> 	at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:131)
> 	at com.datatorrent.common.util.AsyncFSStorageAgent.flush(AsyncFSStorageAgent.java:156)
> 	at com.datatorrent.stram.engine.Node$CheckpointHandler.call(Node.java:706)
> 	at com.datatorrent.stram.engine.Node$CheckpointHandler.call(Node.java:696)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token nn for xx) can't be found in cache
> 	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1498)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1398)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> 	at com.sun.proxy.$Proxy10.create(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:313)
> 	at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
> 	at com.sun.proxy.$Proxy11.create(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1822)
> 	at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1762)
> 	at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:104)
> 	at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:60)
> 	at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:585)
> 	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:688)
> 	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:684)
> 	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> 	at org.apache.hadoop.fs.FileContext.create(FileContext.java:684)
> 	at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:119)
> 	... 9 more
[jira] [Created] (APEXCORE-810) Concurrent modification exception during connection cleanup in buffer server
Pramod Immaneni created APEXCORE-810:
----------------------------------------

             Summary: Concurrent modification exception during connection cleanup in buffer server
                 Key: APEXCORE-810
                 URL: https://issues.apache.org/jira/browse/APEXCORE-810
             Project: Apache Apex Core
          Issue Type: Bug
            Reporter: Pramod Immaneni
            Assignee: Pramod Immaneni


ERROR com.datatorrent.bufferserver.server.Server: Buffer server Server@56cfec7c{address=/0:0:0:0:0:0:0:0:45081} failed to tear down subscriber Subscriber@2ff22212{ln=LogicalNode@36d87f9e{identifier=tcp://xx:45081/2.output.1, upstream=2.output.1, group=rand_console/3.input, partitions=[], iterator=com.datatorrent.bufferserver.internal.DataList$DataListIterator@6caeed6a{da=com.datatorrent.bufferserver.internal.DataList$Block@506501e4{identifier=2.output.1, data=16777216, readingOffset=0, writingOffset=1822, starting_window=59dc9c31, ending_window=59dc9c300055, refCount=2, uniqueIdentifier=0, next=null, future=null}}}}}
java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
	at java.util.HashMap$KeyIterator.next(HashMap.java:956)
	at com.datatorrent.bufferserver.internal.LogicalNode.removeChannel(LogicalNode.java:118)
	at com.datatorrent.bufferserver.server.Server$3.run(Server.java:410)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
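[Editor's note] The ConcurrentModificationException above is the classic result of removing entries from a HashMap-backed collection while iterating over it (here inside LogicalNode.removeChannel). A generic sketch of the safe pattern, not the actual buffer-server fix: remove through the Iterator itself, which keeps the collection's internal modification count consistent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Removing elements from a HashSet/HashMap view during iteration must go
// through the Iterator; calling channels.remove(...) directly inside the loop
// risks ConcurrentModificationException on the next it.next() call.
public class SafeRemovalSketch {
  public static void removeMatching(Set<String> channels, String victim) {
    for (Iterator<String> it = channels.iterator(); it.hasNext(); ) {
      if (it.next().equals(victim)) {
        it.remove(); // safe: updates the iterator's expected modCount
      }
    }
  }

  public static void main(String[] args) {
    Set<String> channels = new HashSet<>(Arrays.asList("input-1", "input-2", "victim"));
    removeMatching(channels, "victim");
    System.out.println(channels.size()); // the remaining channels, no exception
  }
}
```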
[jira] [Created] (APEXCORE-809) Documentation does not build with current versions (8.11.x) of nodejs
Pramod Immaneni created APEXCORE-809:
----------------------------------------

             Summary: Documentation does not build with current versions (8.11.x) of nodejs
                 Key: APEXCORE-809
                 URL: https://issues.apache.org/jira/browse/APEXCORE-809
             Project: Apache Apex Core
          Issue Type: Improvement
          Components: Website
            Reporter: Pramod Immaneni


Documentation fails to build with currently available versions of nodejs. One has to go back to the older release, 6.9.1, for the build to work. The error is with the contextify module; it is no longer needed in newer versions, as its functionality has been integrated into nodejs directly and is available in a different form.
[jira] [Created] (APEXCORE-808) Change min supported java version dependency to Java 8
Pramod Immaneni created APEXCORE-808:
----------------------------------------

             Summary: Change min supported java version dependency to Java 8
                 Key: APEXCORE-808
                 URL: https://issues.apache.org/jira/browse/APEXCORE-808
             Project: Apache Apex Core
          Issue Type: Improvement
            Reporter: Pramod Immaneni
             Fix For: 4.0.0


The current 3.x series has JDK 7 as the minimum required Java version. Change this to JDK 8 for 4.x.
[jira] [Updated] (APEXCORE-807) In secure mode containers are failing after one day and the application is failing after seven days
[ https://issues.apache.org/jira/browse/APEXCORE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pramod Immaneni updated APEXCORE-807:
-------------------------------------
    Issue Type: Bug  (was: Task)

> In secure mode containers are failing after one day and the application is failing after seven days
> ---------------------------------------------------------------------------------------------------
>
>                 Key: APEXCORE-807
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-807
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Pramod Immaneni
>            Assignee: Pramod Immaneni
>            Priority: Major
[jira] [Commented] (APEXCORE-807) In secure mode containers are failing after one day and the application is failing after seven days
[ https://issues.apache.org/jira/browse/APEXCORE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442548#comment-16442548 ]

Pramod Immaneni commented on APEXCORE-807:
------------------------------------------

This is happening because the tokens are not being renewed by YARN on a daily basis. To get around this, the application needs to renew the tokens before the daily renewal expiry period, just as it refreshes the tokens before the seven-day lifetime expiry period.

> In secure mode containers are failing after one day and the application is failing after seven days
> ---------------------------------------------------------------------------------------------------
>
>                 Key: APEXCORE-807
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-807
>             Project: Apache Apex Core
>          Issue Type: Task
>            Reporter: Pramod Immaneni
>            Assignee: Pramod Immaneni
>            Priority: Major
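[Editor's note] The renew-before-expiry approach described in the comment above can be sketched generically. This is not the actual Apex/YARN token code; it only shows the scheduling idea: run a renewal task at a safety margin inside the expiry interval, so tokens are always refreshed before they can lapse. The `safetyFactor` parameter is an illustrative assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Generic sketch: schedule token renewal at a fraction of the expiry interval,
// mirroring how the seven-day lifetime is handled by refreshing ahead of time.
public class TokenRenewalSketch {
  public static ScheduledExecutorService scheduleRenewal(
      Runnable renewTokens, long expiryIntervalMillis, double safetyFactor) {
    // Renew well before expiry, e.g. safetyFactor = 0.7 renews at 70% of the interval.
    long period = (long) (expiryIntervalMillis * safetyFactor);
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(renewTokens, period, period, TimeUnit.MILLISECONDS);
    return scheduler;
  }

  public static void main(String[] args) throws InterruptedException {
    java.util.concurrent.atomic.AtomicInteger renewals = new java.util.concurrent.atomic.AtomicInteger();
    ScheduledExecutorService s = scheduleRenewal(renewals::incrementAndGet, 100, 0.7);
    Thread.sleep(400); // a short "expiry interval" for demonstration only
    s.shutdownNow();
    System.out.println(renewals.get() >= 1); // renewed before any 100ms "expiry"
  }
}
```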
[jira] [Created] (APEXCORE-807) In secure mode containers are failing after one day and the application is failing after seven days
Pramod Immaneni created APEXCORE-807: Summary: In secure mode tokens containers are failing after one day and the application is failing after seven days Key: APEXCORE-807 URL: https://issues.apache.org/jira/browse/APEXCORE-807 Project: Apache Apex Core Issue Type: Task Reporter: Pramod Immaneni Assignee: Pramod Immaneni
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token nn for xx) can't be found in cache
at com.google.common.base.Throwables.propagate(Throwables.java:156)
at com.datatorrent.stram.engine.Node.reportStats(Node.java:489)
at com.datatorrent.stram.engine.GenericNode.reportStats(GenericNode.java:825)
at com.datatorrent.stram.engine.GenericNode.processEndWindow(GenericNode.java:184)
at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:397)
at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1465)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token nn for xx) can't be found in cache
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.datatorrent.stram.engine.Node.reportStats(Node.java:482)
... 4 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token nn for xx) can't be found in cache
at com.google.common.base.Throwables.propagate(Throwables.java:156)
at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:131)
at com.datatorrent.common.util.AsyncFSStorageAgent.flush(AsyncFSStorageAgent.java:156)
at com.datatorrent.stram.engine.Node$CheckpointHandler.call(Node.java:706)
at com.datatorrent.stram.engine.Node$CheckpointHandler.call(Node.java:696)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token nn for xx) can't be found in cache
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:313)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy11.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1822)
at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1762)
at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:104)
at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:60)
at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:585)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:688)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:684)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.create(FileContext.java:684)
at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:119)
... 9 more
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
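The one-day and seven-day failure pattern above matches HDFS delegation token semantics: a token must be renewed within its renew interval (commonly 24 hours) and cannot be renewed past its max lifetime (commonly 7 days). A minimal, self-contained sketch of that lifecycle (the intervals are illustrative defaults and the class is hypothetical, not Hadoop's Token API):

```java
// Models the lifecycle of a renewable delegation token: valid only while it
// has been renewed within the renew interval and is within max lifetime.
class TokenLifetime {
    static final long RENEW_INTERVAL_MS = 24L * 60 * 60 * 1000;     // 1 day
    static final long MAX_LIFETIME_MS = 7L * 24 * 60 * 60 * 1000;   // 7 days

    final long issuedAt;
    long lastRenewedAt;

    TokenLifetime(long issuedAt) {
        this.issuedAt = issuedAt;
        this.lastRenewedAt = issuedAt;
    }

    /** Renewal succeeds only while the token is still within max lifetime. */
    boolean renew(long now) {
        if (now - issuedAt >= MAX_LIFETIME_MS) {
            return false; // past max lifetime: a fresh token must be obtained
        }
        lastRenewedAt = now;
        return true;
    }

    boolean isValid(long now) {
        return now - issuedAt < MAX_LIFETIME_MS
            && now - lastRenewedAt < RENEW_INTERVAL_MS;
    }
}
```

Without periodic renewal a container's token expires after one day; even with renewal, the application fails after seven days unless it obtains fresh tokens before the max lifetime elapses.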
[jira] [Updated] (APEXCORE-807) In secure mode containers are failing after one day and the application is failing after seven days
[ https://issues.apache.org/jira/browse/APEXCORE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-807: - Summary: In secure mode containers are failing after one day and the application is failing after seven days (was: In secure mode tokens containers are failing after one day and the application is failing after seven days) > In secure mode containers are failing after one day and the application is > failing after seven days > --- > > Key: APEXCORE-807 > URL: https://issues.apache.org/jira/browse/APEXCORE-807 > Project: Apache Apex Core > Issue Type: Task >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni >Priority: Major -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (APEXCORE-755) Deprecate dt.* properties and make them available with apex. keys
[ https://issues.apache.org/jira/browse/APEXCORE-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-755: - Fix Version/s: (was: 3.7.0) 4.0.0 > Deprecate dt.* properties and make them available with apex. keys > - > > Key: APEXCORE-755 > URL: https://issues.apache.org/jira/browse/APEXCORE-755 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Sanjay M Pujare >Priority: Minor > Fix For: 4.0.0 > > > Need to deprecate dt.* properties and make them available with apex.* keys. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
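A deprecation shim along these lines could accept both prefixes while treating apex.* as canonical; the class and key names here are illustrative, not the actual Apex implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a prefix-translation shim: a read of an apex.* key falls back to
// the deprecated dt.* key, so existing configurations keep working.
class PropertyShim {
    private final Map<String, String> props = new HashMap<>();

    void put(String key, String value) {
        props.put(key, value);
    }

    /** Look up an apex.* key, falling back to its deprecated dt.* form. */
    String get(String apexKey) {
        String value = props.get(apexKey);
        if (value == null && apexKey.startsWith("apex.")) {
            value = props.get("dt." + apexKey.substring("apex.".length()));
        }
        return value;
    }
}
```

The fallback direction matters: apex.* wins when both are present, so migrated configurations override legacy ones.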
[jira] [Resolved] (APEXCORE-789) Update security doc to describe the impact of SSL enablement on truststores
[ https://issues.apache.org/jira/browse/APEXCORE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-789. -- Resolution: Fixed Fix Version/s: 3.7.0 > Update security doc to describe the impact of SSL enablement on truststores > --- > > Key: APEXCORE-789 > URL: https://issues.apache.org/jira/browse/APEXCORE-789 > Project: Apache Apex Core > Issue Type: Documentation > Components: Documentation >Reporter: Sanjay M Pujare >Assignee: Sanjay M Pujare >Priority: Minor > Fix For: 3.7.0 > > > Enabling SSL in the Stram Webapp and using a self-signed or private cert > requires updating the various trust-stores, especially that of the RM app proxy > that connects to the Stram. This needs to be elaborated in the docs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (APEXCORE-804) Setting to let application know when DAG is not being created for launch
Pramod Immaneni created APEXCORE-804: Summary: Setting to let application know when DAG is not being created for launch Key: APEXCORE-804 URL: https://issues.apache.org/jira/browse/APEXCORE-804 Project: Apache Apex Core Issue Type: Improvement Reporter: Pramod Immaneni Assignee: Pramod Immaneni The command get-app-package-info runs the populateDAG method of an application to construct the DAG but does not actually launch it. An application developer does not know in which context populateDAG is being called. For example, if they are recording application starts in an external system from populateDAG, they will have false entries there. This can be solved in different ways, such as introducing another method in StreamingApplication or adding more parameters to populateDAG, but a non-disruptive option would be to add a property to the configuration object that is passed to populateDAG, indicating whether it is a simulate/test mode or a real launch. An application developer can use this property to take the appropriate actions. See the discussion here: https://lists.apache.org/thread.html/21eb9b40d3c663dc39cad1f43a3e06403052bea8c351f97ae5a2efc4@%3Cdev.apex.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
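The proposed flag could be consumed roughly as follows. The property name and the stand-in Conf class are hypothetical, since the actual attribute was still under discussion:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposal: the platform sets a property on the configuration
// it passes to populateDAG, and the application checks it before performing
// launch-only side effects. The property name is hypothetical.
class LaunchContextExample {
    static final String SIMULATED_KEY = "apex.application.populateDAG.simulated";

    // Stand-in for the Configuration object passed to populateDAG.
    static class Conf {
        final Map<String, String> values = new HashMap<>();
        boolean getBoolean(String key, boolean defaultValue) {
            String v = values.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    static boolean startRecorded;

    // Stand-in for StreamingApplication.populateDAG(DAG, Configuration).
    static void populateDAG(Conf conf) {
        // ... construct operators and streams here ...
        if (!conf.getBoolean(SIMULATED_KEY, false)) {
            startRecorded = true; // record the start only on a real launch
        }
    }
}
```

Defaulting the flag to false keeps existing applications unchanged: only a platform that knows it is simulating sets the property.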
[jira] [Resolved] (APEXCORE-801) Committer guidelines for dependency CVE failures
[ https://issues.apache.org/jira/browse/APEXCORE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-801. -- Resolution: Fixed Fix Version/s: 3.7.0 > Committer guidelines for dependency CVE failures > > > Key: APEXCORE-801 > URL: https://issues.apache.org/jira/browse/APEXCORE-801 > Project: Apache Apex Core > Issue Type: Documentation > Components: Website >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > Fix For: 3.7.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (APEXCORE-801) Committer guidelines for dependency CVE failures
Pramod Immaneni created APEXCORE-801: Summary: Committer guidelines for dependency CVE failures Key: APEXCORE-801 URL: https://issues.apache.org/jira/browse/APEXCORE-801 Project: Apache Apex Core Issue Type: Documentation Components: Website Reporter: Pramod Immaneni -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXCORE-790) Enforce dependency analysis for CVE in CI builds
[ https://issues.apache.org/jira/browse/APEXCORE-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-790. -- Resolution: Fixed Fix Version/s: 3.7.0 > Enforce dependency analysis for CVE in CI builds > > > Key: APEXCORE-790 > URL: https://issues.apache.org/jira/browse/APEXCORE-790 > Project: Apache Apex Core > Issue Type: Task >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > Fix For: 3.7.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (APEXCORE-791) Gateway security settings need to be available in the DAG
Pramod Immaneni created APEXCORE-791: Summary: Gateway security settings need to be available in the DAG Key: APEXCORE-791 URL: https://issues.apache.org/jira/browse/APEXCORE-791 Project: Apache Apex Core Issue Type: Bug Reporter: Pramod Immaneni Assignee: Pramod Immaneni The gateway connect address attribute is available while constructing the DAG, but other gateway security-related attributes, such as GATEWAY_USE_SSL, are not. These need to be made available as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (APEXMALHAR-2548) Pubsub operators do not use the correct URL to connect to web socket server in SSL mode
Pramod Immaneni created APEXMALHAR-2548: --- Summary: Pubsub operators do not use the correct URL to connect to web socket server in SSL mode Key: APEXMALHAR-2548 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2548 Project: Apache Apex Malhar Issue Type: Bug Reporter: Pramod Immaneni Assignee: Pramod Immaneni In SSL mode, pubsub operators need to use the wss protocol instead of ws. Today they use ws. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
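The fix amounts to deriving the websocket scheme from the SSL setting rather than hardcoding ws. A minimal sketch (the method name and the /pubsub path are illustrative, not the actual operator code):

```java
// Sketch: pick the websocket scheme based on whether the gateway uses SSL.
class PubSubUrl {
    static String gatewayUrl(String host, int port, boolean useSsl) {
        return (useSsl ? "wss" : "ws") + "://" + host + ":" + port + "/pubsub";
    }
}
```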
[jira] [Resolved] (APEXCORE-786) LoggerUtil should allow to add/remove/list appenders for a specified logger
[ https://issues.apache.org/jira/browse/APEXCORE-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-786. -- Resolution: Fixed Fix Version/s: 3.7.0 > LoggerUtil should allow to add/remove/list appenders for a specified logger > --- > > Key: APEXCORE-786 > URL: https://issues.apache.org/jira/browse/APEXCORE-786 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > Fix For: 3.7.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXMALHAR-2515) HBase output operator Multi Table feature.
[ https://issues.apache.org/jira/browse/APEXMALHAR-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXMALHAR-2515. - Resolution: Fixed Fix Version/s: 3.8.0 > HBase output operator Multi Table feature. > -- > > Key: APEXMALHAR-2515 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2515 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: Velineni Lakshmi Prasanna >Assignee: Velineni Lakshmi Prasanna > Fix For: 3.8.0 > > > Write to multiple HBase tables. Table name will be part of the incoming > tuple. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-714) Reusable instance operator recovery
[ https://issues.apache.org/jira/browse/APEXCORE-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166465#comment-16166465 ] Pramod Immaneni commented on APEXCORE-714: -- @tweise I was thinking about this, and restoring the stream to the last fully processed window will not work correctly with the way the buffer server and queues work today; it will require more extensive changes. The reason is as follows: say an operator is recovered to a window that is min(fully processed window of itself and its downstream operators) in the event of an upstream operator failure. When the stream is re-opened to the buffer server, the old data for that stream will be cleaned up, resulting in the data from the checkpoint to the recovered window being purged. If there is a second, subsequent failure in an operator downstream of this operator before the next committed window, that older data will not be available. To do it correctly, we would need to pause and resume the output streams at the same time the input streams are being restored. I would presume the same would be true of the local queues if the stream were container local. It's possible, but I would like to take it up in a subsequent task. What I would like to do now is leave the window restoration as it is today and only reuse the instance, so it will not be fully optimized in the at-least-once case, as it will reprocess from the checkpoint window. This also means it will be applicable to both processing modes. In the future, I will make further optimizations for the at-least-once case without affecting the at-most-once case. Also, [~sandesh] brought up the point that in some cases an operator may need to know that this is happening, so I will provide an optional notification as well. 
> Reusable instance operator recovery
> ---
>
> Key: APEXCORE-714
> URL: https://issues.apache.org/jira/browse/APEXCORE-714
> Project: Apache Apex Core
> Issue Type: Improvement
> Reporter: Pramod Immaneni
> Assignee: Pramod Immaneni
>
> In a failure scenario, when a container fails, it is redeployed along with
> all the operators in it. The operators downstream of these operators are also
> redeployed within their containers. The operators are restored from their
> checkpoint and connect to the appropriate point in the stream according to
> the processing mode. In at-least-once mode, for example, the data is replayed
> from the same checkpoint.
> Restoring an operator's state from checkpoint could turn out to be a costly
> operation depending on the size of the state. In some use cases, depending on
> the operator logic, reusing the current instance after an upstream failure
> without restoring the operator from checkpoint will still produce the same
> results with the data replayed from the last fully processed window. The
> operator state can remain the same as it was before the upstream failure by
> reusing the same operator instance, with only the streams and window reset to
> the window after the last fully processed window to guarantee at-least-once
> processing of tuples. If the container where the operator itself is running
> goes down, it would of course need to be restored from the checkpoint. This
> scenario occurs in some batch use cases with operators that have a large state.
> I would like to propose adding the ability for a user to explicitly identify
> operators as being of this type, and the corresponding functionality in the
> engine to handle their recovery in the way described above: not restoring
> their state from checkpoint, reusing the instance, and restoring the stream to
> the window after the last fully processed window for the operator. When
> operators are not identified as being of this type, the default behavior is
> what it is today and nothing changes.
> I have done some prototyping on the engine side to ensure that this is
> possible with our current code base without requiring a massive overhaul,
> especially the restoration of the operator instance within the Node in the
> streaming container and the re-establishment of the subscriber stream to a
> window in the buffer server that the publisher (upstream) hasn't yet reached,
> as it would be restarting from checkpoint, and have been able to get it all
> working successfully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
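The recovery rule discussed in the comment above, resuming from the window after the minimum fully processed window across the operator and its downstream operators, can be sketched as (the names are illustrative, not engine code):

```java
// Sketch: an operator reusing its instance resumes from the window after the
// minimum fully-processed window across itself and its downstream operators.
class RecoveryWindow {
    static long resumeWindow(long ownFullyProcessed, long[] downstreamFullyProcessed) {
        long min = ownFullyProcessed;
        for (long w : downstreamFullyProcessed) {
            min = Math.min(min, w);
        }
        return min + 1; // restart at the window after the last fully processed one
    }
}
```

Taking the minimum guarantees at-least-once semantics: no downstream operator is asked to skip data it has not fully processed.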
[jira] [Resolved] (APEXMALHAR-2541) Fix travis-ci build
[ https://issues.apache.org/jira/browse/APEXMALHAR-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXMALHAR-2541. - Resolution: Fixed Fix Version/s: 3.8.0 > Fix travis-ci build > --- > > Key: APEXMALHAR-2541 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2541 > Project: Apache Apex Malhar > Issue Type: Task >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > Fix For: 3.8.0 > > > After the default environment was changed to Trusty from Precise, the travis-ci > build is killed due to running out of memory. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-714) Reusable instance operator recovery
[ https://issues.apache.org/jira/browse/APEXCORE-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155592#comment-16155592 ] Pramod Immaneni commented on APEXCORE-714: -- @tweise Since this kind of recovery can apply to any of the "processing modes", I am using a separate recovery mode attribute for this.
> Reusable instance operator recovery
> ---
>
> Key: APEXCORE-714
> URL: https://issues.apache.org/jira/browse/APEXCORE-714
> Project: Apache Apex Core
> Issue Type: Improvement
> Reporter: Pramod Immaneni
> Assignee: Pramod Immaneni
>
> In a failure scenario, when a container fails, it is redeployed along with
> all the operators in it. The operators downstream of these operators are also
> redeployed within their containers. The operators are restored from their
> checkpoint and connect to the appropriate point in the stream according to
> the processing mode. In at-least-once mode, for example, the data is replayed
> from the same checkpoint.
> Restoring an operator's state from checkpoint could turn out to be a costly
> operation depending on the size of the state. In some use cases, depending on
> the operator logic, reusing the current instance after an upstream failure
> without restoring the operator from checkpoint will still produce the same
> results with the data replayed from the last fully processed window. The
> operator state can remain the same as it was before the upstream failure by
> reusing the same operator instance, with only the streams and window reset to
> the window after the last fully processed window to guarantee at-least-once
> processing of tuples. If the container where the operator itself is running
> goes down, it would of course need to be restored from the checkpoint. This
> scenario occurs in some batch use cases with operators that have a large state.
> I would like to propose adding the ability for a user to explicitly identify
> operators as being of this type, and the corresponding functionality in the
> engine to handle their recovery in the way described above: not restoring
> their state from checkpoint, reusing the instance, and restoring the stream to
> the window after the last fully processed window for the operator. When
> operators are not identified as being of this type, the default behavior is
> what it is today and nothing changes.
> I have done some prototyping on the engine side to ensure that this is
> possible with our current code base without requiring a massive overhaul,
> especially the restoration of the operator instance within the Node in the
> streaming container and the re-establishment of the subscriber stream to a
> window in the buffer server that the publisher (upstream) hasn't yet reached,
> as it would be restarting from checkpoint, and have been able to get it all
> working successfully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (APEXMALHAR-2540) Serialization exception with throttle example
Pramod Immaneni created APEXMALHAR-2540: --- Summary: Serialization exception with throttle example Key: APEXMALHAR-2540 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2540 Project: Apache Apex Malhar Issue Type: Bug Reporter: Pramod Immaneni Assignee: Pramod Immaneni There is a serialization exception with the ThrottlingStatsListener during stram checkpointing, which causes the throttle example to fail. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
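A common cause of such checkpoint-time failures is a listener holding a reference to a non-serializable object. A sketch of the usual remedy, assuming that is the cause here (the field and class are hypothetical, not the actual ThrottlingStatsListener):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Sketch: keep a non-serializable helper in a transient field so Java
// serialization (used during stram checkpointing) skips it, and rebuild it
// lazily after deserialization.
class ThrottleListenerSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical non-serializable helper (e.g. a client or lock object).
    private transient Object runtimeHelper;

    Object helper() {
        if (runtimeHelper == null) {
            runtimeHelper = new Object(); // rebuilt on first use after restore
        }
        return runtimeHelper;
    }

    static byte[] roundTrip(ThrottleListenerSketch l) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(l); // would throw NotSerializableException
            }                       // if the helper field were not transient
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```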
[jira] [Commented] (APEXMALHAR-2539) Solace MQ operators
[ https://issues.apache.org/jira/browse/APEXMALHAR-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149193#comment-16149193 ] Pramod Immaneni commented on APEXMALHAR-2539: - This would be for the input operators; the output operators will be handled in a separate JIRA. > Solace MQ operators > --- > > Key: APEXMALHAR-2539 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2539 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > > Solace is a high-performance messaging system that has a rich set of > features. Input and output operators are needed to consume and send messages > to the system. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (APEXMALHAR-2539) Solace MQ operators
Pramod Immaneni created APEXMALHAR-2539: --- Summary: Solace MQ operators Key: APEXMALHAR-2539 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2539 Project: Apache Apex Malhar Issue Type: Improvement Reporter: Pramod Immaneni Assignee: Pramod Immaneni Solace is a high-performance messaging system that has a rich set of features. Input and output operators are needed to consume and send messages to the system. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXCORE-553) Add google analytics tracking to apex website
[ https://issues.apache.org/jira/browse/APEXCORE-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-553. -- Resolution: Fixed > Add google analytics tracking to apex website > - > > Key: APEXCORE-553 > URL: https://issues.apache.org/jira/browse/APEXCORE-553 > Project: Apache Apex Core > Issue Type: Improvement > Components: Website >Reporter: Ashwin Chandra Putta >Assignee: Michelle Xiao > > Add Google Analytics tracking to the apex.apache.org website so that we can > learn about website traffic and what users are looking for.
> {code}
> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
> (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
> m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
> })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
> ga('create', 'UA-85540278-1', 'auto');
> ga('send', 'pageview');
> {code}
> Add privacy footer to indicate collection of data. Reference: > http://www.apache.org/foundation/policies/privacy.html > Discussion: > https://lists.apache.org/thread.html/f1af3f19ecd2532eb2f5cea2a81fda744912112df5259e2d55c05b88@%3Cdev.apex.apache.org%3E -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXCORE-780) Travis-CI Build failures to be fixed
[ https://issues.apache.org/jira/browse/APEXCORE-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-780. -- Resolution: Fixed Fix Version/s: 3.7.0 > Travis-CI Build failures to be fixed > > > Key: APEXCORE-780 > URL: https://issues.apache.org/jira/browse/APEXCORE-780 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Sanjay M Pujare >Assignee: Vlad Rozov >Priority: Minor > Fix For: 3.7.0 > > > Automated builds triggered by PRs are failing on Travis-CI intermittently for > unknown reasons. Needs to be fixed to make CI useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXCORE-779) In unit tests Yarn containers must use the same JVM as the test itself.
[ https://issues.apache.org/jira/browse/APEXCORE-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-779. -- Resolution: Fixed Fix Version/s: 3.7.0 > In unit tests Yarn containers must use the same JVM as the test itself. > --- > > Key: APEXCORE-779 > URL: https://issues.apache.org/jira/browse/APEXCORE-779 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > Fix For: 3.7.0 > > > When Yarn containers are launched from a unit test, they should use the > "java.home" property to determine which java executable to use, as JAVA_HOME > may not point to a proper location. Using PATH to find java must be avoided, > as the default java on a test box may be the wrong version or not a properly > configured JVM. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
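Resolving the executable from the test JVM's own java.home property can be sketched as:

```java
import java.io.File;

// Sketch: resolve the java executable from the running JVM's java.home
// rather than JAVA_HOME or PATH, so launched containers use the same JVM
// as the test itself.
class JavaExecutable {
    static String javaCommand() {
        return System.getProperty("java.home")
            + File.separator + "bin" + File.separator + "java";
    }
}
```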
[jira] [Commented] (APEXCORE-777) Application Master may not shutdown due to incorrect numRequestedContainers counting
[ https://issues.apache.org/jira/browse/APEXCORE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127863#comment-16127863 ] Pramod Immaneni commented on APEXCORE-777: -- This seems like an important issue to fix. [~sanjaypujare], are you planning to look into this issue, as you are probably familiar with this part of the code? > Application Master may not shutdown due to incorrect numRequestedContainers > counting > > > Key: APEXCORE-777 > URL: https://issues.apache.org/jira/browse/APEXCORE-777 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Vlad Rozov >Priority: Minor > > Consider a scenario where the App master requests a container from Yarn > (numRequestedContainers = 1). There are not enough resources and the request > times out. My understanding is that the App master will re-request it, but > the number of requested containers will not change (one newly requested, one > removed). Let's assume that the App master, by the time Yarn responds back, > decides that it does not need any containers. If Yarn responds with one allocated > container, numRequestedContainers will go to 0 (correct), but Yarn may > respond back with 2 allocated containers if, by the time the App Master sends > the second request, it has already allocated a container for the original > request (the one that timed out), as Yarn does not guarantee that a removed > request will not be fulfilled (see Yarn doc). In this case, will > numRequestedContainers not become -1 due to the bulk decrement? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
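The counting hazard described can be reproduced with a toy model of the counter; all names are illustrative, not the actual Apex Application Master code:

```java
// Toy model of the AM's outstanding-request counter. A re-request after a
// timeout leaves the count unchanged, but if Yarn later allocates containers
// for both the original and the re-issued request, the bulk decrement drives
// the counter negative.
class ContainerCounter {
    int numRequestedContainers;

    void request() {
        numRequestedContainers++;
    }

    void reRequestAfterTimeout() {
        numRequestedContainers++; // re-issue the request
        numRequestedContainers--; // remove the timed-out request from the ask
    }

    void onAllocated(int allocated) {
        numRequestedContainers -= allocated; // bulk decrement
    }
}
```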
[jira] [Commented] (APEXMALHAR-2534) In TopNWordsWithQueries example, input data from individual files is sorted and written in a single file instead of different files.
[ https://issues.apache.org/jira/browse/APEXMALHAR-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109477#comment-16109477 ] Pramod Immaneni commented on APEXMALHAR-2534: - Could you clarify whether you are referring to the per-file output that should be going to separate files? > In TopNWordsWithQueries example, input data from individual files is sorted > and written in a single file instead of different files. > - > > Key: APEXMALHAR-2534 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2534 > Project: Apache Apex Malhar > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Velineni Lakshmi Prasanna >Assignee: Velineni Lakshmi Prasanna >Priority: Minor > Fix For: 3.8.0 > > Original Estimate: 24h > Remaining Estimate: 24h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXCORE-764) Refactor Plugin locator service
[ https://issues.apache.org/jira/browse/APEXCORE-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-764. -- Resolution: Fixed Fix Version/s: 3.7.0 > Refactor Plugin locator service > --- > > Key: APEXCORE-764 > URL: https://issues.apache.org/jira/browse/APEXCORE-764 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > Fix For: 3.7.0 > > > * Plugin locator should return a Set, not a Collection, of configured plugins, > as duplicates should not be allowed. > * Plugin locator should locate Plugin only. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-694) Use correct annotation for nullable and not nullable arguments
[ https://issues.apache.org/jira/browse/APEXCORE-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100221#comment-16100221 ] Pramod Immaneni commented on APEXCORE-694: -- jsr305 is dormant; I see musings on the web that it never became a standard. Can you verify? > Use correct annotation for nullable and not nullable arguments > -- > > Key: APEXCORE-694 > URL: https://issues.apache.org/jira/browse/APEXCORE-694 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Vlad Rozov >Priority: Minor > > jsr305 suggests usage of javax.annotation.Nonnull. Usage of > javax.validation.constraints.NotNull should be avoided. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-765) Exceptions being logged for optional functionality while retrieving stram web service info for web service clients
[ https://issues.apache.org/jira/browse/APEXCORE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093554#comment-16093554 ] Pramod Immaneni commented on APEXCORE-765: -- I would like to change this to logging an informational message (without the trace) at the INFO level or, if DEBUG is enabled, logging the exception trace. > Exceptions being logged for optional functionality while retrieving stram web > service info for web service clients > -- > > Key: APEXCORE-765 > URL: https://issues.apache.org/jira/browse/APEXCORE-765 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > > An apex application, at runtime, can have additional (optional) security acl > information to control who can access it. When connecting to the stram web > service, the connection information is first retrieved. During this process > this acl information is also collected. If this is not present, the agent > collecting the information logs an exception, giving a false impression that > something is wrong. If this information is being collected repeatedly, for > example for different apps, there are many exceptions in the logs, making it > look like there is a bigger problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
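The proposed change above (terse message at INFO, full trace only at DEBUG) could look roughly like the following. This is a minimal sketch, not the actual StramAgent code; the helper name and message text are hypothetical:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class AclInfoLogging {
    // Hypothetical helper returning the line that would be logged when the
    // optional ACL information is absent: a one-line summary at INFO, the
    // full stack trace only when DEBUG is enabled.
    static String logLine(boolean debugEnabled, Exception ex) {
        if (debugEnabled) {
            StringWriter sw = new StringWriter();
            ex.printStackTrace(new PrintWriter(sw));
            return "ACL info unavailable: " + sw; // full trace at DEBUG
        }
        return "ACL info unavailable: " + ex.getMessage(); // terse summary at INFO
    }
}
```

With an SLF4J-style logger the same split would typically be guarded by `LOG.isDebugEnabled()`.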
[jira] [Created] (APEXCORE-765) Exceptions being logged for optional functionality while retrieving stram web service info for web service clients
Pramod Immaneni created APEXCORE-765: Summary: Exceptions being logged for optional functionality while retrieving stram web service info for web service clients Key: APEXCORE-765 URL: https://issues.apache.org/jira/browse/APEXCORE-765 Project: Apache Apex Core Issue Type: Bug Reporter: Pramod Immaneni Assignee: Pramod Immaneni An apex application, at runtime, can have additional (optional) security acl information to control who can access it. When connecting to the stram web service, the connection information is first retrieved. During this process this acl information is also collected. If this is not present, the agent collecting the information logs an exception, giving a false impression that something is wrong. If this information is being collected repeatedly, for example for different apps, there are many exceptions in the logs, making it look like there is a bigger problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXCORE-734) StramLocalCluster may not terminate properly
[ https://issues.apache.org/jira/browse/APEXCORE-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-734. -- Resolution: Fixed Fix Version/s: 3.7.0 > StramLocalCluster may not terminate properly > > > Key: APEXCORE-734 > URL: https://issues.apache.org/jira/browse/APEXCORE-734 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Vlad Rozov >Assignee: Vlad Rozov > Fix For: 3.7.0 > > > When StramLocalCluster is run asynchronously it may be shut down during > StramLocalCluster initialization, leading to termination without performing > the necessary termination sequence. A runtime exception during the run may also > lead to an improper termination sequence. For example:
> {noformat}
> Exception in thread "master" java.lang.RuntimeException: java.lang.InterruptedException
> at com.datatorrent.bufferserver.server.Server.run(Server.java:154)
> at com.datatorrent.stram.StramLocalCluster.run(StramLocalCluster.java:474)
> at com.datatorrent.stram.StramLocalCluster.run(StramLocalCluster.java:459)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.InterruptedException
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at com.datatorrent.bufferserver.server.Server.run(Server.java:152)
> ... 3 more
> {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-759) Topology validation needs to happen correctly for operators connected using THREAD_LOCAL stream locality
[ https://issues.apache.org/jira/browse/APEXCORE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088853#comment-16088853 ] Pramod Immaneni commented on APEXCORE-759: -- It would be better to keep it configurable using a setting like strict or platform optimized. > Topology validation needs to happen correctly for operators connected using > THREAD_LOCAL stream locality > > > Key: APEXCORE-759 > URL: https://issues.apache.org/jira/browse/APEXCORE-759 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Vinay Bangalore Srikanth > Attachments: apex (1).log, apex.log > > > In my application - > http://node0.morado.com:9090/static/#/ops/apps/application_1499808956620_0190 > , I have set the locality to be THREAD_LOCAL. > Upstream operator - Random string generator (with 4 partitions) > Downstream operator - Custom console operator (with 5 partitions) > The containers are getting killed. Exceptions from container-launch are seen. > Logical message has to be displayed by handling exceptions. > Logs are attached for one of the containers - Id: 01_000126 that was killed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (APEXCORE-759) Topology validation needs to happen correctly for operators connected using THREAD_LOCAL stream locality
[ https://issues.apache.org/jira/browse/APEXCORE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-759: - Summary: Topology validation needs to happen correctly for operators connected using THREAD_LOCAL stream locality (was: Operators connected using THREAD_LOCAL stream locality should enforce parallel partition scheme) > Topology validation needs to happen correctly for operators connected using > THREAD_LOCAL stream locality > > > Key: APEXCORE-759 > URL: https://issues.apache.org/jira/browse/APEXCORE-759 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Vinay Bangalore Srikanth > Attachments: apex (1).log, apex.log > > > In my application - > http://node0.morado.com:9090/static/#/ops/apps/application_1499808956620_0190 > , I have set the locality to be THREAD_LOCAL. > Upstream operator - Random string generator (with 4 partitions) > Downstream operator - Custom console operator (with 5 partitions) > The containers are getting killed. Exceptions from container-launch are seen. > Logical message has to be displayed by handling exceptions. > Logs are attached for one of the containers - Id: 01_000126 that was killed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-759) Operators connected using THREAD_LOCAL stream locality should enforce parallel partition scheme
[ https://issues.apache.org/jira/browse/APEXCORE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087765#comment-16087765 ] Pramod Immaneni commented on APEXCORE-759: -- Thanks, Vinay. I will update the JIRA title accordingly. > Operators connected using THREAD_LOCAL stream locality should enforce > parallel partition scheme > --- > > Key: APEXCORE-759 > URL: https://issues.apache.org/jira/browse/APEXCORE-759 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Vinay Bangalore Srikanth > Attachments: apex (1).log, apex.log > > > In my application - > http://node0.morado.com:9090/static/#/ops/apps/application_1499808956620_0190 > , I have set the locality to be THREAD_LOCAL. > Upstream operator - Random string generator (with 4 partitions) > Downstream operator - Custom console operator (with 5 partitions) > The containers are getting killed. Exceptions from container-launch are seen. > Logical message has to be displayed by handling exceptions. > Logs are attached for one of the containers - Id: 01_000126 that was killed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-759) Operators connected using THREAD_LOCAL stream locality should enforce parallel partition scheme
[ https://issues.apache.org/jira/browse/APEXCORE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086770#comment-16086770 ] Pramod Immaneni commented on APEXCORE-759: -- Will double-check whether 1xN was already implemented and we need to add the appropriate consistency checks, or whether the 1xN configuration will have to be supported anew. Vlad, to answer your question: yes, it will be one thread for all. > Operators connected using THREAD_LOCAL stream locality should enforce > parallel partition scheme > --- > > Key: APEXCORE-759 > URL: https://issues.apache.org/jira/browse/APEXCORE-759 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Vinay Bangalore Srikanth > Attachments: apex (1).log, apex.log > > > In my application - > http://node0.morado.com:9090/static/#/ops/apps/application_1499808956620_0190 > , I have set the locality to be THREAD_LOCAL. > Upstream operator - Random string generator (with 4 partitions) > Downstream operator - Custom console operator (with 5 partitions) > The containers are getting killed. Exceptions from container-launch are seen. > Logical message has to be displayed by handling exceptions. > Logs are attached for one of the containers - Id: 01_000126 that was killed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (APEXCORE-759) THREAD_LOCAL exception not handled.
[ https://issues.apache.org/jira/browse/APEXCORE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086681#comment-16086681 ] Pramod Immaneni edited comment on APEXCORE-759 at 7/14/17 1:06 AM: --- We catch this when parallel partitioning and MxN are specified at the same time, maybe we are not doing it in this case. It may not be possible to catch this at dag validation, not until physical plan is created, because of custom partitioners. was (Author: pramodssimmaneni): We catch this when parallel partitioning and MxN are specified at the same time, maybe we are not doing it in this case. It may not be possible to catch this at dag validation, not until physical plan is created. > THREAD_LOCAL exception not handled. > > > Key: APEXCORE-759 > URL: https://issues.apache.org/jira/browse/APEXCORE-759 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Vinay Bangalore Srikanth > Attachments: apex (1).log, apex.log > > > In my application - > http://node0.morado.com:9090/static/#/ops/apps/application_1499808956620_0190 > , I have set the locality to be THREAD_LOCAL. > Upstream operator - Random string generator (with 4 partitions) > Downstream operator - Custom console operator (with 5 partitions) > The containers are getting killed. Exceptions from container-launch are seen. > Logical message has to be displayed by handling exceptions. > Logs are attached for one of the containers - Id: 01_000126 that was killed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-759) THREAD_LOCAL exception not handled.
[ https://issues.apache.org/jira/browse/APEXCORE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086681#comment-16086681 ] Pramod Immaneni commented on APEXCORE-759: -- We catch this when parallel partitioning and MxN are specified at the same time, maybe we are not doing it in this case. It may not be possible to catch this at dag validation, not until physical plan is created. > THREAD_LOCAL exception not handled. > > > Key: APEXCORE-759 > URL: https://issues.apache.org/jira/browse/APEXCORE-759 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Vinay Bangalore Srikanth > Attachments: apex (1).log, apex.log > > > In my application - > http://node0.morado.com:9090/static/#/ops/apps/application_1499808956620_0190 > , I have set the locality to be THREAD_LOCAL. > Upstream operator - Random string generator (with 4 partitions) > Downstream operator - Custom console operator (with 5 partitions) > The containers are getting killed. Exceptions from container-launch are seen. > Logical message has to be displayed by handling exceptions. > Logs are attached for one of the containers - Id: 01_000126 that was killed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-759) THREAD_LOCAL exception not handled.
[ https://issues.apache.org/jira/browse/APEXCORE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086598#comment-16086598 ] Pramod Immaneni commented on APEXCORE-759: -- Looks like it is because of the mismatch in the number of partitions between upstream and downstream. Will look into it. > THREAD_LOCAL exception not handled. > > > Key: APEXCORE-759 > URL: https://issues.apache.org/jira/browse/APEXCORE-759 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Vinay Bangalore Srikanth > Attachments: apex (1).log, apex.log > > > In my application - > http://node0.morado.com:9090/static/#/ops/apps/application_1499808956620_0190 > , I have set the locality to be THREAD_LOCAL. > Upstream operator - Random string generator (with 4 partitions) > Downstream operator - Custom console operator (with 5 partitions) > The containers are getting killed. Exceptions from container-launch are seen. > Logical message has to be displayed by handling exceptions. > Logs are attached for one of the containers - Id: 01_000126 that was killed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-758) Unify Apex plugin configuration settings
[ https://issues.apache.org/jira/browse/APEXCORE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086431#comment-16086431 ] Pramod Immaneni commented on APEXCORE-758: -- There needs to be a configuration for a plugin describing where it applies, and it could contain other information in the future. The user may not be required to specify this configuration, as it could instead be specified by the developer, but there should be a configuration nonetheless. > Unify Apex plugin configuration settings > > > Key: APEXCORE-758 > URL: https://issues.apache.org/jira/browse/APEXCORE-758 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Vlad Rozov > > There is no need to have multiple conf key for different types of plugins. > Plugins should share the same key like "apex.plugins". If possible, it will > be good to have an ability to enable plugins at an application level. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXMALHAR-2434) JMSTransactionableStore uses Session.createQueue() which fails
[ https://issues.apache.org/jira/browse/APEXMALHAR-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXMALHAR-2434. - Resolution: Fixed Fix Version/s: 3.8.0 > JMSTransactionableStore uses Session.createQueue() which fails > -- > > Key: APEXMALHAR-2434 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2434 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: Sanjay M Pujare > Fix For: 3.8.0 > > > JMSTransactionableStore needs to create a queue for storing metadata > (lastWindowId etc) that will work across invocations to support fault > tolerance and idempotency. > However as the createQueue Javadocs says: "Note that this method is not for > creating the physical queue. The physical creation of queues is an > administrative task and is not to be initiated by the JMS API." This causes a > failure in actual tests with a production JMS based broker (such as IBM > MQSeries). We will need to fix this in one of the following ways: > - using an alternative store (HDFS or JDBC) > - allow the user to specify a name for this metadata queue via a property > - generate the name in a deterministic fashion from the subject of the queue > The last 2 alternatives assume that the application user has created this > metadata queue ahead of time from the admin console. We will need to document > this in the malhar docs. The last alternative looks most attractive except if > there are multiple JMS output operators (say partitions of an operator) > writing to the same queue (subject) we will have to use some additional logic > for them to share this single metadata queue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
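The last alternative above (deriving the metadata queue name deterministically from the data queue's subject) could be sketched as follows. The naming scheme and method name are purely illustrative assumptions, not the actual JMSTransactionableStore fix; appending a partition id is one way to handle the multiple-partitions concern raised in the description:

```java
class JmsMetadataQueueNaming {
    // Hypothetical deterministic naming scheme: derive the metadata queue
    // name from the data queue's subject. Appending a partition id keeps
    // multiple partitions writing to the same subject from colliding on a
    // single metadata queue. The admin would pre-create queues with these
    // names, as the description notes createQueue() cannot create them.
    static String metadataQueueName(String subject, int partitionId) {
        return subject + ".recovery-meta." + partitionId;
    }
}
```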
[jira] [Created] (APEXMALHAR-2518) Kafka input operator stops reading tuples when there is an UNKNOWN_MEMBER_ID error during committed offset processing
Pramod Immaneni created APEXMALHAR-2518: --- Summary: Kafka input operator stops reading tuples when there is an UNKNOWN_MEMBER_ID error during committed offset processing Key: APEXMALHAR-2518 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2518 Project: Apache Apex Malhar Issue Type: Bug Reporter: Pramod Immaneni Assignee: Pramod Immaneni The Kafka 0.9 operator stores offsets that are completely processed and no longer needed (committed offsets) back in Kafka. It does so by making a Kafka API call. If the response from the Kafka server to this call comes back with an UNKNOWN_MEMBER_ID error, it results in the Kafka consumer state changing to needing partition re-assignment, and no further messages are returned by the consumer. There are a couple of other errors that result in the same state, including when a rebalance is in progress. What exactly caused this error is not known, but the following is the likely reason given the conditions surrounding the application. When the operator has temporarily stalled due to back-pressure exerted by a slow downstream, it will eventually stall the operator's Kafka consumer thread that is reading messages from Kafka. This results in the thread not making any Kafka consumer API calls, which means no heartbeats are sent to the Kafka server. This can cause the server to evict the consumer after a timeout period. This could have been the cause of the UNKNOWN_MEMBER_ID error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
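One common mitigation for the stall-then-eviction sequence described above, sketched under the assumption that the 0.9 consumer only sends group heartbeats from within poll(): keep calling poll() during backpressure, but pause the assigned partitions so no new records are fetched. Only the threshold check below is executable; the pause/resume calls are indicated in comments because they need a live KafkaConsumer:

```java
class BackpressureGuard {
    // Decide whether to pause fetching based on how full the local hand-off
    // queue between the consumer thread and the operator is.
    static boolean shouldPause(int queuedTuples, int capacity) {
        return queuedTuples >= capacity;
    }
    // In the consumer thread the loop would look roughly like:
    //   if (shouldPause(queue.size(), capacity)) consumer.pause(/* assigned partitions */);
    //   else consumer.resume(/* assigned partitions */);
    //   consumer.poll(timeout); // still drives heartbeats while paused,
    //                           // so the broker does not evict the member
}
```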
[jira] [Resolved] (APEXCORE-745) Buffer server may stop processing tuples when backpressure is enabled
[ https://issues.apache.org/jira/browse/APEXCORE-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-745. -- Resolution: Fixed Fix Version/s: 3.7.0 > Buffer server may stop processing tuples when backpressure is enabled > - > > Key: APEXCORE-745 > URL: https://issues.apache.org/jira/browse/APEXCORE-745 > Project: Apache Apex Core > Issue Type: Bug > Components: Buffer Server >Affects Versions: 3.6.0 >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Critical > Fix For: 3.7.0 > > > When backpressure is enabled, blocks released by a publisher are not evicted. > This may lead to a condition (for example when there is a delay between > publisher and subscribers request) where the publisher publishes all data > blocks that the buffer server is allowed to allocate before subscribers > submit subscription requests. It leads to the publisher being blocked, so it > does not accept any new data and does not notify subscribers that the new > data is available. At the same time subscribers are not scheduled to run > after they catchup (or exit catchup), so the publisher will be blocked > indefinitely. Restarting the downstream container (due to, for example, > blocked downstream operator) repeats the same sequence and does not help. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (APEXCORE-751) Using a scheduled executor in buffer server for sending data to subscribers
Pramod Immaneni created APEXCORE-751: Summary: Using a scheduled executor in buffer server for sending data to subscribers Key: APEXCORE-751 URL: https://issues.apache.org/jira/browse/APEXCORE-751 Project: Apache Apex Core Issue Type: Task Reporter: Pramod Immaneni Assignee: Pramod Immaneni Priority: Minor Currently, sending data to subscribers happens from tasks that need to be submitted to an executor on a continual basis to run. These submissions happen when there is new data from publishers, when subscribers connect for the first time or resume, or by existing tasks when new data arrives while a task is executing. There have been corner case scenarios where the tasks were not submitted and sending data to subscribers stalled indefinitely. Investigate usage of a more reliable mechanism, such as a scheduled executor, that can guarantee that at least one data-send task will be scheduled on a timeout even if no other send tasks are submitted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
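The idea proposed above can be sketched with java.util.concurrent's ScheduledExecutorService: scheduling the send task at a fixed delay guarantees it runs periodically even when no event-driven submission happens. This is a self-contained sketch, not the buffer server code; the method name and latch-based completion check are assumptions for illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PeriodicSendSketch {
    // Schedule sendTask at a fixed delay so it is guaranteed to run even if
    // nothing else submits it. Returns true if it ran within deadlineMs.
    static boolean runOnce(Runnable sendTask, long periodMs, long deadlineMs) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch ran = new CountDownLatch(1);
        ses.scheduleWithFixedDelay(() -> { sendTask.run(); ran.countDown(); },
                0, periodMs, TimeUnit.MILLISECONDS);
        try {
            return ran.await(deadlineMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            ses.shutdownNow();
        }
    }
}
```

In the real buffer server the periodic task would act as a safety net alongside, not instead of, the event-driven submissions.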
[jira] [Resolved] (APEXCORE-747) Provide additional ToStringStyle options
[ https://issues.apache.org/jira/browse/APEXCORE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-747. -- Resolution: Fixed Fix Version/s: 3.7.0 > Provide additional ToStringStyle options > > > Key: APEXCORE-747 > URL: https://issues.apache.org/jira/browse/APEXCORE-747 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > Fix For: 3.7.0 > > > Existing org.apache.commons.lang.builder.ToStringStyle do not support simple > class name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXCORE-682) Not able to view application from a web service when a user specified launch path is provided
[ https://issues.apache.org/jira/browse/APEXCORE-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-682. -- Resolution: Fixed Fix Version/s: 3.7.0 > Not able to view application from a web service when a user specified launch > path is provided > - > > Key: APEXCORE-682 > URL: https://issues.apache.org/jira/browse/APEXCORE-682 > Project: Apache Apex Core > Issue Type: Bug >Reporter: devendra tagare >Assignee: devendra tagare > Fix For: 3.7.0 > > > Hi, > I was trying to launch an apex app from a user defined path by specifying > dt.attr.APPLICATION_PATH = > No other user has access to the above path. > The application launch works fine but I am not able to get the application > info from a webservice since the retrieveWebServicesInfo in StramAgent is > failing on an AccessControlException for the permissions.json file which is > not in use. > Since this file is not a mandatory file, we can catch an IOException here > which would encompass the existing FileNotFoundException & > AccessControlException and let the call return the StramWebServicesInfo > object. > Proposed change:
> try (FSDataInputStream is = fileSystem.open(new Path(appPath, "permissions.json"))) {
>   permissionsInfo = new JSONObject(IOUtils.toString(is));
> } catch (IOException ex) {
>   // ignore if file is not found
>   LOG.info("Exception in accessing the permissions file", ex);
> }
> [~PramodSSImmaneni] - does this approach look fine? > Thanks, > Dev -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-712) Support distribution of custom SSL material to the Stram node while launching the app
[ https://issues.apache.org/jira/browse/APEXCORE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057870#comment-16057870 ] Pramod Immaneni commented on APEXCORE-712: -- I think you mean to have SSL_CONFIG as a more complex object like a POJO instead of a filename that refers to a file that contains all ssl related properties. That works for me as a more direct way of specifying these properties. Why would we need the long naming, using hadoop-specific names, for the properties of that object? Why can't they be specified with regular field names like keystoreLocation, keystorePassword, keyPassword, etc.? This way the user does not need to be aware of the actual property names being used by hadoop, and it is more portable in case we support another runtime in the future. For convenience, for the user, the entire object with the field values can be specified as a single String property as defined by the StringCodec.Object2String class. I would prefer a more targeted name like MANAGEMENT_SSL_CONFIG as I am not sure if this would be enough to support any SSL configuration needed in the future (like if this is used for encrypting the data flow itself). But, if you want to go with SSL_CONFIG I would be ok. > Support distribution of custom SSL material to the Stram node while launching > the app > -- > > Key: APEXCORE-712 > URL: https://issues.apache.org/jira/browse/APEXCORE-712 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Sanjay M Pujare >Assignee: devendra tagare > Original Estimate: 2h > Remaining Estimate: 2h > > This JIRA is dependent on APEXCORE-711. APEXCORE-711 talks about using a > custom SSL configuration but assumes the SSL files (ssl-server.xml and the > keystore) are already available on any cluster node so when the Stram starts > it is able to find them.
There are cases where users don't want to do this > and they expect the Apex client to package these files so that they are > copied to the App master node so when Stram starts it will find them in the > expected location. > Enhance the Apex client/launcher to distribute the custom SSL files (XML and > the keystore) along with the application jars/resources so the user does not > need to pre-distribute the custom SSL files. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
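The POJO-with-portable-field-names idea discussed in the comment above could look roughly like this. Everything here is an illustrative assumption: the class name, the "field:value,field:value" string format (shown only in the spirit of StringCodec's Object2String encoding, not its actual syntax), and the field set of keystoreLocation, keystorePassword, and keyPassword taken from the comment:

```java
import java.util.HashMap;
import java.util.Map;

class SSLConfigSketch {
    // Portable field names, decoupled from Hadoop's property names.
    String keystoreLocation;
    String keystorePassword;
    String keyPassword;

    // Parse the whole object from a single String property
    // ("field:value,field:value" is an illustrative format only).
    static SSLConfigSketch parse(String s) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : s.split(",")) {
            String[] kv = pair.split(":", 2);
            fields.put(kv[0].trim(), kv[1].trim());
        }
        SSLConfigSketch c = new SSLConfigSketch();
        c.keystoreLocation = fields.get("keystoreLocation");
        c.keystorePassword = fields.get("keystorePassword");
        c.keyPassword = fields.get("keyPassword");
        return c;
    }
}
```

A runtime adapter would then map these fields onto whatever property names the underlying platform (Hadoop or otherwise) expects.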
[jira] [Resolved] (APEXCORE-740) Setup plugins does not identify operator classes because they're loaded through different classloaders
[ https://issues.apache.org/jira/browse/APEXCORE-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-740. -- Resolution: Fixed Fix Version/s: 3.7.0 > Setup plugins does not identify operator classes because they're loaded > through different classloaders > -- > > Key: APEXCORE-740 > URL: https://issues.apache.org/jira/browse/APEXCORE-740 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Chinmay Kolhatkar >Assignee: Chinmay Kolhatkar > Fix For: 3.7.0 > > > Apexcli loads setup plugin and operator classes with different thread context > class loader. Because of this, the setup plugin does not understand the > operator classes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (APEXCORE-733) Add ability to use impersonated user's HDFS path for storing application resources
[ https://issues.apache.org/jira/browse/APEXCORE-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049851#comment-16049851 ] Pramod Immaneni commented on APEXCORE-733: -- Sounds good in principle from a preliminary reading. Let me go through it in detail and also the PR and will let you know. > Add ability to use impersonated user's HDFS path for storing application > resources > -- > > Key: APEXCORE-733 > URL: https://issues.apache.org/jira/browse/APEXCORE-733 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Pramod Immaneni >Assignee: Sanjay M Pujare > > When an application is launched using impersonation, the impersonating user's > hdfs home folder is used to store application resources such as the jars > needed to launch the application and post launch checkpoints, tuple recording > etc. > In some scenarios this is not desirable as the impersonated user needs to > have access to the impersonating user's folders. We need the ability to be > able to use the impersonated user's home folder in these cases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXCORE-735) Upgrade maven-dependency-plugin
[ https://issues.apache.org/jira/browse/APEXCORE-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-735. -- Resolution: Fixed Fix Version/s: 3.7.0 > Upgrade maven-dependency-plugin > --- > > Key: APEXCORE-735 > URL: https://issues.apache.org/jira/browse/APEXCORE-735 > Project: Apache Apex Core > Issue Type: Dependency upgrade >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > Fix For: 3.7.0 > > > Upgrade maven-dependency-plugin to 2.10 (the same as the apache parent pom or > more recent version). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (APEXCORE-741) Upgrade netlet dependency to 1.3.1
[ https://issues.apache.org/jira/browse/APEXCORE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-741. -- Resolution: Fixed Fix Version/s: 3.7.0 > Upgrade netlet dependency to 1.3.1 > -- > > Key: APEXCORE-741 > URL: https://issues.apache.org/jira/browse/APEXCORE-741 > Project: Apache Apex Core > Issue Type: Dependency upgrade >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Trivial > Fix For: 3.7.0 > > > There are few fixes in netlet 1.3.1 that will be good to have for the Apex > local cluster, Apex Beam runner in local mode and unit tests. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-742) Yarn client not being correctly initialized in all cases
Pramod Immaneni created APEXCORE-742: Summary: Yarn client not being correctly initialized in all cases Key: APEXCORE-742 URL: https://issues.apache.org/jira/browse/APEXCORE-742 Project: Apache Apex Core Issue Type: Bug Reporter: Pramod Immaneni Assignee: Pramod Immaneni There are multiple places in the code where a YarnClient instance is being created and initialized. There are a couple of places where this is not being done correctly. Also, most of the creation and initialization is similar and can be abstracted into a common method. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (APEXCORE-736) Unable to fetch application master container report with kerberized web services in STRAM
[ https://issues.apache.org/jira/browse/APEXCORE-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-736: - Summary: Unable to fetch application master container report with kerberized web services in STRAM (was: Unable to fetch container report with kerberized web services in STRAM) > Unable to fetch application master container report with kerberized web > services in STRAM > - > > Key: APEXCORE-736 > URL: https://issues.apache.org/jira/browse/APEXCORE-736 > Project: Apache Apex Core > Issue Type: Bug >Reporter: devendra tagare >Assignee: devendra tagare > > Currently we are using a WebServicesClient in StreamingContainerManager to > get container report from Yarn. > Using the WebServicesClient will throw an exception if the webservices are > kerberized. > We should use a YARN client instead which can access kerberized YARN > webservices as per its access levels. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (APEXCORE-736) Unable to fetch container report with kerberized web services in STRAM
[ https://issues.apache.org/jira/browse/APEXCORE-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-736: - Summary: Unable to fetch container report with kerberized web services in STRAM (was: Security fix : YarnClient in StreamingContainerManager) > Unable to fetch container report with kerberized web services in STRAM > -- > > Key: APEXCORE-736 > URL: https://issues.apache.org/jira/browse/APEXCORE-736 > Project: Apache Apex Core > Issue Type: Bug >Reporter: devendra tagare >Assignee: devendra tagare > > Currently we are using a WebServicesClient in StreamingContainerManager to > get container report from Yarn. > Using the WebServicesClient will throw an exception if the webservices are > kerberized. > We should use a YARN client instead which can access kerberized YARN > webservices as per its access levels. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXCORE-736) Security fix : YarnClient in StreamingContainerManager
[ https://issues.apache.org/jira/browse/APEXCORE-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035997#comment-16035997 ] Pramod Immaneni commented on APEXCORE-736: -- The exception is thrown because the YARN web services would expect the client to authenticate with Kerberos credentials, and the client, in this case the container, has none. > Security fix : YarnClient in StreamingContainerManager > -- > > Key: APEXCORE-736 > URL: https://issues.apache.org/jira/browse/APEXCORE-736 > Project: Apache Apex Core > Issue Type: Bug >Reporter: devendra tagare >Assignee: devendra tagare > > Currently we are using a WebServicesClient in StreamingContainerManager to > get container report from Yarn. > Using the WebServicesClient will throw an exception if the webservices are > kerberized. > We should use a YARN client instead which can access kerberized YARN > webservices as per its access levels. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXCORE-732) Container fails where there is a serialization problem in tuple recording
[ https://issues.apache.org/jira/browse/APEXCORE-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029554#comment-16029554 ] Pramod Immaneni commented on APEXCORE-732: -- Here is an excerpt of an exception that one might see: Exception Message... at com.datatorrent.common.codec.JsonStreamCodec.toByteArray(JsonStreamCodec.java:99) at com.datatorrent.stram.debug.TupleRecorder.writeTuple(TupleRecorder.java:400) at com.datatorrent.stram.debug.TupleRecorder$RecorderSink.put(TupleRecorder.java:504) at com.datatorrent.stram.debug.MuxSink.put(MuxSink.java:55) at com.datatorrent.stram.debug.TappedReservoir.put(TappedReservoir.java:80) at com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:288) at com.datatorrent.stram.debug.TappedReservoir.sweep(TappedReservoir.java:56) at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:269) at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1441) > Container fails where there is a serialization problem in tuple recording > - > > Key: APEXCORE-732 > URL: https://issues.apache.org/jira/browse/APEXCORE-732 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni >Priority: Minor > > Tuple recording is an optional feature used to record data for debugging > purposes. If there are exceptions while saving the recorded data or > transporting it to the client, they are logged and processing continues; > however, an issue in the serialization is not caught, causing the container > to fail. This serialization is JSON-based and has no relation to the > serialization of tuples between operators, which could still be fine even > though this serialization is having a problem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
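The stack trace above shows the failure originating in JsonStreamCodec.toByteArray inside TupleRecorder.writeTuple. Below is a minimal, self-contained sketch of the fix direction described in the issue: guard the serialization call so a bad tuple is counted and skipped instead of failing the container. RecorderSketch and TupleSerializer are hypothetical names for illustration, not the actual Apex classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: guard tuple-recording serialization so an
// unserializable tuple is dropped (and counted) instead of failing
// the whole container.
public class RecorderSketch {
    public interface TupleSerializer {
        byte[] toByteArray(Object tuple); // may throw RuntimeException
    }

    private final TupleSerializer serializer;
    private final List<byte[]> recorded = new ArrayList<>();
    private int dropped = 0;

    public RecorderSketch(TupleSerializer serializer) {
        this.serializer = serializer;
    }

    public void writeTuple(Object tuple) {
        try {
            recorded.add(serializer.toByteArray(tuple));
        } catch (RuntimeException e) {
            // Recording is a debugging aid: log and continue, as is already
            // done for save/transport errors, instead of failing the container.
            dropped++;
        }
    }

    public int recordedCount() { return recorded.size(); }
    public int droppedCount() { return dropped; }
}
```

Since tuple recording is optional debugging functionality, dropping an unserializable tuple from the recording does not affect the regular tuple path between operators.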
[jira] [Commented] (APEXCORE-733) Add ability to use impersonating user's hdfs path for storing application resources
[ https://issues.apache.org/jira/browse/APEXCORE-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029539#comment-16029539 ] Pramod Immaneni commented on APEXCORE-733: -- This was discussed on the dev list and a couple of different approaches to solving it came up in the discussion. It is [here|https://lists.apache.org/thread.html/6abb0f58427a70396f943f99adc7534431016024f61703b248ce7bfb@%3Cdev.apex.apache.org%3E]. A vote was taken and the results are [here|https://lists.apache.org/thread.html/2e4cd9bbfaeec64c8953923c79fdf86377a9dae389bcdb5208979309@%3Cdev.apex.apache.org%3E] The chosen approach is to keep the current behavior as the default and add an option the user can specify to indicate that the impersonated user's resources should be used; today that is just the HDFS folder path, but it could include other resources in the future. > Add ability to use impersonating user's hdfs path for storing application > resources > --- > > Key: APEXCORE-733 > URL: https://issues.apache.org/jira/browse/APEXCORE-733 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Pramod Immaneni >Assignee: Sanjay M Pujare > > When an application is launched using impersonation, the impersonating user's > hdfs home folder is used to store application resources such as the jars > needed to launch the application and post launch checkpoints, tuple recording > etc. > In some scenarios this is not desirable as the impersonated user needs to > have access to the impersonating user's folders. We need the ability to be > able to use the impersonated user's home folder in these cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
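The voted approach can be sketched as a simple path-selection rule: the impersonating (launching) user's path stays the default, and an explicit opt-in switches to the impersonated user's path. The class name, method, and path layout below are illustrative assumptions, not the actual Apex configuration keys.

```java
// Hypothetical sketch of the voted approach: keep the current behavior as the
// default, with an explicit opt-in to store application resources under the
// impersonated user's HDFS home path instead. AppPathResolver and the
// "/user/<name>" layout are illustrative only.
public class AppPathResolver {
    public static String resourceBasePath(String impersonatingUser,
                                          String impersonatedUser,
                                          boolean useImpersonatedUserPath) {
        // Default: the impersonating user's path, preserving today's behavior.
        String owner = (useImpersonatedUserPath && impersonatedUser != null)
                ? impersonatedUser
                : impersonatingUser;
        return "/user/" + owner;
    }
}
```

Keeping the flag off by default means existing deployments see no change unless they explicitly opt in.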
[jira] [Created] (APEXCORE-733) Add ability to use impersonating user's hdfs path for storing application resources
Pramod Immaneni created APEXCORE-733: Summary: Add ability to use impersonating user's hdfs path for storing application resources Key: APEXCORE-733 URL: https://issues.apache.org/jira/browse/APEXCORE-733 Project: Apache Apex Core Issue Type: Improvement Reporter: Pramod Immaneni Assignee: Sanjay M Pujare When an application is launched using impersonation, the impersonating user's hdfs home folder is used to store application resources such as the jars needed to launch the application and post launch checkpoints, tuple recording etc. In some scenarios this is not desirable as the impersonated user needs to have access to the impersonating user's folders. We need the ability to be able to use the impersonated user's home folder in these cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-732) Container fails where there is a serialization problem in tuple recording
Pramod Immaneni created APEXCORE-732: Summary: Container fails where there is a serialization problem in tuple recording Key: APEXCORE-732 URL: https://issues.apache.org/jira/browse/APEXCORE-732 Project: Apache Apex Core Issue Type: Bug Reporter: Pramod Immaneni Assignee: Pramod Immaneni Priority: Minor Tuple recording is an optional feature used to record data for debugging purposes. If there are exceptions while saving the recorded data or transporting it to the client, they are logged and processing continues; however, an issue in the serialization is not caught, causing the container to fail. This serialization is JSON-based and has no relation to the serialization of tuples between operators, which could still be fine even though this serialization is having a problem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXCORE-726) Impersonating user unable to access application resources when ACLs are enabled
[ https://issues.apache.org/jira/browse/APEXCORE-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024413#comment-16024413 ] Pramod Immaneni commented on APEXCORE-726: -- ACLs needed to access the application would need to be specified for the impersonating user when there is no default admin acl. > Impersonating user unable to access application resources when ACLs are > enabled > --- > > Key: APEXCORE-726 > URL: https://issues.apache.org/jira/browse/APEXCORE-726 > Project: Apache Apex Core > Issue Type: Bug > Components: Security >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > > When an application is launched using impersonation in a hadoop cluster that > has ACLs enabled and no default admin acl, the impersonating user is unable > to access application resources from hadoop services such as logs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (APEXCORE-727) Create and publish binary releases along with the source releases
[ https://issues.apache.org/jira/browse/APEXCORE-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni reassigned APEXCORE-727: Assignee: Pramod Immaneni > Create and publish binary releases along with the source releases > - > > Key: APEXCORE-727 > URL: https://issues.apache.org/jira/browse/APEXCORE-727 > Project: Apache Apex Core > Issue Type: Task >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni >Priority: Minor > > Currently, we only do source releases. It would be easier for the users if > there were binaries readily available. Create and publish binary release on > the website along with sources when a release is made. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (APEXCORE-731) Modifications to the build to generate binary package
[ https://issues.apache.org/jira/browse/APEXCORE-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni reassigned APEXCORE-731: Assignee: Thomas Weise > Modifications to the build to generate binary package > - > > Key: APEXCORE-731 > URL: https://issues.apache.org/jira/browse/APEXCORE-731 > Project: Apache Apex Core > Issue Type: Sub-task >Reporter: Pramod Immaneni >Assignee: Thomas Weise >Priority: Minor > > The Maven build would need to be updated to also optionally generate a package > containing the binaries. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-731) Additions to the build artifacts to create binary package
Pramod Immaneni created APEXCORE-731: Summary: Additions to the build artifacts to create binary package Key: APEXCORE-731 URL: https://issues.apache.org/jira/browse/APEXCORE-731 Project: Apache Apex Core Issue Type: Sub-task Reporter: Pramod Immaneni Priority: Minor The Maven build would need to be updated to also optionally generate a package containing the binaries. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (APEXCORE-731) Modifications to the build to generate binary package
[ https://issues.apache.org/jira/browse/APEXCORE-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-731: - Summary: Modifications to the build to generate binary package (was: Additions to the build artifacts to create binary package) > Modifications to the build to generate binary package > - > > Key: APEXCORE-731 > URL: https://issues.apache.org/jira/browse/APEXCORE-731 > Project: Apache Apex Core > Issue Type: Sub-task >Reporter: Pramod Immaneni >Priority: Minor > > The Maven build would need to be updated to also optionally generate a package > containing the binaries. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (APEXCORE-726) Impersonating user unable to access application resources when ACLs are enabled
[ https://issues.apache.org/jira/browse/APEXCORE-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-726: - Affects Version/s: (was: 3.6.0) > Impersonating user unable to access application resources when ACLs are > enabled > --- > > Key: APEXCORE-726 > URL: https://issues.apache.org/jira/browse/APEXCORE-726 > Project: Apache Apex Core > Issue Type: Bug > Components: Security >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > > When an application is launched using impersonation in a hadoop cluster that > has ACLs enabled and no default admin acl, the impersonating user is unable > to access application resources from hadoop services such as logs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXCORE-726) Impersonating user unable to access application resources when ACLs are enabled
[ https://issues.apache.org/jira/browse/APEXCORE-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018857#comment-16018857 ] Pramod Immaneni commented on APEXCORE-726: -- It's in earlier versions as well. I used Affects Version 3.6.0 to indicate 3.6.0 and earlier versions. > Impersonating user unable to access application resources when ACLs are > enabled > --- > > Key: APEXCORE-726 > URL: https://issues.apache.org/jira/browse/APEXCORE-726 > Project: Apache Apex Core > Issue Type: Bug > Components: Security >Affects Versions: 3.6.0 >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > > When an application is launched using impersonation in a hadoop cluster that > has ACLs enabled and no default admin acl, the impersonating user is unable > to access application resources from hadoop services such as logs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-729) Include an example in the release
Pramod Immaneni created APEXCORE-729: Summary: Include an example in the release Key: APEXCORE-729 URL: https://issues.apache.org/jira/browse/APEXCORE-729 Project: Apache Apex Core Issue Type: Sub-task Reporter: Pramod Immaneni Priority: Minor Include in the binary a self-contained sample example that the user can easily run to see how the platform works. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-728) Include licenses for dependencies
Pramod Immaneni created APEXCORE-728: Summary: Include licenses for dependencies Key: APEXCORE-728 URL: https://issues.apache.org/jira/browse/APEXCORE-728 Project: Apache Apex Core Issue Type: Sub-task Reporter: Pramod Immaneni Priority: Minor The correct LICENSE and NOTICE files should be included based on the dependencies that are being packaged in the binary. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (APEXCORE-727) Create and publish binary releases along with the source releases
[ https://issues.apache.org/jira/browse/APEXCORE-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-727: - Priority: Minor (was: Major) > Create and publish binary releases along with the source releases > - > > Key: APEXCORE-727 > URL: https://issues.apache.org/jira/browse/APEXCORE-727 > Project: Apache Apex Core > Issue Type: Task >Reporter: Pramod Immaneni >Priority: Minor > > Currently, we only do source releases. It would be easier for the users if > there were binaries readily available. Create and publish binary release on > the website along with sources when a release is made. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXCORE-727) Create and publish binary releases along with the source releases
[ https://issues.apache.org/jira/browse/APEXCORE-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018853#comment-16018853 ] Pramod Immaneni commented on APEXCORE-727: -- For an example of how to package the binary release, look at https://github.com/atrato/apex-cli-package > Create and publish binary releases along with the source releases > - > > Key: APEXCORE-727 > URL: https://issues.apache.org/jira/browse/APEXCORE-727 > Project: Apache Apex Core > Issue Type: Task >Reporter: Pramod Immaneni > > Currently, we only do source releases. It would be easier for the users if > there were binaries readily available. Create and publish a binary release on > the website along with sources when a release is made. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-727) Create and publish binary releases along with the source releases
Pramod Immaneni created APEXCORE-727: Summary: Create and publish binary releases along with the source releases Key: APEXCORE-727 URL: https://issues.apache.org/jira/browse/APEXCORE-727 Project: Apache Apex Core Issue Type: Task Reporter: Pramod Immaneni Currently, we only do source releases. It would be easier for the users if there were binaries readily available. Create and publish binary release on the website along with sources when a release is made. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-726) Impersonating user unable to access application resources when ACLs are enabled
Pramod Immaneni created APEXCORE-726: Summary: Impersonating user unable to access application resources when ACLs are enabled Key: APEXCORE-726 URL: https://issues.apache.org/jira/browse/APEXCORE-726 Project: Apache Apex Core Issue Type: Bug Components: Security Affects Versions: 3.6.0 Reporter: Pramod Immaneni Assignee: Pramod Immaneni When an application is launched using impersonation in a hadoop cluster that has ACLs enabled and no default admin acl, the impersonating user is unable to access application resources from hadoop services such as logs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (APEXMALHAR-2475) CacheStore needn't expire data if it is read-only data
[ https://issues.apache.org/jira/browse/APEXMALHAR-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXMALHAR-2475. - Resolution: Fixed Fix Version/s: 3.8.0 > CacheStore needn't expire data if it is read-only data > --- > > Key: APEXMALHAR-2475 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2475 > Project: Apache Apex Malhar > Issue Type: Sub-task >Reporter: Pramod Immaneni >Assignee: Oliver Winke > Fix For: 3.8.0 > > > The db CacheStore implementation supports expiry of data on read or write > after a configurable expiry period. The default is one minute. If the data is > read-only there is no need to expire this data. The max cache size property > will anyway ensure that the cache size doesn't grow indefinitely. The > CacheManager can provide the meta-information about whether the data is read-only. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
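The resolved behavior can be sketched with a small, self-contained cache: time-based expiry applies only when the data is not read-only, while the maximum-size (LRU) bound always applies. BoundedCache is a hypothetical stand-in for the actual CacheStore, and timestamps are passed explicitly to keep the example deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a primary cache that skips time-based expiry when the
// data is declared read-only, relying only on max-size (LRU) eviction.
public class BoundedCache<K, V> {
    private static class Holder<V> {
        final V value; final long writtenAt;
        Holder(V v, long t) { value = v; writtenAt = t; }
    }

    private final boolean readOnly;
    private final long expiryMillis;
    private final LinkedHashMap<K, Holder<V>> map;

    public BoundedCache(final int maxSize, boolean readOnly, long expiryMillis) {
        this.readOnly = readOnly;
        this.expiryMillis = expiryMillis;
        // Access-order LinkedHashMap gives a simple LRU eviction at maxSize.
        this.map = new LinkedHashMap<K, Holder<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Holder<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    public void put(K key, V value, long nowMillis) {
        map.put(key, new Holder<>(value, nowMillis));
    }

    public V get(K key, long nowMillis) {
        Holder<V> h = map.get(key);
        if (h == null) return null;
        // Read-only data never expires; the size bound alone limits growth.
        if (!readOnly && nowMillis - h.writtenAt > expiryMillis) {
            map.remove(key);
            return null;
        }
        return h.value;
    }

    public int size() { return map.size(); }
}
```

The max-size bound is what keeps a read-only cache from growing indefinitely, which is why expiry can be safely skipped for it.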
[jira] [Resolved] (APEXMALHAR-2474) FSLoader only returns value at the beginning
[ https://issues.apache.org/jira/browse/APEXMALHAR-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXMALHAR-2474. - Resolution: Fixed Fix Version/s: 3.8.0 > FSLoader only returns value at the beginning > > > Key: APEXMALHAR-2474 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2474 > Project: Apache Apex Malhar > Issue Type: Sub-task >Reporter: Pramod Immaneni >Assignee: Oliver Winke > Fix For: 3.8.0 > > > FSLoader implements Backup store for db CacheManager. In the initial load, it > reads all the lines of the file, line by line, and returns a Map of key-value > pairs with a key-value pair for every line. It returns data only on the > initial load and thereafter it returns null for any key lookup. Also, there > is no need to load all the data in the file and return it if the primary > cache cannot hold all the entries. These issues need to be addressed and it > also helps if the CacheManager supplies meta-information such as how much > information should be loaded and returned in the initial load. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
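A minimal sketch of the fix direction for the loader described above: keep a key index so lookups keep working after the initial load, and cap the initial load instead of returning a pair for every line. BackupLoaderSketch is a hypothetical stand-in for FSLoader; it takes pre-parsed key-value pairs rather than reading a file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a backup loader that serves per-key lookups after the
// initial load (instead of returning null) and caps the initial load at the
// primary cache's capacity instead of returning every line of the file.
public class BackupLoaderSketch {
    private final Map<String, String> all = new LinkedHashMap<>();

    public BackupLoaderSketch(Map<String, String> parsedLines) {
        all.putAll(parsedLines); // stands in for parsing the file line by line
    }

    // Initial load: return at most maxEntries (e.g. the primary cache size).
    public Map<String, String> loadInitial(int maxEntries) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : all.entrySet()) {
            if (out.size() >= maxEntries) break;
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    // Later lookups must keep working instead of returning null.
    public String get(String key) {
        return all.get(key);
    }
}
```

The maxEntries parameter is the kind of meta information the CacheManager could supply, as the description suggests.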
[jira] [Resolved] (APEXMALHAR-2473) Support for global cache meta information in db CacheManager
[ https://issues.apache.org/jira/browse/APEXMALHAR-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXMALHAR-2473. - Resolution: Fixed Fix Version/s: 3.8.0 > Support for global cache meta information in db CacheManager > > > Key: APEXMALHAR-2473 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2473 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: Pramod Immaneni >Assignee: Oliver Winke > Fix For: 3.8.0 > > > Currently the db CacheManager has no knowledge of the characteristics of the > data or the cache stores, so it handles all scenarios uniformly. This may not > be the optimal implementation in all cases. Better optimizations can be > performed in the manager if this information is known. A few examples: if the > data is read-only, the keys in the primary cache need not be refreshed daily > as they are today; if the primary cache size is known, the number of initial > entries loaded from backup needn't exceed it. Add support for such general > cache meta information in the manager. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
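The kind of meta information described above can be sketched as a small interface the manager consults when deciding on refresh and load policies. The names are illustrative, not the actual Malhar CacheManager API.

```java
// Hypothetical sketch of global cache meta information and two policy
// decisions it could drive; CacheMetaSketch and its members are illustrative.
public class CacheMetaSketch {
    public interface CacheMeta {
        boolean isReadOnly();        // read-only data: skip key refresh/expiry
        int primaryCacheCapacity();  // cap the initial load from backup
    }

    // Read-only data never needs its primary-cache keys refreshed.
    public static boolean shouldRefreshKeys(CacheMeta meta) {
        return !meta.isReadOnly();
    }

    // The initial load from backup needn't exceed the primary cache size.
    public static int initialLoadLimit(CacheMeta meta, int backupEntryCount) {
        return Math.min(backupEntryCount, meta.primaryCacheCapacity());
    }
}
```

Both examples in the issue description (skipping refresh for read-only data, bounding the initial load) reduce to simple queries against such an interface.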
[jira] [Created] (APEXCORE-721) Announcement section on the website is not up to date
Pramod Immaneni created APEXCORE-721: Summary: Announcement section on the website is not up to date Key: APEXCORE-721 URL: https://issues.apache.org/jira/browse/APEXCORE-721 Project: Apache Apex Core Issue Type: Bug Components: Website Reporter: Pramod Immaneni Assignee: Pramod Immaneni The announcement section on the main page of the website still lists Malhar 3.6.0 and Core 3.5.0 as the latest releases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (APEXCORE-711) Support custom SSL keystore for the Stram REST API web service
[ https://issues.apache.org/jira/browse/APEXCORE-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-711. -- Resolution: Fixed Fix Version/s: 3.7.0 > Support custom SSL keystore for the Stram REST API web service > -- > > Key: APEXCORE-711 > URL: https://issues.apache.org/jira/browse/APEXCORE-711 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Sanjay M Pujare >Assignee: Sanjay M Pujare > Fix For: 3.7.0 > > Original Estimate: 72h > Remaining Estimate: 72h > > Currently StrAM supports only the default Hadoop SSL configuration for the > web service because it uses the org.apache.hadoop.yarn.webapp.WebApps helper > class, which has the limitation of only using the default Hadoop SSL config > that is read from Hadoop's ssl-server.xml resource file. Some users have run > into a situation where Hadoop's SSL keystore is not available on most cluster > nodes or the Stram process doesn't have read access to the keystore even when > present. So there is a need for the Stram to use a custom SSL keystore and > configuration that does not suffer from these limitations. > There is already a PR https://github.com/apache/hadoop/pull/213 to Hadoop to > support this and it is in the process of getting merged soon. > After that Stram needs to be enhanced (this JIRA) to accept the location of a > custom ssl-server.xml file (supplied by the client via a DAG attribute) and > use the values from that file to set up the config object to be passed to > WebApps, which will end up using the custom SSL configuration. This approach > has already been verified in a prototype. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (APEXCORE-715) Remove unnecessary @Evolving annotation in engine
[ https://issues.apache.org/jira/browse/APEXCORE-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-715. -- Resolution: Fixed > Remove unnecessary @Evolving annotation in engine > - > > Key: APEXCORE-715 > URL: https://issues.apache.org/jira/browse/APEXCORE-715 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > > \@Evolving annotations in the engine are not necessary; they introduce an > unnecessary practice and the wrong impression that some interfaces and/or > classes are more stable than others. Interfaces and classes in the engine > are not subject to semantic version checks and may change from release to > release. The same applies to the buffer server. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-716) Add a package level javadoc for engine api about its intended use
Pramod Immaneni created APEXCORE-716: Summary: Add a package level javadoc for engine api about its intended use Key: APEXCORE-716 URL: https://issues.apache.org/jira/browse/APEXCORE-716 Project: Apache Apex Core Issue Type: Improvement Reporter: Pramod Immaneni Assignee: Pramod Immaneni Currently, there is an engine api package that allows the use of some internal engine functionality. However, there are no guarantees on the stability of the api or the scenarios in which it can be used. This should be documented. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXCORE-714) Reusable instance operator recovery
[ https://issues.apache.org/jira/browse/APEXCORE-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985816#comment-15985816 ] Pramod Immaneni commented on APEXCORE-714: -- Thomas, my responses are inline. On Tue, Apr 25, 2017 at 8:42 AM, Thomas Weise wrote: Pramod, Sounds like some sort of alternative "processing mode" that from the engine's perspective allows potentially inconsistent state when there is a pipeline failure. This is of course only something the user can decide. >> Calling it an alternate processing mode is a good idea. Does the proposal assume that the operator state is immutable (or what is sometimes tagged with the stateless annotation)? For example an operator that has to load a large amount of state from another source before it can process the first tuple? >> Operator state can change, not necessarily stateless. Stateless may not >> automatically fall into this category as our current definition of stateless >> denotes window level stateless and not necessarily tuple level. Also, it would be an optimization but not something that will help with SLA if the operator still needs to be recovered when its own container fails. It might help to clarify that and also why there is a need to recover in the batch use case (vs. reprocess). >> Correct, as I mentioned in the last statement, if the container where the >> operator is running itself goes down then it is recovery from checkpoint and >> business as usual. I may have misspoken about batch; I meant to say apps where >> operators have large state, not necessarily batch. The use case we are >> dealing with is batch and restart is not practical as the run takes a long >> time. 
Thanks, Pramod > Reusable instance operator recovery > --- > > Key: APEXCORE-714 > URL: https://issues.apache.org/jira/browse/APEXCORE-714 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > > In a failure scenario, when a container fails, it is redeployed along with > all the operators in it. The operators downstream to these operators are also > redeployed within their containers. The operators are restored from their > checkpoint and connect to the appropriate point in the stream according to > the processing mode. In at-least-once mode, for example, the data is replayed > from the same checkpoint. > Restoring an operator's state from checkpoint could turn out to be a costly > operation depending on the size of the state. In some use cases, based on the > operator logic, when there is an upstream failure, reusing the current > instance without restoring the operator from checkpoint will still produce > the same results with the data replayed from the last fully processed window. > The operator state can remain the same as it was before the upstream failure > by reusing the same operator instance from before, with only the streams and > window reset to the window after the last fully processed window to guarantee > the at-least-once processing of tuples. If the container where the operator > itself is running goes down, it would need to be restored from the checkpoint > of course. This scenario occurs in some batch use cases with operators that > have a large state. > I would like to propose adding the ability for a user to explicitly identify > operators to be of this type and the corresponding functionality in the > engine to handle their recovery in the way described above by not restoring > their state from checkpoint, reusing the instance and restoring the stream to > the window after the last fully processed window for the operator. 
When > operators are not identified to be of this type, the default behavior is what > it is today and nothing changes. > I have done some prototyping on the engine side to ensure that this is > possible with our current code base without requiring a massive overhaul, > in particular the restoration of the operator instance within the Node in the > streaming container and the re-establishment of the subscriber stream to a > window in the buffer server that the publisher (upstream) hasn't yet reached, > as it would be restarting from checkpoint, and I have been able to get it all > working successfully. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-714) Reusable instance operator recovery
Pramod Immaneni created APEXCORE-714: Summary: Reusable instance operator recovery Key: APEXCORE-714 URL: https://issues.apache.org/jira/browse/APEXCORE-714 Project: Apache Apex Core Issue Type: Improvement Reporter: Pramod Immaneni Assignee: Pramod Immaneni In a failure scenario, when a container fails, it is redeployed along with all the operators in it. The operators downstream to these operators are also redeployed within their containers. The operators are restored from their checkpoint and connect to the appropriate point in the stream according to the processing mode. In at-least-once mode, for example, the data is replayed from the same checkpoint. Restoring an operator's state from checkpoint could turn out to be a costly operation depending on the size of the state. In some use cases, based on the operator logic, when there is an upstream failure, reusing the current instance without restoring the operator from checkpoint will still produce the same results with the data replayed from the last fully processed window. The operator state can remain the same as it was before the upstream failure by reusing the same operator instance from before, with only the streams and window reset to the window after the last fully processed window to guarantee the at-least-once processing of tuples. If the container where the operator itself is running goes down, it would need to be restored from the checkpoint of course. This scenario occurs in some batch use cases with operators that have a large state. I would like to propose adding the ability for a user to explicitly identify operators to be of this type and the corresponding functionality in the engine to handle their recovery in the way described above by not restoring their state from checkpoint, reusing the instance and restoring the stream to the window after the last fully processed window for the operator. 
When operators are not identified to be of this type, the default behavior is what it is today and nothing changes. I have done some prototyping on the engine side to ensure that this is possible with our current code base without requiring a massive overhaul, in particular the restoration of the operator instance within the Node in the streaming container and the re-establishment of the subscriber stream to a window in the buffer server that the publisher (upstream) hasn't yet reached, as it would be restarting from checkpoint, and I have been able to get it all working successfully. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
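The proposed recovery behavior reduces to a simple decision: on an upstream failure, a user-marked reusable operator keeps its live instance and replays from the window after the last fully processed one; if its own container failed, or it is not marked reusable, it is restored from checkpoint as today. A self-contained sketch under those assumptions (class and names are hypothetical, and window ids are simplified to longs):

```java
// Hypothetical sketch of the recovery decision proposed above; the real
// engine's types and window ids differ.
public class RecoverySketch {
    public enum Action { RESTORE_FROM_CHECKPOINT, REUSE_INSTANCE }

    /**
     * @param reusable           operator explicitly marked reusable by the user
     * @param ownContainerFailed whether the operator's own container went down
     */
    public static Action onFailure(boolean reusable, boolean ownContainerFailed) {
        if (!reusable || ownContainerFailed) {
            // Default behavior today, and always the case when the operator's
            // own container is lost (the live instance no longer exists).
            return Action.RESTORE_FROM_CHECKPOINT;
        }
        return Action.REUSE_INSTANCE;
    }

    // On reuse, the stream is reset to the window after the last fully
    // processed window, preserving at-least-once semantics.
    public static long replayFromWindow(long lastFullyProcessedWindow) {
        return lastFullyProcessedWindow + 1;
    }
}
```

The value of the optimization is that REUSE_INSTANCE skips a potentially expensive checkpoint restore for operators with large state.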
[jira] [Resolved] (APEXCORE-641) Subscribers/DataListeners may not be scheduled to execute even when they have data to process
[ https://issues.apache.org/jira/browse/APEXCORE-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni resolved APEXCORE-641. -- Resolution: Fixed > Subscribers/DataListeners may not be scheduled to execute even when they have > data to process > - > > Key: APEXCORE-641 > URL: https://issues.apache.org/jira/browse/APEXCORE-641 > Project: Apache Apex Core > Issue Type: Bug > Components: Buffer Server >Affects Versions: 3.2.0, 3.3.0, 3.2.1, 3.4.0, 3.5.0, 3.6.0 >Reporter: Vlad Rozov >Assignee: Vlad Rozov > Fix For: 3.6.0 > > > The buffer server iterates over DataListeners aka LogicalNodes, and each > LogicalNode tries to send to its downstream all data that the Publisher added to > the DataList. When an output port is connected to multiple partitions or > downstream operators (2 or more DataListeners/LogicalNodes), there may be more > data published to the DataList after the first few DataListeners in the listener > set have iterated over the DataList and reached the last block published so far. > The data published while the last DataListener sends data to its downstream > will not be processed by other DataListeners until the Publisher adds more data > to the DataList. This may lead to blocked operators, as the buffer server may > stop processing data completely in case the Publisher fills more than one data > block while a single DataListener sends data to its downstream and there are > more Subscribers/DataListeners than the number of in-memory blocks allowed (8). > In such a case, the Publisher will be suspended, and there will be no task > scheduled to process data already published to the DataList. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-710) Plugin manager does not report exceptions from plugins
Pramod Immaneni created APEXCORE-710: Summary: Plugin manager does not report exceptions from plugins Key: APEXCORE-710 URL: https://issues.apache.org/jira/browse/APEXCORE-710 Project: Apache Apex Core Issue Type: Sub-task Reporter: Pramod Immaneni If there are exceptions during execution of a plugin they should be reported. The plugin should possibly be taken down as it might be in a bad state. An event may need to be raised in stram to notify users. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (APEXCORE-699) Investigate versioning for plugins
[ https://issues.apache.org/jira/browse/APEXCORE-699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-699: - Fix Version/s: 3.6.0 > Investigate versioning for plugins > -- > > Key: APEXCORE-699 > URL: https://issues.apache.org/jira/browse/APEXCORE-699 > Project: Apache Apex Core > Issue Type: Sub-task >Reporter: Pramod Immaneni >Assignee: Tushar Gosavi >Priority: Minor > Fix For: 3.6.0 > > > Having versioning information in the plugin would help in dealing with > compatibility with older plugins when the plugin interface changes. This > needs to be investigated. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (APEXCORE-700) Make the plugin registration interface uniform
[ https://issues.apache.org/jira/browse/APEXCORE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni updated APEXCORE-700: - Fix Version/s: 3.6.0 > Make the plugin registration interface uniform > -- > > Key: APEXCORE-700 > URL: https://issues.apache.org/jira/browse/APEXCORE-700 > Project: Apache Apex Core > Issue Type: Sub-task >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni >Priority: Minor > Fix For: 3.6.0 > > > The user-facing plugin registration for DAG setup plugins is slightly > different from runtime plugins. It would be better to have a uniform way to > do this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXCORE-699) Investigate versioning for plugins
[ https://issues.apache.org/jira/browse/APEXCORE-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972063#comment-15972063 ] Pramod Immaneni commented on APEXCORE-699: -- Thanks for checking this. What if the version that the plugin returns is an enum defined by the framework? Then the user can't return some random version number that the framework will not know what to do with. Second, you could have a convenience constant like CURRENT_VERSION (equivalent to the VERSION variable above) that the user can simply use in their plugin. Regarding the base class: if we put getVersion in the BasePlugin (provided by the engine) and they extend it, aren't we back to the same problem when a newer version of the engine has a newer base class? > Investigate versioning for plugins > -- > > Key: APEXCORE-699 > URL: https://issues.apache.org/jira/browse/APEXCORE-699 > Project: Apache Apex Core > Issue Type: Sub-task >Reporter: Pramod Immaneni >Assignee: Tushar Gosavi >Priority: Minor > > Having versioning information in the plugin would help in dealing with > compatibility with older plugins when the plugin interface changes. This > needs to be investigated. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
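The enum-plus-convenience-constant idea from the comment above could look like the sketch below. PluginVersion, CURRENT_VERSION, and VersionedPlugin are hypothetical names, not part of the actual Apex Core API: the framework defines the enum (so a plugin cannot report an arbitrary number), and the contract lives in an interface rather than a base class, side-stepping the newer-base-class problem the comment raises.

```java
// Hypothetical sketch of framework-defined plugin versioning for
// APEXCORE-699; these names do not exist in Apex Core.
public class PluginVersioning {

  // Framework-defined enum: the set of valid versions is closed, so a
  // plugin cannot return a value the engine does not recognize.
  public enum PluginVersion {
    V1, V2;

    // Convenience constant so plugin authors need not hard-code a value.
    public static final PluginVersion CURRENT_VERSION = V2;
  }

  // The contract is an interface, not an engine-provided base class, so a
  // newer engine does not force plugins onto a newer base class.
  public interface VersionedPlugin {
    PluginVersion getVersion();
  }

  // Example plugin that simply reports the current version.
  public static class MyPlugin implements VersionedPlugin {
    @Override
    public PluginVersion getVersion() {
      return PluginVersion.CURRENT_VERSION;
    }
  }

  public static void main(String[] args) {
    VersionedPlugin p = new MyPlugin();
    System.out.println("plugin version: " + p.getVersion());
  }
}
```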
[jira] [Closed] (APEXCORE-618) ClassNotFoundException thrown when launching app on secure Hadoop
[ https://issues.apache.org/jira/browse/APEXCORE-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni closed APEXCORE-618. > ClassNotFoundException thrown when launching app on secure Hadoop > - > > Key: APEXCORE-618 > URL: https://issues.apache.org/jira/browse/APEXCORE-618 > Project: Apache Apex Core > Issue Type: Bug > Environment: hadoop 2.7.3 >Reporter: David Yan >Assignee: David Yan > > When launching the pi demo on a Kerberos-enabled Hadoop (plain Apache Hadoop), > this error is thrown: > {code} > 2017-01-21 02:13:50,208 ERROR com.datatorrent.stram.StreamingAppMaster: > Exiting Application Master > java.lang.NoClassDefFoundError: > com/sun/jersey/client/apache4/ApacheHttpClient4Handler > at > com.datatorrent.stram.util.WebServicesClient.<init>(WebServicesClient.java:153) > at > com.datatorrent.stram.util.WebServicesClient.<init>(WebServicesClient.java:140) > at > com.datatorrent.stram.StreamingContainerManager.getAppMasterContainerInfo(StreamingContainerManager.java:481) > at > com.datatorrent.stram.StreamingContainerManager.init(StreamingContainerManager.java:448) > at > com.datatorrent.stram.StreamingContainerManager.<init>(StreamingContainerManager.java:420) > at > com.datatorrent.stram.StreamingContainerManager.getInstance(StreamingContainerManager.java:3065) > at > com.datatorrent.stram.StreamingAppMasterService.serviceInit(StreamingAppMasterService.java:552) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:102) > Caused by: java.lang.ClassNotFoundException: > com.sun.jersey.client.apache4.ApacheHttpClient4Handler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 9 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)