[jira] [Commented] (FLINK-5712) update several deprecated configuration options
[ https://issues.apache.org/jira/browse/FLINK-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862212#comment-15862212 ] ASF GitHub Bot commented on FLINK-5712: --- Github user barcahead commented on a diff in the pull request: https://github.com/apache/flink/pull/3267#discussion_r100660350 --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/services/MesosServicesUtils.java --- @@ -40,9 +41,11 @@ public static MesosServices createMesosServices(Configuration configuration) thr return new StandaloneMesosServices(); case ZOOKEEPER: - final String zkMesosRootPath = configuration.getString( + final String zkMesosRootPath = ConfigurationUtil.getStringWithDeprecatedKeys( + configuration, ConfigConstants.HA_ZOOKEEPER_MESOS_WORKERS_PATH, - ConfigConstants.DEFAULT_ZOOKEEPER_MESOS_WORKERS_PATH); + ConfigConstants.DEFAULT_ZOOKEEPER_MESOS_WORKERS_PATH, + ConfigConstants.ZOOKEEPER_MESOS_WORKERS_PATH); --- End diff -- Thanks for the review. I looked into `HighAvailabilityOptions` and found that it doesn't contain all the HA options; some options are still in `ConfigConstants`. It looks like this part is still in the middle of refactoring, right? I also heard some discussion from @greghogan about generating the configuration documentation from `ConfigOption` automatically. Is there any work I can help with, such as moving options from `ConfigConstants` to the corresponding `xxConfigOptions` files, or the documentation automation? If this is the right direction and there is work I can do, I would suggest doing it in a separate PR; if not, I will just move `HA_ZOOKEEPER_MESOS_WORKERS_PATH` to `HighAvailabilityOptions` and finish this PR. 
> update several deprecated configuration options > > > Key: FLINK-5712 > URL: https://issues.apache.org/jira/browse/FLINK-5712 > Project: Flink > Issue Type: Bug > Components: Documentation, Mesos >Affects Versions: 1.2.0, 1.3.0 >Reporter: Yelei Feng >Priority: Minor > Labels: configuration, document > Fix For: 1.3.0 > > > 1. We should use 'containerized.heap-cutoff-ratio' and > 'containerized.heap-cutoff-min' instead of the deprecated YARN-specific options > in the configuration doc. > 2. In Mesos mode, we still use the deprecated ZooKeeper naming convention - > 'recovery.zookeeper.path.mesos-workers'. We should make it consistent with > the other ZooKeeper options by using > 'high-availability.zookeeper.path.mesos-workers'. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
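The diff above switches the lookup to `ConfigurationUtil.getStringWithDeprecatedKeys`, which prefers the new key and falls back to deprecated keys before the default. As a rough sketch of that fallback pattern (purely illustrative, using `java.util.Properties` instead of Flink's `Configuration`; this is not Flink's actual implementation):

```java
import java.util.Properties;

/**
 * Illustrative sketch of the deprecated-key fallback pattern discussed above:
 * prefer the new key, fall back to any deprecated keys, then use the default.
 * Not Flink's actual code; Flink's version works on its Configuration class.
 */
public class DeprecatedKeyLookup {
    public static String getStringWithDeprecatedKeys(
            Properties config, String key, String defaultValue, String... deprecatedKeys) {
        String value = config.getProperty(key);
        if (value != null) {
            return value;
        }
        for (String deprecated : deprecatedKeys) {
            value = config.getProperty(deprecated);
            if (value != null) {
                // A real implementation would log a deprecation warning here.
                return value;
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        // Only the deprecated key is set, so the lookup falls back to it.
        config.setProperty("recovery.zookeeper.path.mesos-workers", "/mesos-workers");
        System.out.println(getStringWithDeprecatedKeys(
            config,
            "high-availability.zookeeper.path.mesos-workers",
            "/default-path",
            "recovery.zookeeper.path.mesos-workers")); // prints "/mesos-workers"
    }
}
```

This is also roughly what the later `ConfigOption.withDeprecatedKeys(...)` mechanism encapsulates, which is why moving keys into `xxConfigOptions` classes removes the need for the explicit utility call.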
[jira] [Comment Edited] (FLINK-5773) Cannot cast scala.util.Failure to org.apache.flink.runtime.messages.Acknowledge
[ https://issues.apache.org/jira/browse/FLINK-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862176#comment-15862176 ] sunjincheng edited comment on FLINK-5773 at 2/11/17 2:54 AM: - Hi [~colinbreame], take a look at the Scaladoc of `setMaxParallelism`: {code} Sets the maximum degree of parallelism defined for the program. The maximum degree of parallelism specifies the upper limit for dynamic scaling. It also defines the number of key groups used for partitioned state. {code} The value set via `setMaxParallelism(valueA)` is tied to `setParallelism(valueB)` by the requirement valueA >= valueB. In your local environment the default parallelism is probably 4, so valueA must be >= 4. If you first call `env.setParallelism(1)`, you can then pass any number greater than 0 to `setMaxParallelism`. Can you try that? was (Author: sunjincheng121): HI, [~colinbreame], We can look at xx repeated scala'doc. {code} Sets the maximum degree of parallelism defined for the program. The maximum degree of parallelism specifies the upper limit for dynamic scaling. It also defines the number of key groups used for partitioned state. {code} This set value setMaxParallelism(valueA) is the setParallelism(valueB) associated which requires (valueA >= valueB). The concurrency of your program In your local default parallelism may be 4, so request valueA>= 4, you can try to set env .setParallelism (1) then you can setMaxParallelism any number greater than 0, can you try it? > Cannot cast scala.util.Failure to > org.apache.flink.runtime.messages.Acknowledge > --- > > Key: FLINK-5773 > URL: https://issues.apache.org/jira/browse/FLINK-5773 > Project: Flink > Issue Type: Bug > Components: DataStream API >Affects Versions: 1.2.0 >Reporter: Colin Breame > Fix For: 1.2.1 > > > The exception below happens when I set the > StreamExecutionEnvironment.setMaxParallelism() to anything less than 4. > Let me know if you need more information. 
> {code} > Caused by: java.lang.ClassCastException: Cannot cast scala.util.Failure to > org.apache.flink.runtime.messages.Acknowledge > at java.lang.Class.cast(Class.java:3369) > at scala.concurrent.Future$$anonfun$mapTo$1.apply(Future.scala:405) > at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) > at scala.util.Try$.apply(Try.scala:161) > at scala.util.Success.map(Try.scala:206) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634) > at > scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) > at > scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) > at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266) > at > org.apache.flink.runtime.taskmanager.TaskManager.submitTask(TaskManager.scala:1206) > at > org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleTaskMessage(TaskManager.scala:458) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:280) > at > 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at >
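The constraint described in the comment above (the maximum parallelism must be at least the configured parallelism, and positive) can be sketched as a plain validation check. This is an illustrative sketch with hypothetical names, not Flink's actual implementation:

```java
/**
 * Illustrative sketch of the constraint discussed above: maxParallelism must
 * be positive and at least the (default) parallelism. On a local setup with
 * default parallelism 4, setMaxParallelism(n) with n < 4 is rejected unless
 * the parallelism is lowered first (e.g. env.setParallelism(1)).
 */
public class ParallelismCheck {
    public static boolean isValidMaxParallelism(int parallelism, int maxParallelism) {
        return maxParallelism > 0 && maxParallelism >= parallelism;
    }

    public static void main(String[] args) {
        System.out.println(isValidMaxParallelism(4, 2)); // rejected: 2 < parallelism 4
        System.out.println(isValidMaxParallelism(1, 2)); // accepted after setParallelism(1)
    }
}
```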
[jira] [Comment Edited] (FLINK-3861) Add Scala's BigInteger and BigDecimal to Scala API
[ https://issues.apache.org/jira/browse/FLINK-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862061#comment-15862061 ] Artem Stasiuk edited comment on FLINK-3861 at 2/11/17 2:17 AM: --- As I understand it, to be able to use {code}BigDecimal{code} we need to add that type to {code}BasicTypeInfo{code}, the same as for the Java big types; however, that module doesn't have a dependency on Scala. Can I add it? was (Author: terma): As I understand to be able use `BigDecimal` we need to add that type to {code}BasicBasicTypeInfo{code} same for Java Big Types, however that module doesn't have dependency on Scala. Can I add it? > Add Scala's BigInteger and BigDecimal to Scala API > -- > > Key: FLINK-3861 > URL: https://issues.apache.org/jira/browse/FLINK-3861 > Project: Flink > Issue Type: New Feature > Components: Type Serialization System >Reporter: Timo Walther > > In Java we now support {{java.math.BigDecimal/BigInteger}} as basic types. > However, Scala wraps these types into {{scala.math.BigDecimal/BigInteger}}. > These classes should also be supported to be in sync with the Java API. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861919#comment-15861919 ] ASF GitHub Bot commented on FLINK-1731: --- Github user skonto commented on the issue: https://github.com/apache/flink/pull/3192 @sachingoel0101 np, as soon as you are ready let me know. Also @tillrohrmann made some comments to take into consideration. > Add kMeans clustering algorithm to machine learning library > --- > > Key: FLINK-1731 > URL: https://issues.apache.org/jira/browse/FLINK-1731 > Project: Flink > Issue Type: New Feature > Components: Machine Learning Library >Reporter: Till Rohrmann >Assignee: Peter Schrott > Labels: ML > > The Flink repository already contains a kMeans implementation but it is not > yet ported to the machine learning library. I assume that only the used data > types have to be adapted and then it can be more or less directly moved to > flink-ml. > The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better > implementation because they improve the initial seeding phase to achieve near > optimal clustering. It might be worthwhile to implement kMeans||. > Resources: > [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf > [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.15#6346)
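The core iteration of kMeans (Lloyd's algorithm), which kMeans++/kMeans|| improve only in the seeding phase, can be sketched in a few lines. The following is a minimal 1-D array-based illustration, not the flink-ml DataSet implementation:

```java
import java.util.Arrays;

/**
 * Minimal 1-D sketch of one Lloyd's-algorithm iteration (assignment step plus
 * centroid update). Purely illustrative of the algorithm discussed above; the
 * flink-ml version operates on distributed DataSets rather than local arrays.
 */
public class LloydStep {
    /** Returns updated centroids after assigning each point to its nearest centroid. */
    public static double[] step(double[] points, double[] centroids) {
        double[] sums = new double[centroids.length];
        int[] counts = new int[centroids.length];
        for (double p : points) {
            // Assignment step: find the nearest centroid for this point.
            int best = 0;
            for (int c = 1; c < centroids.length; c++) {
                if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) {
                    best = c;
                }
            }
            sums[best] += p;
            counts[best]++;
        }
        // Update step: each centroid moves to the mean of its assigned points.
        double[] updated = new double[centroids.length];
        for (int c = 0; c < centroids.length; c++) {
            // Keep the old centroid if no point was assigned to it.
            updated[c] = counts[c] == 0 ? centroids[c] : sums[c] / counts[c];
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 2.0, 3.0, 8.0, 9.0, 10.0};
        double[] centroids = {0.0, 10.0};
        System.out.println(Arrays.toString(step(points, centroids))); // → [2.0, 9.0]
    }
}
```

Repeating `step` until the centroids stop moving gives the full algorithm; a better seeding (kMeans++/kMeans||) only changes how the initial `centroids` array is chosen.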
[jira] [Created] (FLINK-5783) flink-connector-kinesis java.lang.NoClassDefFoundError: org/apache/http/conn/ssl/SSLSocketFactory
Huy Huynh created FLINK-5783: Summary: flink-connector-kinesis java.lang.NoClassDefFoundError: org/apache/http/conn/ssl/SSLSocketFactory Key: FLINK-5783 URL: https://issues.apache.org/jira/browse/FLINK-5783 Project: Flink Issue Type: Bug Components: Kinesis Connector Affects Versions: 1.1.3 Reporter: Huy Huynh Priority: Trivial I got the error below while running a Flink consumer application for the first time using the Kinesis connector. I was able to fix it by modifying the flink-connector-kinesis pom file to use the latest Kinesis and AWS SDK versions and then rebuilding the jar. Updated pom values: flink-connector-kinesis_2.11 flink-connector-kinesis 1.11.86 1.7.3 0.12.3 Error: Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:822) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:768) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:768) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.NoClassDefFoundError: org/apache/http/conn/ssl/SSLSocketFactory at org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.(AmazonWebServiceClient.java:136) at 
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.(AmazonKinesisClient.java:221) at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.(AmazonKinesisClient.java:197) at org.apache.flink.streaming.connectors.kinesis.util.AWSUtil.createKinesisClient(AWSUtil.java:56) at org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.(KinesisProxy.java:124) at org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.create(KinesisProxy.java:182) at org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher.(KinesisDataFetcher.java:188) at org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.run(FlinkKinesisConsumer.java:198) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:80) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:53) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3293: [FLINK-5745] set an uncaught exception handler for netty ...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3293 #3290 is in --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5745) Set uncaught exception handler for Netty threads
[ https://issues.apache.org/jira/browse/FLINK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861605#comment-15861605 ] ASF GitHub Bot commented on FLINK-5745: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3293 #3290 is in > Set uncaught exception handler for Netty threads > > > Key: FLINK-5745 > URL: https://issues.apache.org/jira/browse/FLINK-5745 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Ufuk Celebi >Priority: Minor > > We pass a thread factory for the Netty event loop threads (see > {{NettyServer}} and {{NettyClient}}), but don't set an uncaught exception > handler. Let's add a JVM terminating handler that exits the process in case > of fatal errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
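The proposal in FLINK-5745 amounts to installing an `UncaughtExceptionHandler` on every thread the Netty thread factory creates. A minimal sketch of such a factory (illustrative only; Flink's actual `NettyServer`/`NettyClient` factories differ, and the fatal handler there would terminate the JVM rather than just log):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative sketch of the FLINK-5745 proposal: a thread factory that
 * installs an uncaught-exception handler on every thread it creates, so that
 * fatal errors in event-loop threads no longer vanish silently.
 */
public class HandlerInstallingThreadFactory implements ThreadFactory {
    private final AtomicInteger counter = new AtomicInteger();
    private final String namePrefix;
    private final Thread.UncaughtExceptionHandler handler;

    public HandlerInstallingThreadFactory(String namePrefix,
                                          Thread.UncaughtExceptionHandler handler) {
        this.namePrefix = namePrefix;
        this.handler = handler;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, namePrefix + '-' + counter.incrementAndGet());
        t.setDaemon(true);
        // The crucial part: route uncaught exceptions to the fatal handler.
        t.setUncaughtExceptionHandler(handler);
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadFactory factory = new HandlerInstallingThreadFactory("netty-worker",
            // A JVM-terminating handler would call System.exit here instead of logging.
            (thread, error) -> System.err.println("Fatal error in " + thread.getName() + ": " + error));
        Thread t = factory.newThread(() -> { throw new RuntimeException("boom"); });
        t.start();
        t.join();
    }
}
```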
[jira] [Updated] (FLINK-5618) Add queryable state documentation
[ https://issues.apache.org/jira/browse/FLINK-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-5618: --- Issue Type: Sub-task (was: Improvement) Parent: FLINK-5430 > Add queryable state documentation > - > > Key: FLINK-5618 > URL: https://issues.apache.org/jira/browse/FLINK-5618 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Ufuk Celebi > Fix For: 1.2.0, 1.3.0 > > > Adds docs about how to use queryable state. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-4364) Implement TaskManager side of heartbeat from JobManager
[ https://issues.apache.org/jira/browse/FLINK-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861562#comment-15861562 ] ASF GitHub Bot commented on FLINK-4364: --- Github user wangzhijiang999 commented on the issue: https://github.com/apache/flink/pull/3151 @tillrohrmann , I have submitted the updates, which should cover your suggestions. There are still two issues I am not sure about. First, for the heartbeat interval and timeout default values in **ConfigConstants**: I am not sure the current values are appropriate, and you can modify them based on your professional experience. Second, regarding the introduction of a **ScheduledExecutorService** in **RPCService**: my initial idea was to use the existing scheduler in **RPCService**, but it cannot be obtained from the **AkkaRPCService** implementation. Another way would be to replace the current **ScheduledExecutorService** parameter with **RPCService** in the construction of **HeartbeatManagerSenderImpl**, since **RPCService** can also schedule the heartbeat request. But the return value of the **scheduleRunnable** method in **RPCService** conflicts with the one needed in **HeartbeatManagerSenderImpl**. So for now I just added another single-thread pool in **RPCService**. Maybe the number of threads in the pool could be based on the number of CPU cores. There may still be things to polish, and I am happy to make further modifications based on your comments. BTW, the heartbeat interaction between TM and RM will be submitted in another PR after this is confirmed, because of some common points. > Implement TaskManager side of heartbeat from JobManager > --- > > Key: FLINK-4364 > URL: https://issues.apache.org/jira/browse/FLINK-4364 > Project: Flink > Issue Type: Sub-task > Components: Cluster Management >Reporter: Zhijiang Wang >Assignee: Zhijiang Wang > > The {{JobManager}} initiates heartbeat messages via (JobID, JmLeaderID), and > the {{TaskManager}} will report metrics info for each heartbeat. 
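The scheduling need discussed in the comment above, namely firing periodic heartbeat requests from a `ScheduledExecutorService`, can be sketched as follows. The names here are hypothetical; Flink's `HeartbeatManagerSenderImpl` and `RPCService` have different interfaces:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch (hypothetical names) of a heartbeat sender that fires
 * periodic heartbeat requests via a ScheduledExecutorService, the role the
 * extra thread pool in RPCService plays in the discussion above.
 */
public class PeriodicHeartbeatSender {
    private final ScheduledExecutorService scheduler;

    public PeriodicHeartbeatSender(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    /** Schedules the heartbeat runnable at a fixed interval and returns the handle. */
    public ScheduledFuture<?> start(Runnable sendHeartbeat, long intervalMillis) {
        return scheduler.scheduleAtFixedRate(sendHeartbeat, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch threeBeats = new CountDownLatch(3);
        PeriodicHeartbeatSender sender = new PeriodicHeartbeatSender(scheduler);
        ScheduledFuture<?> handle = sender.start(threeBeats::countDown, 10);
        threeBeats.await();   // wait until three heartbeats have fired
        handle.cancel(false); // cancelling the future stops the periodic task
        scheduler.shutdown();
        System.out.println("received 3 heartbeats");
    }
}
```

A single-threaded scheduler suffices for triggering, since the heartbeat work itself can be dispatched to the RPC endpoint's own executor.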
[jira] [Commented] (FLINK-5782) Support GPU calculations
[ https://issues.apache.org/jira/browse/FLINK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861563#comment-15861563 ] Trevor Grant commented on FLINK-5782: - 2) -1 to removing sparse array support. You don't want to be serializing tdf matrices / vectors as dense. I think you would be better served to add sparse vectors to ND4J or provide a converter. 4) Bigger question: do you want to enable this for streaming or batch? The persist method is only necessary for batch. Streaming is a whole other bag of worms. 5) To be clear: IF you are considering batch only, and FLINK-1730 is addressed, then this issue is resolved as of Mahout 0.13.0 via MAHOUT-1885 https://issues.apache.org/jira/browse/MAHOUT-1885 In general, is batch ML still of interest? Have you asked on the ND4J list why they moved away from Flink support? > Support GPU calculations > > > Key: FLINK-5782 > URL: https://issues.apache.org/jira/browse/FLINK-5782 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0 >Reporter: Kate Eri >Priority: Minor > > This ticket was initiated as a continuation of the dev discussion thread: [New > Flink team member - Kate Eri (Integration with DL4J > topic)|http://mail-archives.apache.org/mod_mbox/flink-dev/201702.mbox/browser] > > Recently we proposed the idea of integrating > [Deeplearning4J|https://deeplearning4j.org/index.html] with Apache Flink. > It is known that training DL models is a resource-demanding process, so > training on a CPU can take much longer to converge than on a GPU. > GPU usage can be considered not only for DL training but also for > optimizing graph analytics and other typical data manipulations; a nice > overview of GPU-related problems is presented in [Accelerating Spark workloads > using > GPUs|https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus]. 
> Currently the community pointed out the following issues to consider: > 1) Flink would like to avoid writing its own GPU support yet again, > to reduce the engineering burden. That’s why libraries like > [ND4J|http://nd4j.org/userguide] should be considered. > 2) Currently Flink uses [Breeze|https://github.com/scalanlp/breeze] to > optimize linear algebra calculations; ND4J can’t be integrated as is, because > it still doesn’t support [sparse arrays|http://nd4j.org/userguide#faq]. Maybe > this issue should simply be closed to enable ND4J usage? > 3) The calculations would have to work both with and without > available GPUs. If the system detects that GPUs are available, then ideally > it would exploit them. Thus GPU resource management could be incorporated in > [FLINK-5131|https://issues.apache.org/jira/browse/FLINK-5131] (only a > suggestion). > 4) It was mentioned that since Flink takes care of shipping data around > the cluster, it would also perform the dump out to the GPU for calculation and > the load back. In practice, the lack of a persist method for intermediate > results makes this troublesome (not because of GPUs, but because calculating any > sort of complex algorithm requires being able to cache intermediate results). > That’s why ticket > [FLINK-1730|https://issues.apache.org/jira/browse/FLINK-1730] must be > implemented to solve this problem. > 5) It was also recommended to take a look at Apache Mahout, at least to > get experience with GPU integration, and to check its > https://github.com/apache/mahout/tree/master/viennacl-omp > https://github.com/apache/mahout/tree/master/viennacl > 6) The experience of Netflix regarding this question could also be considered: > [Distributed Neural Networks with GPUs in the AWS > Cloud|http://techblog.netflix.com/search/label/CUDA] > This is considered the master ticket for GPU-related tickets -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3151: [FLINK-4364][runtime] Implement TaskManager side of heartbe...
Github user wangzhijiang999 commented on the issue: https://github.com/apache/flink/pull/3151 @tillrohrmann , I have submitted the updates that should cover your suggestions. There are still two issues that I have not confirmed. First, for the heartbeat interval and timeout default values in **ConfigConstants**: the current values may not be appropriate, so please adjust them based on your experience. Second, regarding the introduction of a **ScheduledExecutorService** in **RPCService**: my initial idea was to reuse the existing scheduler in **RPCService**, but it cannot be obtained from the **AkkaRPCService** implementation. Another option is to replace the current **ScheduledExecutorService** parameter with **RPCService** in the constructor of **HeartbeatManagerSenderImpl**, so that the **RPCService** can also schedule the heartbeat requests. But the return value of the **scheduleRunnable** method in **RPCService** conflicts with the one needed in **HeartbeatManagerSenderImpl**. So for now I just added another single-thread pool in **RPCService**. Maybe the number of threads in the pool could be based on the number of CPU cores. There may still be things to polish, and I am happy to make further modifications based on your comments. BTW, the heartbeat interaction between TM and RM will be submitted in another PR after this is confirmed, because of some common points. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
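The single-thread-pool approach described in the comment above can be sketched as follows. This is an illustrative sketch only, not Flink's actual `RPCService` or `HeartbeatManagerSenderImpl` API; the class name, the counter stand-in for the heartbeat request, and the core-count-based pool sizing are all assumptions drawn from the discussion.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a sender that drives periodic heartbeat requests on
// its own ScheduledExecutorService, sized from the number of CPU cores as
// suggested in the PR discussion.
public class HeartbeatSenderSketch implements AutoCloseable {
    private final ScheduledExecutorService scheduler;
    private final AtomicInteger heartbeatsSent = new AtomicInteger();

    public HeartbeatSenderSketch() {
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors() / 4);
        this.scheduler = Executors.newScheduledThreadPool(threads);
    }

    public void start(long intervalMillis) {
        // The Runnable stands in for sending a heartbeat request to a target.
        scheduler.scheduleAtFixedRate(
            heartbeatsSent::incrementAndGet, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    public int sent() {
        return heartbeatsSent.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The trade-off debated in the comment is visible here: a dedicated pool avoids coupling to the RPC service's internal scheduler, at the cost of extra threads per component.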
[jira] [Created] (FLINK-5782) Support GPU calculations
Kate Eri created FLINK-5782: --- Summary: Support GPU calculations Key: FLINK-5782 URL: https://issues.apache.org/jira/browse/FLINK-5782 Project: Flink Issue Type: Improvement Components: Core Affects Versions: 1.3.0 Reporter: Kate Eri Priority: Minor This ticket was initiated as a continuation of the dev discussion thread: [New Flink team member - Kate Eri (Integration with DL4J topic)|http://mail-archives.apache.org/mod_mbox/flink-dev/201702.mbox/browser] Recently we have proposed the idea to integrate [Deeplearning4J|https://deeplearning4j.org/index.html] with Apache Flink. It is known that training DL models is a resource-demanding process, so training on a CPU can take much longer to converge than on a GPU. GPU usage could be considered not only for DL training, but also for optimizing graph analytics and other typical data manipulations; a nice overview of GPU-related problems is presented in [Accelerating Spark workloads using GPUs|https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus]. Currently the community has pointed out the following issues to consider: 1) Flink would like to avoid writing its own GPU support yet again, to reduce the engineering burden. That's why libraries such as [ND4J|http://nd4j.org/userguide] should be considered. 2) Currently Flink uses [Breeze|https://github.com/scalanlp/breeze] to optimize linear algebra calculations; ND4J can't be integrated as is, because it still doesn't support [sparse arrays|http://nd4j.org/userguide#faq]. Maybe this issue should simply be closed to enable ND4J usage? 3) The calculations would have to work both with and without available GPUs. If the system detects that GPUs are available, then ideally it would exploit them. Thus GPU resource management could be incorporated into [FLINK-5131|https://issues.apache.org/jira/browse/FLINK-5131] (only a suggestion). 
4) It was mentioned that since Flink takes care of shipping data around the cluster, it would also perform the dump out to the GPU for calculation and the load back. In practice, the lack of a persist method for intermediate results makes this troublesome (not because of GPUs, but because calculating any sort of complex algorithm requires being able to cache intermediate results). That's why ticket [FLINK-1730|https://issues.apache.org/jira/browse/FLINK-1730] must be implemented to solve this problem. 5) It was also recommended to take a look at Apache Mahout, at least to get experience with GPU integration, and to check its https://github.com/apache/mahout/tree/master/viennacl-omp https://github.com/apache/mahout/tree/master/viennacl 6) The experience of Netflix regarding this question could also be considered: [Distributed Neural Networks with GPUs in the AWS Cloud|http://techblog.netflix.com/search/label/CUDA] This is considered the master ticket for GPU-related tickets -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5781) Generate HTML from ConfigOption
Ufuk Celebi created FLINK-5781: -- Summary: Generate HTML from ConfigOption Key: FLINK-5781 URL: https://issues.apache.org/jira/browse/FLINK-5781 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Ufuk Celebi Assignee: Ufuk Celebi Use the ConfigOption instances to generate an HTML page that we can include in the docs configuration page. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
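A minimal sketch of what generating such an HTML table could look like, assuming a simplified stand-in for a config option entry. This is not Flink's actual generator or `ConfigOption` class; all names here are illustrative.

```java
import java.util.List;

// Illustrative sketch: render a list of config options as an HTML table
// resembling the docs configuration page.
public class ConfigDocGenerator {
    // Minimal stand-in for a ConfigOption entry (hypothetical, not Flink's class).
    public static class Option {
        final String key;
        final String defaultValue;
        final String description;

        public Option(String key, String defaultValue, String description) {
            this.key = key;
            this.defaultValue = defaultValue;
            this.description = description;
        }
    }

    public static String toHtmlTable(List<Option> options) {
        StringBuilder sb = new StringBuilder(
            "<table><thead><tr><th>Key</th><th>Default</th><th>Description</th></tr></thead><tbody>");
        for (Option o : options) {
            sb.append("<tr><td>").append(o.key)
              .append("</td><td>").append(o.defaultValue)
              .append("</td><td>").append(o.description)
              .append("</td></tr>");
        }
        return sb.append("</tbody></table>").toString();
    }
}
```

Generating the table from the option definitions themselves is what keeps the docs from drifting out of sync with the code, which is the motivation stated in the sub-task's parent issue.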
[jira] [Updated] (FLINK-5773) Cannot cast scala.util.Failure to org.apache.flink.runtime.messages.Acknowledge
[ https://issues.apache.org/jira/browse/FLINK-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-5773: Fix Version/s: 1.2.1 > Cannot cast scala.util.Failure to > org.apache.flink.runtime.messages.Acknowledge > --- > > Key: FLINK-5773 > URL: https://issues.apache.org/jira/browse/FLINK-5773 > Project: Flink > Issue Type: Bug > Components: DataStream API >Affects Versions: 1.2.0 >Reporter: Colin Breame > Fix For: 1.2.1 > > > The exception below happens when I set the > StreamExecutionEnvironment.setMaxParallelism() to anything less than 4. > Let me know if you need more information. > {code} > Caused by: java.lang.ClassCastException: Cannot cast scala.util.Failure to > org.apache.flink.runtime.messages.Acknowledge > at java.lang.Class.cast(Class.java:3369) > at scala.concurrent.Future$$anonfun$mapTo$1.apply(Future.scala:405) > at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) > at scala.util.Try$.apply(Try.scala:161) > at scala.util.Success.map(Try.scala:206) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634) > at > scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) > at > 
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) > at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266) > at > org.apache.flink.runtime.taskmanager.TaskManager.submitTask(TaskManager.scala:1206) > at > org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleTaskMessage(TaskManager.scala:458) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:280) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at akka.actor.Actor$class.aroundReceive(Actor.scala:467) > at > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:122) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at 
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) > at akka.dispatch.Mailbox.run(Mailbox.scala:220) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5773) Cannot cast scala.util.Failure to org.apache.flink.runtime.messages.Acknowledge
[ https://issues.apache.org/jira/browse/FLINK-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861485#comment-15861485 ] Aljoscha Krettek commented on FLINK-5773: - The exception you're seeing is actually masking another exception. Somewhere down there we get this: {code} java.lang.IllegalArgumentException: Max parallelism must be >= than parallelism. {code} I guess your parallelism is 1 on that local cluster, that's the reason for that exception. This is definitely a bug, thanks for reporting! > Cannot cast scala.util.Failure to > org.apache.flink.runtime.messages.Acknowledge > --- > > Key: FLINK-5773 > URL: https://issues.apache.org/jira/browse/FLINK-5773 > Project: Flink > Issue Type: Bug > Components: DataStream API >Affects Versions: 1.2.0 >Reporter: Colin Breame > > The exception below happens when I set the > StreamExecutionEnvironment.setMaxParallelism() to anything less than 4. > Let me know if you need more information. > {code} > Caused by: java.lang.ClassCastException: Cannot cast scala.util.Failure to > org.apache.flink.runtime.messages.Acknowledge > at java.lang.Class.cast(Class.java:3369) > at scala.concurrent.Future$$anonfun$mapTo$1.apply(Future.scala:405) > at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) > at scala.util.Try$.apply(Try.scala:161) > at scala.util.Success.map(Try.scala:206) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) > at > 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at > scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634) > at > scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) > at > scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) > at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266) > at > org.apache.flink.runtime.taskmanager.TaskManager.submitTask(TaskManager.scala:1206) > at > org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleTaskMessage(TaskManager.scala:458) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:280) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at 
akka.actor.Actor$class.aroundReceive(Actor.scala:467) > at > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:122) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) > at akka.dispatch.Mailbox.run(Mailbox.scala:220) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
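The masked exception Aljoscha mentions suggests a precondition of roughly this shape somewhere in the job setup path. This is an illustrative reconstruction, not the actual Flink code; only the exception message is taken from the comment.

```java
// Sketch of the precondition behind the masked exception:
// "Max parallelism must be >= than parallelism."
public class ParallelismValidation {
    public static void checkParallelism(int parallelism, int maxParallelism) {
        if (maxParallelism < parallelism) {
            throw new IllegalArgumentException(
                "Max parallelism must be >= than parallelism.");
        }
    }
}
```

With a local-cluster default parallelism of 4 and `setMaxParallelism()` below 4, this check fails, and the resulting `Failure` message is then mis-cast to `Acknowledge`, producing the reported `ClassCastException`.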
[jira] [Created] (FLINK-5780) Extend ConfigOption with descriptions
Ufuk Celebi created FLINK-5780: -- Summary: Extend ConfigOption with descriptions Key: FLINK-5780 URL: https://issues.apache.org/jira/browse/FLINK-5780 Project: Flink Issue Type: Sub-task Components: Core, Documentation Reporter: Ufuk Celebi Assignee: Ufuk Celebi The {{ConfigOption}} type is meant to replace the flat {{ConfigConstants}}. As part of automating the generation of a docs config page we need to extend {{ConfigOption}} with description fields. From the ML discussion, these could be: {code} void shortDescription(String); void longDescription(String); {code} In practice, the description string should contain HTML/Markdown. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
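A hypothetical sketch of how an option type could carry the proposed description fields in a fluent style. This is not Flink's actual `ConfigOption` API; the class name and fluent signatures are assumptions built around the two methods quoted in the ticket.

```java
// Hypothetical fluent option type carrying short/long descriptions,
// as proposed on the ML; not Flink's actual API.
public class DescribedOption<T> {
    private final String key;
    private final T defaultValue;
    private String shortDescription = "";
    private String longDescription = "";

    public DescribedOption(String key, T defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    // Fluent setters so descriptions can be attached at the definition site.
    public DescribedOption<T> shortDescription(String s) { this.shortDescription = s; return this; }
    public DescribedOption<T> longDescription(String s)  { this.longDescription = s;  return this; }

    public String key()             { return key; }
    public T defaultValue()         { return defaultValue; }
    public String shortDescription() { return shortDescription; }
    public String longDescription()  { return longDescription; }
}
```

Keeping the descriptions next to the option definition is what makes the HTML generation in the sibling sub-task possible without a separate, manually maintained docs table.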
[jira] [Closed] (FLINK-5415) ContinuousFileProcessingTest failed on travis
[ https://issues.apache.org/jira/browse/FLINK-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek closed FLINK-5415. --- Resolution: Fixed Fix Version/s: 1.3.0 Fixed on master in: f6709b4a48a843a0a1818fd59b98d32f82d6184f > ContinuousFileProcessingTest failed on travis > - > > Key: FLINK-5415 > URL: https://issues.apache.org/jira/browse/FLINK-5415 > Project: Flink > Issue Type: Improvement > Components: DataStream API, Streaming Connectors >Affects Versions: 1.3.0 >Reporter: Chesnay Schepler >Assignee: Aljoscha Krettek > Labels: test-stability > Fix For: 1.3.0 > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/189171123/log.txt > testFunctionRestore(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.162 sec <<< FAILURE! > java.lang.AssertionError: expected:<1483623669528> but > was:<-9223372036854775808> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFunctionRestore(ContinuousFileProcessingTest.java:761) > testProcessOnce(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.045 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessOnce(ContinuousFileProcessingTest.java:675) > testFileReadingOperatorWithIngestionTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.002 sec <<< FAILURE! 
> java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithIngestionTime(ContinuousFileProcessingTest.java:150) > testSortingOnModTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.002 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testSortingOnModTime(ContinuousFileProcessingTest.java:596) > testFileReadingOperatorWithEventTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.001 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithEventTime(ContinuousFileProcessingTest.java:308) > testProcessContinuously(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.001 sec <<< FAILURE! 
> java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessContinuously(ContinuousFileProcessingTest.java:771) > Results : > Failed tests: > > ContinuousFileProcessingTest.testFileReadingOperatorWithEventTime:308->createFileAndFillWithData:958 > null > > ContinuousFileProcessingTest.testFileReadingOperatorWithIngestionTime:150->createFileAndFillWithData:958 > null >
[jira] [Created] (FLINK-5779) Auto generate configuration docs
Ufuk Celebi created FLINK-5779: -- Summary: Auto generate configuration docs Key: FLINK-5779 URL: https://issues.apache.org/jira/browse/FLINK-5779 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Ufuk Celebi Assignee: Ufuk Celebi As per discussion on the mailing list, we need to improve the configuration documentation page (http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-Organizing-Documentation-for-Configuration-Options-td15773.html). We decided to try to (semi) automate this in order to not get out of sync in the future. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5778) Split FileStateHandle into fileName and basePath
Ufuk Celebi created FLINK-5778: -- Summary: Split FileStateHandle into fileName and basePath Key: FLINK-5778 URL: https://issues.apache.org/jira/browse/FLINK-5778 Project: Flink Issue Type: Sub-task Components: State Backends, Checkpointing Reporter: Ufuk Celebi Assignee: Ufuk Celebi Store the state path as a basePath and a fileName, and allow the basePath to be overwritten. We cannot overwrite the base path as long as the state handle is still in flight and not persisted; otherwise we risk a resource leak. We need this in order to be able to relocate savepoints. {code} interface RelativeBaseLocationStreamStateHandle { void clearBaseLocation(); void setBaseLocation(String baseLocation); } {code} FileStateHandle should implement this, and the SavepointSerializer should forward the calls when a savepoint is stored or loaded: clear before store and set after load. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
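A minimal sketch of how a `FileStateHandle`-like class could implement the proposed interface, assuming the full path is simply basePath + "/" + fileName. The implementing class name and path handling are illustrative; only the interface is quoted from the ticket.

```java
// Interface as proposed in the ticket.
interface RelativeBaseLocationStreamStateHandle {
    void clearBaseLocation();
    void setBaseLocation(String baseLocation);
}

// Illustrative implementation: the base path is cleared by the serializer
// before store and set again after load, which is what makes the savepoint
// relocatable to a different directory.
public class RelocatableFileStateHandle implements RelativeBaseLocationStreamStateHandle {
    private final String fileName;
    private String basePath; // null while cleared, i.e. in the serialized form

    public RelocatableFileStateHandle(String basePath, String fileName) {
        this.basePath = basePath;
        this.fileName = fileName;
    }

    @Override
    public void clearBaseLocation() { this.basePath = null; }

    @Override
    public void setBaseLocation(String baseLocation) { this.basePath = baseLocation; }

    public String getFilePath() {
        if (basePath == null) {
            throw new IllegalStateException("base location has not been set");
        }
        return basePath + "/" + fileName;
    }
}
```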
[jira] [Created] (FLINK-5777) Pass savepoint information to CheckpointingOperation
Ufuk Celebi created FLINK-5777: -- Summary: Pass savepoint information to CheckpointingOperation Key: FLINK-5777 URL: https://issues.apache.org/jira/browse/FLINK-5777 Project: Flink Issue Type: Sub-task Components: State Backends, Checkpointing Reporter: Ufuk Celebi Assignee: Ufuk Celebi In order to make savepoints self contained in a single directory, we need to pass some information to {{StreamTask#CheckpointingOperation}}. I propose to extend the {{CheckpointMetaData}} for this. We currently have some overlap with CheckpointMetaData, CheckpointMetrics, and manually passed checkpoint ID and checkpoint timestamps. We should restrict CheckpointMetaData to the integral meta data that needs to be passed to StreamTask#CheckpointingOperation. This means that we move the CheckpointMetrics out of the CheckpointMetaData and the BarrierBuffer/BarrierTracker create CheckpointMetrics separately and send it back with the acknowledge message. CheckpointMetaData should be extended with the following properties: - boolean isSavepoint - String targetDirectory There are two code paths that lead to the CheckpointingOperation: 1. From CheckpointCoordinator via RPC to StreamTask#triggerCheckpoint - Execution#triggerCheckpoint(long, long) => triggerCheckpoint(CheckpointMetaData) - TaskManagerGateway#triggerCheckpoint(ExecutionAttemptID, JobID, long, long) => TaskManagerGateway#triggerCheckpoint(ExecutionAttemptID, JobID, CheckpointMetaData) - Task#triggerCheckpointBarrier(long, long) => Task#triggerCheckpointBarrier(CheckpointMetaData) 2. 
From intermediate streams via the CheckpointBarrier to StreamTask#triggerCheckpointOnBarrier - triggerCheckpointOnBarrier(CheckpointMetaData) => triggerCheckpointOnBarrier(CheckpointMetaData, CheckpointMetrics) - CheckpointBarrier(long, long) => CheckpointBarrier(CheckpointMetaData) - AcknowledgeCheckpoint(CheckpointMetaData) => AcknowledgeCheckpoint(long, CheckpointMetrics) The state backends provide another stream factory that is called in CheckpointingOperation when the meta data indicates savepoint. The state backends can choose whether they return the regular checkpoint stream factory in that case or a special one for savepoints. That way backends that don’t checkpoint to a file system can special case savepoints easily. - FsStateBackend: return special FsCheckpointStreamFactory with different directory layout - MemoryStateBackend: return regular checkpoint stream factory (MemCheckpointStreamFactory) => The _metadata file will contain all state as the state handles are part of it -- This message was sent by Atlassian JIRA (v6.3.15#6346)
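The proposed meta data extension and the backend's factory choice can be sketched roughly as follows. All names are illustrative and the stream factory is represented by a plain string so the sketch stays self-contained; this is not the actual `CheckpointMetaData` class.

```java
// Hedged sketch of the proposed additions to the checkpoint meta data
// and of how a backend could pick a stream factory based on them.
public class CheckpointMetaDataSketch {
    final long checkpointId;
    final long timestamp;
    final boolean isSavepoint;    // proposed addition
    final String targetDirectory; // proposed addition

    CheckpointMetaDataSketch(long id, long ts, boolean savepoint, String dir) {
        this.checkpointId = id;
        this.timestamp = ts;
        this.isSavepoint = savepoint;
        this.targetDirectory = dir;
    }

    // An FsStateBackend-like backend returns a special savepoint factory with
    // a different directory layout; a MemoryStateBackend-like backend keeps
    // its regular factory, since its _metadata file contains all state anyway.
    static String chooseStreamFactory(CheckpointMetaDataSketch md, boolean fsBackend) {
        if (md.isSavepoint && fsBackend) {
            return "FsSavepointStreamFactory(" + md.targetDirectory + ")";
        }
        return "RegularCheckpointStreamFactory";
    }
}
```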
[jira] [Resolved] (FLINK-5766) Unify NoAvailableResourceException handling on ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-5766. - Resolution: Fixed Fixed in 3bde6ffb6f55ec7ff807633ab1e79d9238e5a942 > Unify NoAvailableResourceException handling on ExecutionGraph > - > > Key: FLINK-5766 > URL: https://issues.apache.org/jira/browse/FLINK-5766 > Project: Flink > Issue Type: Improvement > Components: Distributed Coordination >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, there are two ways that a {{NoAvailableResourcesException}} can be > handled: > - It is either thrown synchronously, when trying to obtain a slot from the > {{Scheduler}} > - Or it is returned via an exceptionally completed Future, if the > allocation completed asynchronously in the {{SlotPool}}. > Since both cases work with futures (some eagerly completed), we should drop > the path where the allocation method throws the exception directly and only > keep the more general path with an exceptionally completed Future. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5718) Handle JVM Fatal Exceptions in Tasks
[ https://issues.apache.org/jira/browse/FLINK-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861427#comment-15861427 ] ASF GitHub Bot commented on FLINK-5718: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3276 > Handle JVM Fatal Exceptions in Tasks > > > Key: FLINK-5718 > URL: https://issues.apache.org/jira/browse/FLINK-5718 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > The TaskManager catches and handles all types of exceptions right now (all > {{Throwables}}). The intention behind that is: > - Many {{Error}} subclasses are recoverable for the TaskManagers, such as > failure to load/link user code > - We want to give eager notifications to the JobManager in case something > in a task goes wrong. > However, there are some exceptions which should probably simply terminate the > JVM, if caught in the task thread, because they may leave the JVM in a > dysfunctional limbo state: > - {{OutOfMemoryError}} > - {{InternalError}} > - {{UnknownError}} > - {{ZipError}} > These are basically the subclasses of {{VirtualMachineError}}, except for > {{StackOverflowError}}, which is recoverable and usually recovered already by > the time the exception has been thrown and the stack unwound. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
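The classification described in the ticket can be expressed as a small predicate: `VirtualMachineError` subclasses are JVM-fatal, except `StackOverflowError`. The class and method names below are assumptions for illustration, not necessarily what the merged change uses.

```java
// Sketch of the ticket's classification of JVM-fatal errors.
public class FatalErrorClassifier {
    // OutOfMemoryError, InternalError, UnknownError and ZipError are all
    // subclasses of VirtualMachineError; StackOverflowError is excluded
    // because it is usually already recovered by the time it is thrown.
    public static boolean isJvmFatalError(Throwable t) {
        return (t instanceof VirtualMachineError) && !(t instanceof StackOverflowError);
    }
}
```

A task thread would rethrow (and let the process terminate on) errors for which this predicate is true, while continuing to report all other `Throwable`s to the JobManager as before.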
[jira] [Closed] (FLINK-5718) Handle JVM Fatal Exceptions in Tasks
[ https://issues.apache.org/jira/browse/FLINK-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-5718. --- > Handle JVM Fatal Exceptions in Tasks > > > Key: FLINK-5718 > URL: https://issues.apache.org/jira/browse/FLINK-5718 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > The TaskManager catches and handles all types of exceptions right now (all > {{Throwables}}). The intention behind that is: > - Many {{Error}} subclasses are recoverable for the TaskManagers, such as > failure to load/link user code > - We want to give eager notifications to the JobManager in case something > in a task goes wrong. > However, there are some exceptions which should probably simply terminate the > JVM, if caught in the task thread, because they may leave the JVM in a > dysfunctional limbo state: > - {{OutOfMemoryError}} > - {{InternalError}} > - {{UnknownError}} > - {{ZipError}} > These are basically the subclasses of {{VirtualMachineError}}, except for > {{StackOverflowError}}, which is recoverable and usually recovered already by > the time the exception has been thrown and the stack unwound. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3290: [FLINK-5759] [jobmanager] Set UncaughtExceptionHan...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3290
[jira] [Commented] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861426#comment-15861426 ] ASF GitHub Bot commented on FLINK-5759: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3290 > Set an UncaughtExceptionHandler for all Thread Pools in JobManager > -- > > Key: FLINK-5759 > URL: https://issues.apache.org/jira/browse/FLINK-5759 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, the thread pools of the {{JobManager}} do not have any > {{UncaughtExceptionHandler}}. > While uncaught exceptions are rare (Flink handles exceptions aggressively in > most places), when exceptions slip through in these threads (which execute > future responses and delayed actions), the JobManager may be in an > inconsistent state and not function properly any more. > We should add a handler that results in a process kill in the case of > uncaught exceptions. Letting the JobManager be restarted by the respective > cluster framework is the only guaranteed way to be safe. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
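One way to realize this is a `ThreadFactory` that installs an `UncaughtExceptionHandler` on every thread it creates. This is a hedged sketch, not the merged fix: in the real change the handler would terminate the process (e.g. via `Runtime.getRuntime().halt(-1)`) so the cluster framework restarts the JobManager; here a flag stands in for the process kill so the sketch stays runnable.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: every thread created by this factory gets an
// UncaughtExceptionHandler, so exceptions that slip through in future
// callbacks or delayed actions no longer vanish silently.
public class GuardedThreadFactory implements ThreadFactory {
    public static final AtomicBoolean fatalErrorSeen = new AtomicBoolean(false);

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        // In the real fix this handler would kill the process; the flag is a
        // testable stand-in for that.
        t.setUncaughtExceptionHandler((thread, error) -> fatalErrorSeen.set(true));
        return t;
    }
}
```

Passing such a factory to `Executors.newScheduledThreadPool(n, factory)` would cover the JobManager's future-executor and timer pools mentioned in the ticket.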
[jira] [Closed] (FLINK-5766) Unify NoAvailableResourceException handling on ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-5766. --- > Unify NoAvailableResourceException handling on ExecutionGraph > - > > Key: FLINK-5766 > URL: https://issues.apache.org/jira/browse/FLINK-5766 > Project: Flink > Issue Type: Improvement > Components: Distributed Coordination >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, there are two ways that a {{NoAvailableResourcesException}} can be > handled: > - It is either thrown synchronously, when trying to obtain a slot from the > {{Scheduler}} > - Or it is returned via an exceptionally completed Future, if the > allocation completed asynchronously in the {{SlotPool}}. > Since both cases work with futures (some eagerly completed), we should drop > the path where the allocation method throws the exception directly and only > keep the more general path with an exceptionally completed Future. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (FLINK-5616) YarnPreConfiguredMasterHaServicesTest fails sometimes
[ https://issues.apache.org/jira/browse/FLINK-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-5616: Assignee: Till Rohrmann > YarnPreConfiguredMasterHaServicesTest fails sometimes > - > > Key: FLINK-5616 > URL: https://issues.apache.org/jira/browse/FLINK-5616 > Project: Flink > Issue Type: Bug > Components: YARN >Affects Versions: 1.2.0, 1.3.0 >Reporter: Aljoscha Krettek >Assignee: Till Rohrmann > Labels: test-stability > > This is the relevant part from the log: > {code} > --- > T E S T S > --- > Running > org.apache.flink.yarn.highavailability.YarnPreConfiguredMasterHaServicesTest > Formatting using clusterid: testClusterID > Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.407 sec - > in > org.apache.flink.yarn.highavailability.YarnPreConfiguredMasterHaServicesTest > Running > org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest > Formatting using clusterid: testClusterID > Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.479 sec <<< > FAILURE! - in > org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest > testClosingReportsToLeader(org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest) > Time elapsed: 0.836 sec <<< FAILURE! > org.mockito.exceptions.verification.WantedButNotInvoked: > Wanted but not invoked: > leaderContender.handleError(); > -> at > org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader(YarnIntraNonHaMasterServicesTest.java:120) > Actually, there were zero interactions with this mock. 
> at > org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader(YarnIntraNonHaMasterServicesTest.java:120) > Running org.apache.flink.yarn.YarnFlinkResourceManagerTest > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.82 sec - in > org.apache.flink.yarn.YarnFlinkResourceManagerTest > Running org.apache.flink.yarn.YarnClusterDescriptorTest > java.lang.RuntimeException: Couldn't deploy Yarn cluster > at > org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:425) > at > org.apache.flink.yarn.YarnClusterDescriptorTest.testConfigOverwrite(YarnClusterDescriptorTest.java:90) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) >
[jira] [Updated] (FLINK-5623) TempBarrier dam has been closed
[ https://issues.apache.org/jira/browse/FLINK-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-5623: - Component/s: (was: Distributed Coordination) Local Runtime > TempBarrier dam has been closed > --- > > Key: FLINK-5623 > URL: https://issues.apache.org/jira/browse/FLINK-5623 > Project: Flink > Issue Type: Bug > Components: Local Runtime >Affects Versions: 1.3.0 >Reporter: Greg Hogan > > PageRank (see PR from FLINK-4896) results in the following error. Can be > reproduced by changing {{AsmTestBase:63}} to {{env = > executionEnvironment.createLocalEnvironment();}} then running > {{PageRankTest}} (fails for Simple and RMat graph tests, succeeds for > Complete graph test). > {noformat} > org.apache.flink.client.program.ProgramInvocationException: The program > execution failed: Job execution failed. > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387) > at > org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62) > at > org.apache.flink.graph.asm.dataset.AbstractDataSetAnalytic.execute(AbstractDataSetAnalytic.java:55) > at org.apache.flink.graph.drivers.PageRank.print(PageRank.java:113) > at org.apache.flink.graph.Runner.main(Runner.java:257) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528) > at > 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339) > at > org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:834) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:259) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1076) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1123) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) > at > org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1120) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Job > execution failed. 
> at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:905) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:848) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:848) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.RuntimeException: An error occurred creating the temp > table. > at > org.apache.flink.runtime.operators.TempBarrier.getIterator(TempBarrier.java:98) > at > org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1090) > at > org.apache.flink.runtime.operators.BatchTask.resetAllInputs(BatchTask.java:895) > at >
[jira] [Resolved] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-5759. - Resolution: Fixed Fixed in ef77c254dadbe4c04810681fe765f5ec7d2a7400 > Set an UncaughtExceptionHandler for all Thread Pools in JobManager > -- > > Key: FLINK-5759 > URL: https://issues.apache.org/jira/browse/FLINK-5759 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, the thread pools of the {{JobManager}} do not have any > {{UncaughtExceptionHandler}}. > While uncaught exceptions are rare (Flink handles exceptions aggressively in > most places), when exceptions slip through in these threads (which execute > future responses and delayed actions), the JobManager may be in an > inconsistent state and not function properly any more. > We should add a handler that results in a process kill in the case of > uncaught exceptions. Letting the JobManager be restarted by the respective > cluster framework is the only guaranteed way to be safe. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
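The handler described in FLINK-5759 — kill the process on any uncaught exception so the cluster framework restarts a clean JobManager — amounts to installing an `UncaughtExceptionHandler` on every pool thread. A minimal sketch, assuming a hypothetical factory name (Flink's actual classes may differ):

```java
import java.util.concurrent.ThreadFactory;

public class ExitOnUncaughtThreadFactory implements ThreadFactory {

    // Non-zero code signalling an unrecoverable internal error (illustrative).
    static final int FATAL_EXIT_CODE = -17;

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        // Any exception that slips through future callbacks or delayed
        // actions terminates the process instead of leaving the
        // JobManager in an inconsistent state.
        t.setUncaughtExceptionHandler((thread, error) -> {
            System.err.println("FATAL: uncaught exception in " + thread.getName());
            error.printStackTrace();
            // halt() skips shutdown hooks: the process state is suspect.
            Runtime.getRuntime().halt(FATAL_EXIT_CODE);
        });
        return t;
    }
}
```

Passing such a factory to `Executors.newScheduledThreadPool(n, factory)` would cover the pools that execute future responses and delayed actions.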
[jira] [Resolved] (FLINK-5718) Handle JVM Fatal Exceptions in Tasks
[ https://issues.apache.org/jira/browse/FLINK-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-5718. - Resolution: Fixed Fix Version/s: 1.3.0 Fixed via dfc6fba5b9830e6a7804a6a0c9f69b36bf772730 > Handle JVM Fatal Exceptions in Tasks > > > Key: FLINK-5718 > URL: https://issues.apache.org/jira/browse/FLINK-5718 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > The TaskManager catches and handles all types of exceptions right now (all > {{Throwables}}). The intention behind that is: > - Many {{Error}} subclasses are recoverable for the TaskManagers, such as > failure to load/link user code > - We want to give eager notifications to the JobManager in case something > in a task goes wrong. > However, there are some exceptions which should probably simply terminate the > JVM, if caught in the task thread, because they may leave the JVM in a > dysfunctional limbo state: > - {{OutOfMemoryError}} > - {{InternalError}} > - {{UnknownError}} > - {{ZipError}} > These are basically the subclasses of {{VirtualMachineError}}, except for > {{StackOverflowError}}, which is recoverable and usually recovered already by > the time the exception has been thrown and the stack unwound. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
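The classification in FLINK-5718 boils down to a single predicate: fatal if the error is a `VirtualMachineError`, except `StackOverflowError`. A self-contained sketch of that check (the helper name is illustrative, not necessarily where Flink's real utility lives):

```java
public class JvmFatalErrorCheck {

    /**
     * Treat VirtualMachineError subclasses (OutOfMemoryError, InternalError,
     * UnknownError, ZipError) as fatal to the JVM, but exempt
     * StackOverflowError, which is usually already recovered by the time
     * the exception has been thrown and the stack unwound.
     */
    public static boolean isJvmFatalError(Throwable t) {
        return t instanceof VirtualMachineError
                && !(t instanceof StackOverflowError);
    }
}
```

A task thread would consult this predicate in its catch-all handler and terminate the JVM for fatal errors rather than reporting them as an ordinary task failure.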
[jira] [Closed] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-5759. --- > Set an UncaughtExceptionHandler for all Thread Pools in JobManager > -- > > Key: FLINK-5759 > URL: https://issues.apache.org/jira/browse/FLINK-5759 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, the thread pools of the {{JobManager}} do not have any > {{UncaughtExceptionHandler}}. > While uncaught exceptions are rare (Flink handles exceptions aggressively in > most places), when exceptions slip through in these threads (which execute > future responses and delayed actions), the JobManager may be in an > inconsistent state and not function properly any more. > We should add a handler that results in a process kill in the case of > uncaught exceptions. Letting the JobManager be restarted by the respective > cluster framework is the only guaranteed way to be safe. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3276: [FLINK-5718] [core] TaskManagers exit the JVM on f...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3276 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5712) update several deprecated configuration options
[ https://issues.apache.org/jira/browse/FLINK-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861423#comment-15861423 ] ASF GitHub Bot commented on FLINK-5712: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/3267#discussion_r100556946 --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/services/MesosServicesUtils.java --- @@ -40,9 +41,11 @@ public static MesosServices createMesosServices(Configuration configuration) thr return new StandaloneMesosServices(); case ZOOKEEPER: - final String zkMesosRootPath = configuration.getString( + final String zkMesosRootPath = ConfigurationUtil.getStringWithDeprecatedKeys( + configuration, ConfigConstants.HA_ZOOKEEPER_MESOS_WORKERS_PATH, - ConfigConstants.DEFAULT_ZOOKEEPER_MESOS_WORKERS_PATH); + ConfigConstants.DEFAULT_ZOOKEEPER_MESOS_WORKERS_PATH, + ConfigConstants.ZOOKEEPER_MESOS_WORKERS_PATH); --- End diff -- I think it's better if we replace this directly by a `ConfigOption`. There you can also define deprecated keys. This is the encouraged way to read configuration values. Take a look at `HighAvailabilityOptions` to see how it is used. > update several deprecated configuration options > > > Key: FLINK-5712 > URL: https://issues.apache.org/jira/browse/FLINK-5712 > Project: Flink > Issue Type: Bug > Components: Documentation, Mesos >Affects Versions: 1.2.0, 1.3.0 >Reporter: Yelei Feng >Priority: Minor > Labels: configuration, document > Fix For: 1.3.0 > > > 1. We should use 'containerized.heap-cutoff-ratio' and > 'containerized.heap-cutoff-min' instead of deprecated yarn-specific options > in configuration doc. > 2. In mesos mode, we still use deprecated naming convention of zookeeper - > 'recovery.zookeeper.path.mesos-workers'. We should make it consistent with > other zookeeper options by using > 'high-availability.zookeeper.path.mesos-workers'. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3267: [FLINK-5712] [config] update several deprecated co...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/3267#discussion_r100556946 --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/services/MesosServicesUtils.java --- @@ -40,9 +41,11 @@ public static MesosServices createMesosServices(Configuration configuration) thr return new StandaloneMesosServices(); case ZOOKEEPER: - final String zkMesosRootPath = configuration.getString( + final String zkMesosRootPath = ConfigurationUtil.getStringWithDeprecatedKeys( + configuration, ConfigConstants.HA_ZOOKEEPER_MESOS_WORKERS_PATH, - ConfigConstants.DEFAULT_ZOOKEEPER_MESOS_WORKERS_PATH); + ConfigConstants.DEFAULT_ZOOKEEPER_MESOS_WORKERS_PATH, + ConfigConstants.ZOOKEEPER_MESOS_WORKERS_PATH); --- End diff -- I think it's better if we replace this directly by a `ConfigOption`. There you can also define deprecated keys. This is the encouraged way to read configuration values. Take a look at `HighAvailabilityOptions` to see how it is used.
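In Flink's `ConfigOption` style the suggestion amounts to something like `ConfigOptions.key("high-availability.zookeeper.path.mesos-workers").defaultValue(...).withDeprecatedKeys("recovery.zookeeper.path.mesos-workers")` (check the ConfigOptions builder in your Flink version). The lookup semantics — current key first, then deprecated keys in order, then the default — can be sketched self-contained; the class below is an illustration, not Flink's implementation:

```java
import java.util.Map;

public class DeprecatedKeyLookup {

    /**
     * Resolve a config value the way a ConfigOption with deprecated keys
     * does: prefer the current key, fall back to deprecated keys in
     * declaration order, and finally use the default value.
     */
    public static String getString(Map<String, String> conf,
                                   String key,
                                   String defaultValue,
                                   String... deprecatedKeys) {
        if (conf.containsKey(key)) {
            return conf.get(key);
        }
        for (String deprecated : deprecatedKeys) {
            if (conf.containsKey(deprecated)) {
                return conf.get(deprecated);
            }
        }
        return defaultValue;
    }
}
```

With this ordering, a cluster still configured with the old `recovery.zookeeper.path.mesos-workers` key keeps working, while the new `high-availability.*` key wins whenever both are set.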
[jira] [Commented] (FLINK-5710) Add ProcTime() function to indicate StreamSQL
[ https://issues.apache.org/jira/browse/FLINK-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861401#comment-15861401 ] ASF GitHub Bot commented on FLINK-5710: --- Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3271 Hi @huawei-flink, please name your PR with "[FLINK-XXX] Jira title text" pattern > Add ProcTime() function to indicate StreamSQL > - > > Key: FLINK-5710 > URL: https://issues.apache.org/jira/browse/FLINK-5710 > Project: Flink > Issue Type: New Feature > Components: DataStream API >Reporter: Stefano Bortoli >Assignee: Stefano Bortoli >Priority: Minor > > procTime() is a parameterless scalar function that just indicates processing > time mode -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5744) Check remote connection in Flink-shell
[ https://issues.apache.org/jira/browse/FLINK-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861403#comment-15861403 ] Till Rohrmann commented on FLINK-5744: -- Hi Anton, could you specify what exactly the problem is? > Check remote connection in Flink-shell > -- > > Key: FLINK-5744 > URL: https://issues.apache.org/jira/browse/FLINK-5744 > Project: Flink > Issue Type: Improvement > Components: Scala Shell >Reporter: Anton Solovev >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3271: FLINK-5710
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3271 Hi @huawei-flink, please name your PR with "[FLINK-XXX] Jira title text" pattern
[jira] [Commented] (FLINK-5481) Cannot create Collection of Row
[ https://issues.apache.org/jira/browse/FLINK-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861389#comment-15861389 ] ASF GitHub Bot commented on FLINK-5481: --- Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3127 I think I should rename this issue to "Simplify Row creation". What do you think? > Cannot create Collection of Row > > > Key: FLINK-5481 > URL: https://issues.apache.org/jira/browse/FLINK-5481 > Project: Flink > Issue Type: Bug > Components: DataSet API, Table API & SQL >Affects Versions: 1.2.0 >Reporter: Anton Solovev >Assignee: Anton Solovev >Priority: Trivial > > When we use {{ExecutionEnvironment#fromElements(X... data)}} it takes the first > element of {{data}} to define the type. If the first Row in the collection has the wrong > number of fields (there are nulls), the TypeExtractor returns not RowTypeInfo but > GenericType -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3127: [FLINK-5481] Add type extraction from collection
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3127 I think I should rename this issue to "Simplify Row creation". What do you think?
[jira] [Created] (FLINK-5776) Improve XXMapRunner support create instance by carrying constructor parameters
sunjincheng created FLINK-5776: -- Summary: Improve XXMapRunner support create instance by carrying constructor parameters Key: FLINK-5776 URL: https://issues.apache.org/jira/browse/FLINK-5776 Project: Flink Issue Type: Improvement Components: Table API & SQL Reporter: sunjincheng Assignee: sunjincheng At present, MapRunner and FlatMapRunner only support creating instances through a no-argument constructor, but sometimes we need to pass constructor parameters when instantiating, so I would like to improve the XXMapRunner classes to support creating instances with constructor parameters. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
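The improvement asked for in FLINK-5776 — instantiating the wrapped function with constructor arguments instead of requiring a no-argument constructor — is essentially a reflection change. A hedged sketch of the instantiation step (the class and method names are illustrative, not the actual runner classes):

```java
import java.lang.reflect.Constructor;

public class ParameterizedInstantiation {

    /**
     * Create an instance of {@code clazz} using the constructor whose
     * parameter types are {@code paramTypes}, passing {@code args}.
     * With empty arrays this degrades to the old no-arg behavior.
     */
    public static <T> T newInstance(Class<T> clazz,
                                    Class<?>[] paramTypes,
                                    Object[] args) throws Exception {
        Constructor<T> ctor = clazz.getDeclaredConstructor(paramTypes);
        ctor.setAccessible(true);
        return ctor.newInstance(args);
    }
}
```

A runner holding `paramTypes` and `args` alongside the function class could then construct parameterized instances on each task without any change to its calling code.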
[jira] [Commented] (FLINK-5476) Fail fast if trying to submit a job to a non-existing Flink cluster
[ https://issues.apache.org/jira/browse/FLINK-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861388#comment-15861388 ] Dmitrii Kniazev commented on FLINK-5476: To solve this task I decided to invoke StandaloneClusterClient#getClusterStatus() inside StandaloneClusterClient#waitForClusterToBeReady() to check cluster availability. But it causes the Flink cluster (v1.3) to fail with the following error: {panel:title=Log} 2017-02-10 15:35:25,460 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Consolidated 1 TaskManagers 2017-02-10 15:36:16,588 ERROR akka.actor.OneForOneStrategy - GetClusterStatus (of class org.apache.flink.runtime.clusterframework.messages.GetClusterStatus) scala.MatchError: GetClusterStatus (of class org.apache.flink.runtime.clusterframework.messages.GetClusterStatus) at scala.PartialFunction$$anon$1.apply(PartialFunction.scala:248) at scala.PartialFunction$$anon$1.apply(PartialFunction.scala:246) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:290) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) at akka.actor.Actor$class.aroundReceive(Actor.scala:467) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:118) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2017-02-10 15:36:16,593 INFO org.apache.flink.runtime.jobmanager.JobManager - Stopping JobManager akka.tcp://flink@localhost:6123/user/jobmanager. 2017-02-10 15:36:16,599 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager akka://flink/user/taskmanager disconnects from JobManager akka://flink/user/jobmanager: JobManager requested disconnect: JobManager is shuttind down. 2017-02-10 15:36:16,599 INFO org.apache.flink.runtime.taskmanager.TaskManager - Disassociating from JobManager 2017-02-10 15:36:16,600 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:46780 2017-02-10 15:36:16,603 ERROR org.apache.flink.runtime.jobmanager.JobManager - Actor akka://flink/user/jobmanager#-364585011 terminated, stopping process... 
2017-02-10 15:36:16,603 INFO org.apache.flink.runtime.blob.BlobCache - Shutting down BlobCache 2017-02-10 15:36:16,605 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@localhost:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds) 2017-02-10 15:36:16,708 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web dashboard root cache directory /tmp/flink-web-ad368bef-7394-4065-8c98-704fb94777b6 2017-02-10 15:36:16,714 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web dashboard jar upload directory /tmp/flink-web-b0f08882-207c-49cc-bfef-30badbfab011 2017-02-10 15:36:16,722 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-387025c9-b52c-4b71-9122-8d8d96c5a8a6 {panel} I think it is a bug. What do you think about it and about
[jira] [Closed] (FLINK-5733) Link to Bahir connectors
[ https://issues.apache.org/jira/browse/FLINK-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-5733. - Resolution: Fixed Closing since the documentation is being updated by [~wints]'s pull request. > Link to Bahir connectors > > > Key: FLINK-5733 > URL: https://issues.apache.org/jira/browse/FLINK-5733 > Project: Flink > Issue Type: Improvement > Components: Documentation, Project Website >Affects Versions: 1.3.0, 1.2.1 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > The [ecosystem|https://flink.apache.org/ecosystem.html] page lists and links > to connectors included in the Flink distribution. Add to this list the > connectors in [bahir-flink|https://github.com/apache/bahir-flink]. > Also add Bahir connectors to the > [connectors|https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/index.html] > documentation. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (FLINK-5017) Introduce StreamStatus stream element to allow for temporarily idle streaming sources
[ https://issues.apache.org/jira/browse/FLINK-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tzu-Li (Gordon) Tai resolved FLINK-5017. Resolution: Fixed Resolved for {{master}} via http://git-wip-us.apache.org/repos/asf/flink/commit/6630513 > Introduce StreamStatus stream element to allow for temporarily idle streaming > sources > - > > Key: FLINK-5017 > URL: https://issues.apache.org/jira/browse/FLINK-5017 > Project: Flink > Issue Type: New Feature > Components: Streaming >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.3.0 > > Attachments: operator_chain_with_multiple_network_outputs.png > > > A {{StreamStatus}} element informs receiving operators whether or not they > should continue to expect watermarks from the sending operator. There are 2 > kinds of status, namely {{IDLE}} and {{ACTIVE}}. Watermark status elements > are generated at the sources, and may be propagated through the operators of > the topology using {{Output#emitWatermarkStatus(WatermarkStatus)}}. > Sources and downstream operators should emit either of the status elements > once it changes between "watermark-idle" and "watermark-active" states. > A source is considered "watermark-idle" if it will not emit records for an > indefinite amount of time. This is the case, for example, for Flink's Kafka > Consumer, where sources might initially have no assigned partitions to read > from, or no records can be read from the assigned partitions. Once the source > detects that it will resume emitting data, it is considered > "watermark-active". > Downstream operators with multiple inputs (ex. head operators of a > {{OneInputStreamTask}} or {{TwoInputStreamTask}}) should not wait for > watermarks from an upstream operator that is "watermark-idle" when deciding > whether or not to advance the operator's current watermark. When a downstream > operator determines that all upstream operators are "watermark-idle" (i.e. 
> when all input channels have received the watermark idle status element), > then the operator is considered to also be "watermark-idle", as it will > temporarily be unable to advance its own watermark. This is always the case > for operators that only read from a single upstream operator. Once an > operator is considered "watermark-idle", it should itself forward its idle > status to inform downstream operators. The operator is considered to be back > to "watermark-active" as soon as at least one of its upstream operators > resume to be "watermark-active" (i.e. when at least one input channel > receives the watermark active status element), and should also forward its > active status to inform downstream operators. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
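The multi-input rule described in FLINK-5017 — an operator is watermark-idle only when every input channel is idle, and becomes active again as soon as at least one channel is active — can be expressed as a tiny status combiner. The enum and method below are an illustrative sketch, not the actual StreamStatus classes:

```java
public class StreamStatusSketch {

    /** The two watermark statuses a channel can report. */
    public enum Status { IDLE, ACTIVE }

    /**
     * Combine the last status seen on each input channel: ACTIVE if at
     * least one upstream channel is watermark-active, IDLE only once
     * all channels have reported idle. For a single-input operator this
     * simply forwards the upstream status.
     */
    public static Status combine(Status... channelStatuses) {
        for (Status s : channelStatuses) {
            if (s == Status.ACTIVE) {
                return Status.ACTIVE;
            }
        }
        return Status.IDLE;
    }
}
```

An operator would re-evaluate this combination whenever a status element arrives on a channel, and forward its own status downstream only when the combined result changes.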
[jira] [Commented] (FLINK-5017) Introduce StreamStatus stream element to allow for temporarily idle streaming sources
[ https://issues.apache.org/jira/browse/FLINK-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861364#comment-15861364 ] ASF GitHub Bot commented on FLINK-5017: --- Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2801 @aljoscha thanks! closing this > Introduce StreamStatus stream element to allow for temporarily idle streaming > sources > - > > Key: FLINK-5017 > URL: https://issues.apache.org/jira/browse/FLINK-5017 > Project: Flink > Issue Type: New Feature > Components: Streaming >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.3.0 > > Attachments: operator_chain_with_multiple_network_outputs.png > > > A {{StreamStatus}} element informs receiving operators whether or not they > should continue to expect watermarks from the sending operator. There are 2 > kinds of status, namely {{IDLE}} and {{ACTIVE}}. Watermark status elements > are generated at the sources, and may be propagated through the operators of > the topology using {{Output#emitWatermarkStatus(WatermarkStatus)}}. > Sources and downstream operators should emit either of the status elements > once it changes between "watermark-idle" and "watermark-active" states. > A source is considered "watermark-idle" if it will not emit records for an > indefinite amount of time. This is the case, for example, for Flink's Kafka > Consumer, where sources might initially have no assigned partitions to read > from, or no records can be read from the assigned partitions. Once the source > detects that it will resume emitting data, it is considered > "watermark-active". > Downstream operators with multiple inputs (ex. head operators of a > {{OneInputStreamTask}} or {{TwoInputStreamTask}}) should not wait for > watermarks from an upstream operator that is "watermark-idle" when deciding > whether or not to advance the operator's current watermark. 
When a downstream > operator determines that all upstream operators are "watermark-idle" (i.e. > when all input channels have received the watermark idle status element), > then the operator is considered to also be "watermark-idle", as it will > temporarily be unable to advance its own watermark. This is always the case > for operators that only read from a single upstream operator. Once an > operator is considered "watermark-idle", it should itself forward its idle > status to inform downstream operators. The operator is considered to be back > to "watermark-active" as soon as at least one of its upstream operators > resume to be "watermark-active" (i.e. when at least one input channel > receives the watermark active status element), and should also forward its > active status to inform downstream operators. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5017) Introduce StreamStatus stream element to allow for temporarily idle streaming sources
[ https://issues.apache.org/jira/browse/FLINK-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861365#comment-15861365 ] ASF GitHub Bot commented on FLINK-5017: --- Github user tzulitai closed the pull request at: https://github.com/apache/flink/pull/2801 > Introduce StreamStatus stream element to allow for temporarily idle streaming > sources > - > > Key: FLINK-5017 > URL: https://issues.apache.org/jira/browse/FLINK-5017 > Project: Flink > Issue Type: New Feature > Components: Streaming >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.3.0 > > Attachments: operator_chain_with_multiple_network_outputs.png > > > A {{StreamStatus}} element informs receiving operators whether or not they > should continue to expect watermarks from the sending operator. There are 2 > kinds of status, namely {{IDLE}} and {{ACTIVE}}. Watermark status elements > are generated at the sources, and may be propagated through the operators of > the topology using {{Output#emitWatermarkStatus(WatermarkStatus)}}. > Sources and downstream operators should emit either of the status elements > once it changes between "watermark-idle" and "watermark-active" states. > A source is considered "watermark-idle" if it will not emit records for an > indefinite amount of time. This is the case, for example, for Flink's Kafka > Consumer, where sources might initially have no assigned partitions to read > from, or no records can be read from the assigned partitions. Once the source > detects that it will resume emitting data, it is considered > "watermark-active". > Downstream operators with multiple inputs (ex. head operators of a > {{OneInputStreamTask}} or {{TwoInputStreamTask}}) should not wait for > watermarks from an upstream operator that is "watermark-idle" when deciding > whether or not to advance the operator's current watermark. When a downstream > operator determines that all upstream operators are "watermark-idle" (i.e. 
> when all input channels have received the watermark idle status element), > then the operator is considered to also be "watermark-idle", as it will > temporarily be unable to advance its own watermark. This is always the case > for operators that only read from a single upstream operator. Once an > operator is considered "watermark-idle", it should itself forward its idle > status to inform downstream operators. The operator is considered to be back > to "watermark-active" as soon as at least one of its upstream operators > resumes being "watermark-active" (i.e. when at least one input channel > receives the watermark active status element), and should also forward its > active status to inform downstream operators. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
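The all-channels rule described above can be sketched as a small model. This is an illustrative sketch only, not Flink's actual `StreamStatus` implementation; the class name, the boolean encoding (`true` = watermark-active), and the method name are hypothetical:

```java
import java.util.Arrays;

// Sketch of the multi-input idle/active rule: an operator becomes
// watermark-idle only when ALL input channels have reported idle, and
// becomes watermark-active again as soon as any one channel is active.
public class StreamStatusSketch {
    private final boolean[] channelActive;

    public StreamStatusSketch(int numChannels) {
        channelActive = new boolean[numChannels];
        Arrays.fill(channelActive, true); // all channels start active
    }

    /** Record a status change on one channel; return this operator's own status. */
    public boolean onStatusChange(int channel, boolean active) {
        channelActive[channel] = active;
        // One active channel suffices to keep the operator watermark-active.
        for (boolean a : channelActive) {
            if (a) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StreamStatusSketch op = new StreamStatusSketch(2);
        System.out.println(op.onStatusChange(0, false)); // true: channel 1 still active
        System.out.println(op.onStatusChange(1, false)); // false: all channels idle
        System.out.println(op.onStatusChange(0, true));  // true: one channel resumed
    }
}
```

For a single-input operator this degenerates, as the description notes, to simply mirroring the upstream operator's status.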
[GitHub] flink pull request #2801: [FLINK-5017] [streaming] Introduce StreamStatus to...
Github user tzulitai closed the pull request at: https://github.com/apache/flink/pull/2801 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2801: [FLINK-5017] [streaming] Introduce StreamStatus to facili...
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2801 @aljoscha thanks! closing this
[jira] [Commented] (FLINK-5017) Introduce StreamStatus stream element to allow for temporarily idle streaming sources
[ https://issues.apache.org/jira/browse/FLINK-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861362#comment-15861362 ] ASF GitHub Bot commented on FLINK-5017: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2801 Thanks for your very good work! Could you please close this PR and the Jira issue, I just merged. > Introduce StreamStatus stream element to allow for temporarily idle streaming > sources > - > > Key: FLINK-5017 > URL: https://issues.apache.org/jira/browse/FLINK-5017 > Project: Flink > Issue Type: New Feature > Components: Streaming >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.3.0 > > Attachments: operator_chain_with_multiple_network_outputs.png > > > A {{StreamStatus}} element informs receiving operators whether or not they > should continue to expect watermarks from the sending operator. There are 2 > kinds of status, namely {{IDLE}} and {{ACTIVE}}. Watermark status elements > are generated at the sources, and may be propagated through the operators of > the topology using {{Output#emitWatermarkStatus(WatermarkStatus)}}. > Sources and downstream operators should emit either of the status elements > once it changes between "watermark-idle" and "watermark-active" states. > A source is considered "watermark-idle" if it will not emit records for an > indefinite amount of time. This is the case, for example, for Flink's Kafka > Consumer, where sources might initially have no assigned partitions to read > from, or no records can be read from the assigned partitions. Once the source > detects that it will resume emitting data, it is considered > "watermark-active". > Downstream operators with multiple inputs (ex. head operators of a > {{OneInputStreamTask}} or {{TwoInputStreamTask}}) should not wait for > watermarks from an upstream operator that is "watermark-idle" when deciding > whether or not to advance the operator's current watermark. 
When a downstream > operator determines that all upstream operators are "watermark-idle" (i.e. > when all input channels have received the watermark idle status element), > then the operator is considered to also be "watermark-idle", as it will > temporarily be unable to advance its own watermark. This is always the case > for operators that only read from a single upstream operator. Once an > operator is considered "watermark-idle", it should itself forward its idle > status to inform downstream operators. The operator is considered to be back > to "watermark-active" as soon as at least one of its upstream operators > resumes being "watermark-active" (i.e. when at least one input channel > receives the watermark active status element), and should also forward its > active status to inform downstream operators. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #2801: [FLINK-5017] [streaming] Introduce StreamStatus to facili...
Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2801 Thanks for your very good work! Could you please close this PR and the Jira issue, I just merged.
[jira] [Assigned] (FLINK-5760) "Introduction to Apache Flink" links to 1.1 release
[ https://issues.apache.org/jira/browse/FLINK-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Winters reassigned FLINK-5760: --- Assignee: Mike Winters > "Introduction to Apache Flink" links to 1.1 release > --- > > Key: FLINK-5760 > URL: https://issues.apache.org/jira/browse/FLINK-5760 > Project: Flink > Issue Type: Bug > Components: Project Website >Reporter: Timo Walther >Assignee: Mike Winters > > Multiple links in "Introduction to Apache Flink" point to the 1.1 release. > They should always point to the current release. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5760) "Introduction to Apache Flink" links to 1.1 release
[ https://issues.apache.org/jira/browse/FLINK-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861351#comment-15861351 ] Mike Winters commented on FLINK-5760: - Fixed by this PR: https://github.com/apache/flink-web/pull/46 > "Introduction to Apache Flink" links to 1.1 release > --- > > Key: FLINK-5760 > URL: https://issues.apache.org/jira/browse/FLINK-5760 > Project: Flink > Issue Type: Bug > Components: Project Website >Reporter: Timo Walther >Assignee: Mike Winters > > Multiple links in "Introduction to Apache Flink" point to the 1.1 release. > They should always point to the current release. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5733) Link to Bahir connectors
[ https://issues.apache.org/jira/browse/FLINK-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861347#comment-15861347 ] Mike Winters commented on FLINK-5733: - [~greghogan] this PR includes linking to the bahir-flink directory in the Bahir repo from the 'Ecosystem' page: https://github.com/apache/flink-web/pull/46 (which I believe is the closest thing to Bahir-Flink docs right now) > Link to Bahir connectors > > > Key: FLINK-5733 > URL: https://issues.apache.org/jira/browse/FLINK-5733 > Project: Flink > Issue Type: Improvement > Components: Documentation, Project Website >Affects Versions: 1.3.0, 1.2.1 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > The [ecosystem|https://flink.apache.org/ecosystem.html] page lists and links > to connectors included in the Flink distribution. Add to this list the > connectors in [bahir-flink|https://github.com/apache/bahir-flink]. > Also add Bahir connectors to the > [connectors|https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/index.html] > documentation. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5309) documentation links on the home page point to 1.2-SNAPSHOT
[ https://issues.apache.org/jira/browse/FLINK-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861345#comment-15861345 ] Mike Winters commented on FLINK-5309: - https://github.com/apache/flink-web/pull/46 covers these updates > documentation links on the home page point to 1.2-SNAPSHOT > -- > > Key: FLINK-5309 > URL: https://issues.apache.org/jira/browse/FLINK-5309 > Project: Flink > Issue Type: Bug > Components: Project Website >Reporter: Nico Kruber >Assignee: Mike Winters > > The main website at https://flink.apache.org/ has several links to the > documentation but despite advertising a stable release download, all of those > links point to the 1.2 branch. > This should be set to the same stable version's documentation instead. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5775) NullReferenceException when running job on local cluster
Colin Breame created FLINK-5775: --- Summary: NullReferenceException when running job on local cluster Key: FLINK-5775 URL: https://issues.apache.org/jira/browse/FLINK-5775 Project: Flink Issue Type: Bug Components: DataStream API Affects Versions: 1.2.0 Reporter: Colin Breame When the job is submitted to a local Flink cluster (started using 'start-local.sh'), the exception below is produced. It might be worth pointing out that the job reads from a local file (using StreamExecutionEnvironment.readFile()). {code}
Caused by: java.lang.NullPointerException
	at org.apache.flink.core.fs.Path.normalizePath(Path.java:258)
	at org.apache.flink.core.fs.Path.<init>(Path.java:144)
	at org.apache.flink.core.fs.local.LocalFileSystem.pathToFile(LocalFileSystem.java:138)
	at org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:97)
	at org.apache.flink.core.fs.FileSystem.exists(FileSystem.java:464)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.exists(SafetyNetWrapperFileSystem.java:99)
	at org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:191)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:78)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:272)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655)
	at java.lang.Thread.run(Thread.java:745)
{code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
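The stack trace indicates a null path string reaching `Path.normalizePath`. A defensive check at the API boundary would turn the NPE into an actionable error. This is a hedged sketch of that idea only, not the actual Flink fix; `PathGuard` and `checkMonitoringPath` are hypothetical names:

```java
// Hypothetical guard: reject a null or empty monitoring path up front,
// instead of letting it propagate into Path's constructor and fail with
// an opaque NullPointerException.
public class PathGuard {
    public static String checkMonitoringPath(String path) {
        if (path == null || path.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "The path to monitor must be a non-empty string, but was: " + path);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(checkMonitoringPath("/tmp/input")); // passes through
        try {
            checkMonitoringPath(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```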
[jira] [Commented] (FLINK-5745) Set uncaught exception handler for Netty threads
[ https://issues.apache.org/jira/browse/FLINK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861301#comment-15861301 ] ASF GitHub Bot commented on FLINK-5745: --- Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 @uce I'll extract the inner class and use it here as well as soon as the final #3290 is merged > Set uncaught exception handler for Netty threads > > > Key: FLINK-5745 > URL: https://issues.apache.org/jira/browse/FLINK-5745 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Ufuk Celebi >Priority: Minor > > We pass a thread factory for the Netty event loop threads (see > {{NettyServer}} and {{NettyClient}}), but don't set an uncaught exception > handler. Let's add a JVM terminating handler that exits the process in case > of fatal errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3293: [FLINK-5745] set an uncaught exception handler for netty ...
Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 @uce I'll extract the inner class and use it here as well as soon as the final #3290 is merged
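The approach discussed in this PR — wiring a JVM-terminating uncaught exception handler into the thread factory that creates the Netty event loop threads — might look roughly like the sketch below. The class name, thread naming scheme, and exit code are illustrative, not Flink's actual code:

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical thread factory: every created thread gets an uncaught
// exception handler that logs the error and halts the JVM, since an
// uncaught error in an event loop thread can leave the process in a
// dysfunctional limbo state.
public class ExitingThreadFactory implements ThreadFactory {
    private final AtomicInteger counter = new AtomicInteger();
    private final String namePrefix;

    public ExitingThreadFactory(String namePrefix) {
        this.namePrefix = namePrefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, namePrefix + '-' + counter.incrementAndGet());
        t.setUncaughtExceptionHandler((thread, error) -> {
            System.err.println("FATAL: uncaught exception in thread "
                + thread.getName() + ": " + error);
            // halt() rather than exit(): skip shutdown hooks, which may
            // themselves be broken after a fatal error. Code is illustrative.
            Runtime.getRuntime().halt(-17);
        });
        return t;
    }

    public static void main(String[] args) {
        ExitingThreadFactory factory = new ExitingThreadFactory("netty-server");
        Thread t = factory.newThread(() -> System.out.println("event loop thread running"));
        t.start();
    }
}
```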
[jira] [Assigned] (FLINK-5415) ContinuousFileProcessingTest failed on travis
[ https://issues.apache.org/jira/browse/FLINK-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek reassigned FLINK-5415: --- Assignee: Aljoscha Krettek > ContinuousFileProcessingTest failed on travis > - > > Key: FLINK-5415 > URL: https://issues.apache.org/jira/browse/FLINK-5415 > Project: Flink > Issue Type: Improvement > Components: DataStream API, Streaming Connectors >Affects Versions: 1.3.0 >Reporter: Chesnay Schepler >Assignee: Aljoscha Krettek > Labels: test-stability > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/189171123/log.txt > testFunctionRestore(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.162 sec <<< FAILURE! > java.lang.AssertionError: expected:<1483623669528> but > was:<-9223372036854775808> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFunctionRestore(ContinuousFileProcessingTest.java:761) > testProcessOnce(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.045 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessOnce(ContinuousFileProcessingTest.java:675) > testFileReadingOperatorWithIngestionTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.002 sec <<< FAILURE! 
> java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithIngestionTime(ContinuousFileProcessingTest.java:150) > testSortingOnModTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.002 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testSortingOnModTime(ContinuousFileProcessingTest.java:596) > testFileReadingOperatorWithEventTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.001 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithEventTime(ContinuousFileProcessingTest.java:308) > testProcessContinuously(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.001 sec <<< FAILURE! 
> java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessContinuously(ContinuousFileProcessingTest.java:771) > Results : > Failed tests: > > ContinuousFileProcessingTest.testFileReadingOperatorWithEventTime:308->createFileAndFillWithData:958 > null > > ContinuousFileProcessingTest.testFileReadingOperatorWithIngestionTime:150->createFileAndFillWithData:958 > null > ContinuousFileProcessingTest.testFunctionRestore:761 > expected:<1483623669528> but was:<-9223372036854775808> > >
[jira] [Closed] (FLINK-5774) ContinuousFileProcessingTest has test instability
[ https://issues.apache.org/jira/browse/FLINK-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek closed FLINK-5774. --- Resolution: Duplicate Ah dammit, this is a duplicate of FLINK-5415, closing again. > ContinuousFileProcessingTest has test instability > - > > Key: FLINK-5774 > URL: https://issues.apache.org/jira/browse/FLINK-5774 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > Inserting a {{Thread.sleep(200)}} in > {{ContinuousFileMonitoringFunction.monitorDirAndForwardSplits()}} will make > the tests fail reliably. Normally, it occurs now and then due to "natural" > slow downs on Travis: > log: https://api.travis-ci.org/jobs/199977242/log.txt?deansi=true > The condition that waits for the file monitoring function to reach a certain > state needs to be tougher. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5415) ContinuousFileProcessingTest failed on travis
[ https://issues.apache.org/jira/browse/FLINK-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861299#comment-15861299 ] Aljoscha Krettek commented on FLINK-5415: - Inserting a {{Thread.sleep(200)}} in {{ContinuousFileMonitoringFunction.monitorDirAndForwardSplits()}} will make the tests fail reliably. Normally, it occurs now and then due to "natural" slow downs on Travis: log: https://api.travis-ci.org/jobs/199977242/log.txt?deansi=true The condition that waits for the file monitoring function to reach a certain state needs to be tougher. > ContinuousFileProcessingTest failed on travis > - > > Key: FLINK-5415 > URL: https://issues.apache.org/jira/browse/FLINK-5415 > Project: Flink > Issue Type: Improvement > Components: DataStream API, Streaming Connectors >Affects Versions: 1.3.0 >Reporter: Chesnay Schepler >Assignee: Aljoscha Krettek > Labels: test-stability > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/189171123/log.txt > testFunctionRestore(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.162 sec <<< FAILURE! > java.lang.AssertionError: expected:<1483623669528> but > was:<-9223372036854775808> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFunctionRestore(ContinuousFileProcessingTest.java:761) > testProcessOnce(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.045 sec <<< FAILURE! 
> java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessOnce(ContinuousFileProcessingTest.java:675) > testFileReadingOperatorWithIngestionTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.002 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithIngestionTime(ContinuousFileProcessingTest.java:150) > testSortingOnModTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.002 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testSortingOnModTime(ContinuousFileProcessingTest.java:596) > testFileReadingOperatorWithEventTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.001 sec <<< FAILURE! 
> java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithEventTime(ContinuousFileProcessingTest.java:308) > testProcessContinuously(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.001 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessContinuously(ContinuousFileProcessingTest.java:771) > Results
[jira] [Commented] (FLINK-5745) Set uncaught exception handler for Netty threads
[ https://issues.apache.org/jira/browse/FLINK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861293#comment-15861293 ] ASF GitHub Bot commented on FLINK-5745: --- Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 I was actually looking through the code to find something like this but it seems that every class does this locally for now. Global exit codes make sense though - also for documentation > Set uncaught exception handler for Netty threads > > > Key: FLINK-5745 > URL: https://issues.apache.org/jira/browse/FLINK-5745 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Ufuk Celebi >Priority: Minor > > We pass a thread factory for the Netty event loop threads (see > {{NettyServer}} and {{NettyClient}}), but don't set an uncaught exception > handler. Let's add a JVM terminating handler that exits the process in case > of fatal errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5774) ContinuousFileProcessingTest has test instability
Aljoscha Krettek created FLINK-5774: --- Summary: ContinuousFileProcessingTest has test instability Key: FLINK-5774 URL: https://issues.apache.org/jira/browse/FLINK-5774 Project: Flink Issue Type: Bug Components: Streaming Connectors Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Inserting a {{Thread.sleep(200)}} in {{ContinuousFileMonitoringFunction.monitorDirAndForwardSplits()}} will make the tests fail reliably. Normally, it occurs now and then due to "natural" slow downs on Travis: log: https://api.travis-ci.org/jobs/199977242/log.txt?deansi=true The condition that waits for the file monitoring function to reach a certain state needs to be tougher. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
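The fix direction named in the issue — making the wait condition "tougher" than a fixed sleep — usually means polling the condition until it holds or a deadline passes. A generic sketch of such a helper (illustrative only, not the actual test code):

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition until it becomes true or the
// timeout elapses, instead of sleeping a fixed amount and hoping the
// monitored code has reached the desired state on a slow CI machine.
public class WaitUntil {
    public static void waitUntil(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError(
                    "condition not met within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(10); // short poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting for condition");
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        waitUntil(() -> System.currentTimeMillis() - start > 50, 1000);
        System.out.println("condition met");
    }
}
```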
[GitHub] flink issue #3293: [FLINK-5745] set an uncaught exception handler for netty ...
Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 I was actually looking through the code to find something like this but it seems that every class does this locally for now. Global exit codes make sense though - also for documentation
[jira] [Commented] (FLINK-5718) Handle JVM Fatal Exceptions in Tasks
[ https://issues.apache.org/jira/browse/FLINK-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861288#comment-15861288 ] ASF GitHub Bot commented on FLINK-5718: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3276 Addressing the comment and merging this... > Handle JVM Fatal Exceptions in Tasks > > > Key: FLINK-5718 > URL: https://issues.apache.org/jira/browse/FLINK-5718 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Reporter: Stephan Ewen >Assignee: Stephan Ewen > > The TaskManager catches and handles all types of exceptions right now (all > {{Throwables}}). The intention behind that is: > - Many {{Error}} subclasses are recoverable for the TaskManagers, such as > failure to load/link user code > - We want to give eager notifications to the JobManager in case something > in a task goes wrong. > However, there are some exceptions which should probably simply terminate the > JVM, if caught in the task thread, because they may leave the JVM in a > dysfunctional limbo state: > - {{OutOfMemoryError}} > - {{InternalError}} > - {{UnknownError}} > - {{ZipError}} > These are basically the subclasses of {{VirtualMachineError}}, except for > {{StackOverflowError}}, which is recoverable and usually recovered already by > the time the exception has been thrown and the stack unwound. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
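The classification described in the issue can be sketched as a single predicate: {{OutOfMemoryError}}, {{InternalError}}, {{UnknownError}}, and {{ZipError}} are all subclasses of {{VirtualMachineError}}, with {{StackOverflowError}} carved out. The class and method names below are illustrative, not Flink's actual code:

```java
// Hypothetical predicate mirroring the issue's description: JVM-fatal
// errors are the VirtualMachineError subclasses, except StackOverflowError,
// which is usually already recovered by the time the stack has unwound.
public class FatalErrors {
    public static boolean isJvmFatalError(Throwable t) {
        return t instanceof VirtualMachineError
            && !(t instanceof StackOverflowError);
    }

    public static void main(String[] args) {
        System.out.println(isJvmFatalError(new OutOfMemoryError()));   // true
        System.out.println(isJvmFatalError(new InternalError()));      // true
        System.out.println(isJvmFatalError(new StackOverflowError())); // false
        System.out.println(isJvmFatalError(new RuntimeException()));   // false
    }
}
```

Note that `java.util.zip.ZipError` extends `InternalError`, so it is covered by the same `instanceof` check.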
[GitHub] flink issue #3276: [FLINK-5718] [core] TaskManagers exit the JVM on fatal ex...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3276 Addressing the comment and merging this...
[GitHub] flink issue #3293: [FLINK-5745] set an uncaught exception handler for netty ...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3293 Looking at this and at my previous pull request, I am wondering if we should actually define and collect exit codes somewhere globally, like in `flink-core:org.apache.flink.configuration.ExitCodes`.
[jira] [Commented] (FLINK-5745) Set uncaught exception handler for Netty threads
[ https://issues.apache.org/jira/browse/FLINK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861285#comment-15861285 ] ASF GitHub Bot commented on FLINK-5745: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3293 Looking at this and at my previous pull request, I am wondering if we should actually define and collect exit codes somewhere globally, like in `flink-core:org.apache.flink.configuration.ExitCodes`. > Set uncaught exception handler for Netty threads > > > Key: FLINK-5745 > URL: https://issues.apache.org/jira/browse/FLINK-5745 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Ufuk Celebi >Priority: Minor > > We pass a thread factory for the Netty event loop threads (see > {{NettyServer}} and {{NettyClient}}), but don't set an uncaught exception > handler. Let's add a JVM terminating handler that exits the process in case > of fatal errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
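A central exit-code holder as floated in the comment might look like the minimal sketch below. The class name follows the suggestion in the comment; the constant names and numeric values are purely illustrative assumptions:

```java
// Hypothetical central registry of process exit codes, so that
// fatal-error handlers across components (tasks, job manager, Netty
// event loops) agree on what each code means.
public final class ExitCodes {
    private ExitCodes() {} // non-instantiable constants holder

    // Illustrative values only; a real assignment would need agreement.
    public static final int FATAL_UNCAUGHT_EXCEPTION = -17;
    public static final int FATAL_OUT_OF_MEMORY = -18;

    public static void main(String[] args) {
        System.out.println("uncaught exception exit code: " + FATAL_UNCAUGHT_EXCEPTION);
    }
}
```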
[jira] [Commented] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861281#comment-15861281 ] ASF GitHub Bot commented on FLINK-5759: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3290 Addressing the comments and merging this... > Set an UncaughtExceptionHandler for all Thread Pools in JobManager > -- > > Key: FLINK-5759 > URL: https://issues.apache.org/jira/browse/FLINK-5759 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, the thread pools of the {{JobManager}} do not have any > {{UncaughtExceptionHandler}}. > While uncaught exceptions are rare (Flink handles exceptions aggressively in > most places), when exceptions slip through in these threads (which execute > future responses and delayed actions), the JobManager may be in an > inconsistent state and not function properly any more. > We should add a handler that results in a process kill in the case of > uncaught exceptions. Letting the JobManager be restarted by the respective > cluster framework is the only guaranteed way to be safe. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3290: [FLINK-5759] [jobmanager] Set UncaughtExceptionHandlers f...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3290 Addressing the comments and merging this...
[jira] [Commented] (FLINK-5745) Set uncaught exception handler for Netty threads
[ https://issues.apache.org/jira/browse/FLINK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861278#comment-15861278 ] ASF GitHub Bot commented on FLINK-5745: --- Github user uce commented on the issue: https://github.com/apache/flink/pull/3293 I thought that this is only going to be one, sorry. Imo only one is good. The thread name is logged in the message. > Set uncaught exception handler for Netty threads > > > Key: FLINK-5745 > URL: https://issues.apache.org/jira/browse/FLINK-5745 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Ufuk Celebi >Priority: Minor > > We pass a thread factory for the Netty event loop threads (see > {{NettyServer}} and {{NettyClient}}), but don't set an uncaught exception > handler. Let's add a JVM terminating handler that exits the process in case > of fatal errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3293: [FLINK-5745] set an uncaught exception handler for netty ...
Github user uce commented on the issue: https://github.com/apache/flink/pull/3293 I thought that this is only going to be one, sorry. Imo only one is good. The thread name is logged in the message.
[jira] [Commented] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861274#comment-15861274 ] ASF GitHub Bot commented on FLINK-5759: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/3290#discussion_r100535521 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobManagerServices.java --- @@ -116,12 +116,17 @@ public static JobManagerServices fromConfiguration( final FiniteDuration timeout; try { timeout = AkkaUtils.getTimeout(config); - } catch (NumberFormatException e) { + } + catch (NumberFormatException e) { --- End diff -- Yes, that is still my old code style config. IntelliJ sometimes triggers some local reformatting. @shijinkui Updating the code style has been a discussion forever. To include this into the style, one would need to fix many styles. But it is ultimately a good idea to have this, agreed. > Set an UncaughtExceptionHandler for all Thread Pools in JobManager > -- > > Key: FLINK-5759 > URL: https://issues.apache.org/jira/browse/FLINK-5759 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, the thread pools of the {{JobManager}} do not have any > {{UncaughtExceptionHandler}}. > While uncaught exceptions are rare (Flink handles exceptions aggressively in > most places), when exceptions slip through in these threads (which execute > future responses and delayed actions), the JobManager may be in an > inconsistent state and not function properly any more. > We should add a handler that results in a process kill in the case of > uncaught exceptions. Letting the JobManager be restarted by the respective > cluster framework is the only guaranteed way to be safe. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861271#comment-15861271 ] ASF GitHub Bot commented on FLINK-5759: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/3290#discussion_r100535249 --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosApplicationMasterRunner.java --- @@ -216,11 +220,11 @@ protected int runPrivileged(Configuration config, Configuration dynamicPropertie futureExecutor = Executors.newScheduledThreadPool( numberProcessors, - new NamedThreadFactory("mesos-jobmanager-future-", "-thread-")); + new ExecutorThreadFactory("mesos-jobmanager-future")); --- End diff -- The pool is not really tied to Akka. Akka has its own threads for the actors. The JobManager actor uses the "future" pool for futures produced by the actors. The ExecutionGraph also uses that pool for some callbacks. > Set an UncaughtExceptionHandler for all Thread Pools in JobManager > -- > > Key: FLINK-5759 > URL: https://issues.apache.org/jira/browse/FLINK-5759 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, the thread pools of the {{JobManager}} do not have any > {{UncaughtExceptionHandler}}. > While uncaught exceptions are rare (Flink handles exceptions aggressively in > most places), when exceptions slip through in these threads (which execute > future responses and delayed actions), the JobManager may be in an > inconsistent state and not function properly any more. > We should add a handler that results in a process kill in the case of > uncaught exceptions. Letting the JobManager be restarted by the respective > cluster framework is the only guaranteed way to be safe. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861272#comment-15861272 ] ASF GitHub Bot commented on FLINK-5759: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/3290#discussion_r100535279 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/util/ExecutorThreadFactory.java --- @@ -18,49 +18,112 @@ package org.apache.flink.runtime.util; +import java.lang.Thread.UncaughtExceptionHandler; import java.util.concurrent.ThreadFactory; import java.util.concurrent.atomic.AtomicInteger; import org.slf4j.Logger; import org.slf4j.LoggerFactory; +import static org.apache.flink.util.Preconditions.checkNotNull; + +/** + * A thread --- End diff -- True, incomplete, will fix that. > Set an UncaughtExceptionHandler for all Thread Pools in JobManager > -- > > Key: FLINK-5759 > URL: https://issues.apache.org/jira/browse/FLINK-5759 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, the thread pools of the {{JobManager}} do not have any > {{UncaughtExceptionHandler}}. > While uncaught exceptions are rare (Flink handles exceptions aggressively in > most places), when exceptions slip through in these threads (which execute > future responses and delayed actions), the JobManager may be in an > inconsistent state and not function properly any more. > We should add a handler that results in a process kill in the case of > uncaught exceptions. Letting the JobManager be restarted by the respective > cluster framework is the only guaranteed way to be safe. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861266#comment-15861266 ] ASF GitHub Bot commented on FLINK-5759: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/3290#discussion_r100534934 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/filecache/FileCache.java --- @@ -99,7 +99,8 @@ public FileCache(String[] tempDirectories) throws IOException { this.shutdownHook = createShutdownHook(this, LOG); this.entries = new HashMap>>(); - this.executorService = Executors.newScheduledThreadPool(10, ExecutorThreadFactory.INSTANCE); + this.executorService = Executors.newScheduledThreadPool(10, --- End diff -- This PR just did not want to change anything else than what its goal was. The `10` is pretty magic, though, agreed. Something relative to the number of cores seems to make more sense, intuitively. > Set an UncaughtExceptionHandler for all Thread Pools in JobManager > -- > > Key: FLINK-5759 > URL: https://issues.apache.org/jira/browse/FLINK-5759 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > Currently, the thread pools of the {{JobManager}} do not have any > {{UncaughtExceptionHandler}}. > While uncaught exceptions are rare (Flink handles exceptions aggressively in > most places), when exceptions slip through in these threads (which execute > future responses and delayed actions), the JobManager may be in an > inconsistent state and not function properly any more. > We should add a handler that results in a process kill in the case of > uncaught exceptions. Letting the JobManager be restarted by the respective > cluster framework is the only guaranteed way to be safe. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
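For readers following the FLINK-5759 thread: the pattern under discussion — a thread factory that names its threads, installs a JVM-terminating `UncaughtExceptionHandler`, and sizes the pool relative to the core count rather than a magic constant like `10` — can be sketched roughly as below. This is an illustrative sketch only, not Flink's actual `ExecutorThreadFactory`; the class name, thread-name scheme, and exit code are made up.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a thread factory that names its threads and installs an
// uncaught exception handler terminating the JVM on fatal errors.
// (Hypothetical; Flink's real ExecutorThreadFactory differs in detail.)
class ExitingThreadFactory implements ThreadFactory {
    private final String poolName;
    private final AtomicInteger counter = new AtomicInteger(1);

    ExitingThreadFactory(String poolName) {
        this.poolName = poolName;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, poolName + "-thread-" + counter.getAndIncrement());
        t.setDaemon(true);
        // JVM-terminating handler: letting the cluster framework restart the
        // process is the only guaranteed way back to a consistent state.
        t.setUncaughtExceptionHandler((thread, e) -> {
            System.err.println("FATAL: uncaught exception in " + thread.getName() + ": " + e);
            Runtime.getRuntime().halt(-17); // exit code chosen arbitrarily here
        });
        return t;
    }

    static ScheduledExecutorService newPool(String name) {
        // Size relative to the number of cores instead of a fixed constant.
        int threads = Runtime.getRuntime().availableProcessors();
        return Executors.newScheduledThreadPool(threads, new ExitingThreadFactory(name));
    }
}
```

The handler is deliberately installed per thread by the factory, so every pool built from it gets the same fatal-error behavior without each call site repeating the wiring.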
[jira] [Commented] (FLINK-5745) Set uncaught exception handler for Netty threads
[ https://issues.apache.org/jira/browse/FLINK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861249#comment-15861249 ] ASF GitHub Bot commented on FLINK-5745: --- Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 wouldn't it be `NettyServer$FatalExitExceptionHandler` vs. `ExecutorThreadFactory$FatalExitExceptionHandler`? > Set uncaught exception handler for Netty threads > > > Key: FLINK-5745 > URL: https://issues.apache.org/jira/browse/FLINK-5745 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Ufuk Celebi >Priority: Minor > > We pass a thread factory for the Netty event loop threads (see > {{NettyServer}} and {{NettyClient}}), but don't set an uncaught exception > handler. Let's add a JVM terminating handler that exits the process in cause > of fatal errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5745) Set uncaught exception handler for Netty threads
[ https://issues.apache.org/jira/browse/FLINK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861224#comment-15861224 ] ASF GitHub Bot commented on FLINK-5745: --- Github user uce commented on the issue: https://github.com/apache/flink/pull/3293 I think it's logged as `FatalExitExceptionHandler` in both places right now as well (Stephan's PR and your PR). So nothing would change if we move it out, or am I overlooking something? > Set uncaught exception handler for Netty threads > > > Key: FLINK-5745 > URL: https://issues.apache.org/jira/browse/FLINK-5745 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Ufuk Celebi >Priority: Minor > > We pass a thread factory for the Netty event loop threads (see > {{NettyServer}} and {{NettyClient}}), but don't set an uncaught exception > handler. Let's add a JVM terminating handler that exits the process in cause > of fatal errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5745) Set uncaught exception handler for Netty threads
[ https://issues.apache.org/jira/browse/FLINK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861212#comment-15861212 ] ASF GitHub Bot commented on FLINK-5745: --- Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 would be a different LOG handler though - does it make sense to have two or is it enough to have a single one in an outer class? > Set uncaught exception handler for Netty threads > > > Key: FLINK-5745 > URL: https://issues.apache.org/jira/browse/FLINK-5745 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Ufuk Celebi >Priority: Minor > > We pass a thread factory for the Netty event loop threads (see > {{NettyServer}} and {{NettyClient}}), but don't set an uncaught exception > handler. Let's add a JVM terminating handler that exits the process in cause > of fatal errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
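On the naming question in this exchange: a logger created from a nested class takes the class's binary name, in which the enclosing class is embedded with a `$` separator. So a handler nested inside `NettyServer` would log as `...NettyServer$FatalExitExceptionHandler`, while the same class hoisted to top level logs under its own name only. A minimal sketch with made-up class names:

```java
// Demonstrates that a nested class's binary name embeds the enclosing
// class, which is what appears in logger names derived from the class.
// These class names are hypothetical stand-ins, not Flink's.
class NettyServerLike {
    static class FatalExitExceptionHandler { }
}

class ClassNameDemo {
    static String nestedName() {
        return NettyServerLike.FatalExitExceptionHandler.class.getName();
    }
}
```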
[jira] [Commented] (FLINK-5654) Add processing time OVER RANGE BETWEEN x PRECEDING aggregation to SQL
[ https://issues.apache.org/jira/browse/FLINK-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861207#comment-15861207 ] radu commented on FLINK-5654: - Thanks Haohui for the suggestion. You are right with the suggestions you make and this is the path i took. However, I was referring to what should happen in the translateToPlan function. There you have the 2 options i mentioned to build the logic to compute the aggregate > Add processing time OVER RANGE BETWEEN x PRECEDING aggregation to SQL > - > > Key: FLINK-5654 > URL: https://issues.apache.org/jira/browse/FLINK-5654 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: Fabian Hueske >Assignee: radu > > The goal of this issue is to add support for OVER RANGE aggregations on > processing time streams to the SQL interface. > Queries similar to the following should be supported: > {code} > SELECT > a, > SUM(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN INTERVAL '1' > HOUR PRECEDING AND CURRENT ROW) AS sumB, > MIN(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN INTERVAL '1' > HOUR PRECEDING AND CURRENT ROW) AS minB > FROM myStream > {code} > The following restrictions should initially apply: > - All OVER clauses in the same SELECT clause must be exactly the same. > - The PARTITION BY clause is optional (no partitioning results in single > threaded execution). > - The ORDER BY clause may only have procTime() as parameter. procTime() is a > parameterless scalar function that just indicates processing time mode. > - UNBOUNDED PRECEDING is not supported (see FLINK-5657) > - FOLLOWING is not supported. > The restrictions will be resolved in follow up issues. If we find that some > of the restrictions are trivial to address, we can add the functionality in > this issue as well. 
> This issue includes: > - Design of the DataStream operator to compute OVER ROW aggregates > - Translation from Calcite's RelNode representation (LogicalProject with > RexOver expression). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861181#comment-15861181 ] ASF GitHub Bot commented on FLINK-2168: --- Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3149 @ramkrish86 Try to add ``` -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath="/tmp" ``` as parameter of JVM > Add HBaseTableSource > > > Key: FLINK-2168 > URL: https://issues.apache.org/jira/browse/FLINK-2168 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: ramkrishna.s.vasudevan >Priority: Minor > > Add a {{HBaseTableSource}} to read data from a HBase table. The > {{HBaseTableSource}} should implement the {{ProjectableTableSource}} > (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces. > The implementation can be based on Flink's {{TableInputFormat}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5773) Cannot cast scala.util.Failure to org.apache.flink.runtime.messages.Acknowledge
Colin Breame created FLINK-5773: --- Summary: Cannot cast scala.util.Failure to org.apache.flink.runtime.messages.Acknowledge Key: FLINK-5773 URL: https://issues.apache.org/jira/browse/FLINK-5773 Project: Flink Issue Type: Bug Components: DataStream API Affects Versions: 1.2.0 Reporter: Colin Breame The exception below happens when I set the StreamExecutionEnvironment.setMaxParallelism() to anything less than 4. Let me know if you need more information. {code} Caused by: java.lang.ClassCastException: Cannot cast scala.util.Failure to org.apache.flink.runtime.messages.Acknowledge at java.lang.Class.cast(Class.java:3369) at scala.concurrent.Future$$anonfun$mapTo$1.apply(Future.scala:405) at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) at scala.util.Try$.apply(Try.scala:161) at scala.util.Success.map(Try.scala:206) at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643) at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658) at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634) at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685) at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) at 
akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266) at org.apache.flink.runtime.taskmanager.TaskManager.submitTask(TaskManager.scala:1206) at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleTaskMessage(TaskManager.scala:458) at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:280) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) at akka.actor.Actor$class.aroundReceive(Actor.scala:467) at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:122) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) ... 5 more {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-5771) DelimitedInputFormat does not correctly handle multi-byte delimiters
[ https://issues.apache.org/jira/browse/FLINK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Breame updated FLINK-5771: Summary: DelimitedInputFormat does not correctly handle multi-byte delimiters (was: DelimitedInputFormat does not correctly handle muli-byte delimiters) > DelimitedInputFormat does not correctly handle multi-byte delimiters > > > Key: FLINK-5771 > URL: https://issues.apache.org/jira/browse/FLINK-5771 > Project: Flink > Issue Type: Bug > Components: filesystem-connector >Affects Versions: 1.2.0 >Reporter: Colin Breame > Attachments: Test.java, test.txt > > > The DelimitedInputFormat does not correctly handle multi-byte delimiters. > The reader sometimes misses a delimiter if it is preceded by the first byte > from the delimiter. This results in two records (or more) being returned > from a single call to nextRecord. > See attached test case. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
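The failure mode reported in FLINK-5771 — "a delimiter preceded by the first byte of the delimiter" — typically comes from a scanner that, after a partial delimiter match fails, skips past all the bytes it already examined instead of re-examining them. A correct multi-byte scan must advance by a single byte on a failed partial match. A standalone sketch (not Flink's `DelimitedInputFormat` code; the class and method names are invented for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of delimiter-safe record splitting. The tricky input is
// a delimiter preceded by its own first byte, e.g. delimiter "||" in
// "a|||b": the real delimiter starts at index 2, not at index 1.
class DelimiterSplitter {

    static List<String> split(byte[] data, byte[] delim) {
        List<String> records = new ArrayList<>();
        int recordStart = 0;
        int i = 0;
        while (i + delim.length <= data.length) {
            int j = 0;
            while (j < delim.length && data[i + j] == delim[j]) {
                j++;
            }
            if (j == delim.length) {
                // Full delimiter matched: emit the record and skip past it.
                records.add(new String(data, recordStart, i - recordStart, StandardCharsets.UTF_8));
                i += delim.length;
                recordStart = i;
            } else {
                // Crucially, advance by ONE byte only: a failed partial match
                // may still overlap the start of a real delimiter.
                i++;
            }
        }
        records.add(new String(data, recordStart, data.length - recordStart, StandardCharsets.UTF_8));
        return records;
    }
}
```

With this restart rule, `a|||b` split on `||` yields the two records `a` and `|b`; a scanner that advanced by the partial-match length instead would overrun the delimiter and merge the records.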
[jira] [Commented] (FLINK-5745) Set uncaught exception handler for Netty threads
[ https://issues.apache.org/jira/browse/FLINK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861154#comment-15861154 ] ASF GitHub Bot commented on FLINK-5745: --- Github user uce commented on the issue: https://github.com/apache/flink/pull/3293 Should we instead of copying the uncaught exception handler code move it out to an outer class? You would have to base this PR on #3290, move the class out and only use it here. > Set uncaught exception handler for Netty threads > > > Key: FLINK-5745 > URL: https://issues.apache.org/jira/browse/FLINK-5745 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Ufuk Celebi >Priority: Minor > > We pass a thread factory for the Netty event loop threads (see > {{NettyServer}} and {{NettyClient}}), but don't set an uncaught exception > handler. Let's add a JVM terminating handler that exits the process in cause > of fatal errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-5650) Flink-python tests executing cost too long time
[ https://issues.apache.org/jira/browse/FLINK-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shijinkui updated FLINK-5650: - Description: When execute `mvn clean test` in flink-python, it will wait more than half hour after the console output below: --- T E S T S --- Running org.apache.flink.python.api.PythonPlanBinderTest log4j:WARN No appenders could be found for logger (org.apache.flink.python.api.PythonPlanBinderTest). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. The stack below: "main" prio=5 tid=0x7f8d7780b800 nid=0x1c03 waiting on condition [0x79fd8000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.flink.python.api.streaming.plan.PythonPlanStreamer.startPython(PythonPlanStreamer.java:70) at org.apache.flink.python.api.streaming.plan.PythonPlanStreamer.open(PythonPlanStreamer.java:50) at org.apache.flink.python.api.PythonPlanBinder.startPython(PythonPlanBinder.java:211) at org.apache.flink.python.api.PythonPlanBinder.runPlan(PythonPlanBinder.java:141) at org.apache.flink.python.api.PythonPlanBinder.main(PythonPlanBinder.java:114) at org.apache.flink.python.api.PythonPlanBinderTest.testProgram(PythonPlanBinderTest.java:83) at org.apache.flink.test.util.JavaProgramTestBase.testJobWithoutObjectReuse(JavaProgramTestBase.java:174) this is the jstack: https://gist.github.com/shijinkui/af47e8bc6c9f748336bf52efd3df94b0 was: When execute `mvn clean test` in flink-python, it will wait more than half hour after the console output below: --- T E S T S --- Running org.apache.flink.python.api.PythonPlanBinderTest log4j:WARN No appenders could be found for logger (org.apache.flink.python.api.PythonPlanBinderTest). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. 
The stack below: "main" prio=5 tid=0x7f8d7780b800 nid=0x1c03 waiting on condition [0x79fd8000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.flink.python.api.streaming.plan.PythonPlanStreamer.startPython(PythonPlanStreamer.java:70) at org.apache.flink.python.api.streaming.plan.PythonPlanStreamer.open(PythonPlanStreamer.java:50) at org.apache.flink.python.api.PythonPlanBinder.startPython(PythonPlanBinder.java:211) at org.apache.flink.python.api.PythonPlanBinder.runPlan(PythonPlanBinder.java:141) at org.apache.flink.python.api.PythonPlanBinder.main(PythonPlanBinder.java:114) at org.apache.flink.python.api.PythonPlanBinderTest.testProgram(PythonPlanBinderTest.java:83) at org.apache.flink.test.util.JavaProgramTestBase.testJobWithoutObjectReuse(JavaProgramTestBase.java:174) > Flink-python tests executing cost too long time > --- > > Key: FLINK-5650 > URL: https://issues.apache.org/jira/browse/FLINK-5650 > Project: Flink > Issue Type: Bug > Components: Python API, Tests >Affects Versions: 1.2.0 >Reporter: shijinkui >Priority: Critical > Fix For: 1.2.1 > > > When execute `mvn clean test` in flink-python, it will wait more than half > hour after the console output below: > --- > T E S T S > --- > Running org.apache.flink.python.api.PythonPlanBinderTest > log4j:WARN No appenders could be found for logger > (org.apache.flink.python.api.PythonPlanBinderTest). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. 
> The stack below: > "main" prio=5 tid=0x7f8d7780b800 nid=0x1c03 waiting on condition > [0x79fd8000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.flink.python.api.streaming.plan.PythonPlanStreamer.startPython(PythonPlanStreamer.java:70) > at > org.apache.flink.python.api.streaming.plan.PythonPlanStreamer.open(PythonPlanStreamer.java:50) > at > org.apache.flink.python.api.PythonPlanBinder.startPython(PythonPlanBinder.java:211) > at > org.apache.flink.python.api.PythonPlanBinder.runPlan(PythonPlanBinder.java:141) > at > org.apache.flink.python.api.PythonPlanBinder.main(PythonPlanBinder.java:114) > at >
[jira] [Updated] (FLINK-5650) Flink-python tests executing cost too long time
[ https://issues.apache.org/jira/browse/FLINK-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shijinkui updated FLINK-5650: - Summary: Flink-python tests executing cost too long time (was: Flink-python tests can time out) > Flink-python tests executing cost too long time > --- > > Key: FLINK-5650 > URL: https://issues.apache.org/jira/browse/FLINK-5650 > Project: Flink > Issue Type: Bug > Components: Python API, Tests >Affects Versions: 1.2.0 >Reporter: shijinkui >Priority: Critical > Fix For: 1.2.1 > > > When execute `mvn clean test` in flink-python, it will wait more than half > hour after the console output below: > --- > T E S T S > --- > Running org.apache.flink.python.api.PythonPlanBinderTest > log4j:WARN No appenders could be found for logger > (org.apache.flink.python.api.PythonPlanBinderTest). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > The stack below: > "main" prio=5 tid=0x7f8d7780b800 nid=0x1c03 waiting on condition > [0x79fd8000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.flink.python.api.streaming.plan.PythonPlanStreamer.startPython(PythonPlanStreamer.java:70) > at > org.apache.flink.python.api.streaming.plan.PythonPlanStreamer.open(PythonPlanStreamer.java:50) > at > org.apache.flink.python.api.PythonPlanBinder.startPython(PythonPlanBinder.java:211) > at > org.apache.flink.python.api.PythonPlanBinder.runPlan(PythonPlanBinder.java:141) > at > org.apache.flink.python.api.PythonPlanBinder.main(PythonPlanBinder.java:114) > at > org.apache.flink.python.api.PythonPlanBinderTest.testProgram(PythonPlanBinderTest.java:83) > at > org.apache.flink.test.util.JavaProgramTestBase.testJobWithoutObjectReuse(JavaProgramTestBase.java:174) -- This message was sent by Atlassian JIRA (v6.3.15#6346)