Re: Job xxx not found exception when starting Flink program in Local

2018-11-12 Thread 徐涛
After restarting my computer, the error is gone.


> On 13 Nov 2018, at 14:53, 徐涛 wrote:
> 
> Hi Experts,
>   When I start the Flink program locally, the following exception is
> thrown. I do not know why it happens, because it appeared suddenly; a few
> hours ago the program could still start successfully.
>   Could anyone help to explain it?
>   Thanks a lot!
> 
> 2018-11-13 14:48:45 [flink-akka.actor.default-dispatcher-60] ERROR 
> o.a.f.r.r.h.job.JobDetailsHandler - Exception occurred in REST handler.
> org.apache.flink.runtime.rest.NotFoundException: Job 
> 512a21d9f992d4884f836abb82c64f0d not found
>   at 
> org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:90)
>  ~[flink-runtime_2.11-1.6.2.jar:1.6.2]
>   at 
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>  ~[na:1.8.0_172]
>   at 
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>  ~[na:1.8.0_172]
>   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  ~[na:1.8.0_172]
>   at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  ~[na:1.8.0_172]
>   at 
> org.apache.flink.runtime.rest.handler.legacy.ExecutionGraphCache.lambda$getExecutionGraph$0(ExecutionGraphCache.java:133)
>  ~[flink-runtime_2.11-1.6.2.jar:1.6.2]
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  ~[na:1.8.0_172]
>   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  [na:1.8.0_172]
>   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  ~[na:1.8.0_172]
>   at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  ~[na:1.8.0_172]
>   at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
>  ~[flink-runtime_2.11-1.6.2.jar:1.6.2]
>   at akka.dispatch.OnComplete.internal(Future.scala:258) 
> ~[akka-actor_2.11-2.4.20.jar:na]
>   at akka.dispatch.OnComplete.internal(Future.scala:256) 
> ~[akka-actor_2.11-2.4.20.jar:na]
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186) 
> ~[akka-actor_2.11-2.4.20.jar:na]
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183) 
> ~[akka-actor_2.11-2.4.20.jar:na]
>   at 
> scala.concurrent.impl.CallbackRunnable.run$$$capture(Promise.scala:32) 
> ~[scala-library-2.11.8.jar:na]
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala) 
> ~[scala-library-2.11.8.jar:na]
>   at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
>  ~[flink-runtime_2.11-1.6.2.jar:1.6.2]
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
> ~[scala-library-2.11.8.jar:na]
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
> ~[scala-library-2.11.8.jar:na]
>   at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534) 
> ~[akka-actor_2.11-2.4.20.jar:na]
>   at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:20)
>  ~[akka-actor_2.11-2.4.20.jar:na]
>   at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:18)
>  ~[akka-actor_2.11-2.4.20.jar:na]
>   at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436) 
> ~[scala-library-2.11.8.jar:na]
>   at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435) 
> ~[scala-library-2.11.8.jar:na]
>   at 
> scala.concurrent.impl.CallbackRunnable.run$$$capture(Promise.scala:32) 
> ~[scala-library-2.11.8.jar:na]
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala) 
> ~[scala-library-2.11.8.jar:na]
>   at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>  ~[akka-actor_2.11-2.4.20.jar:na]
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>  ~[akka-actor_2.11-2.4.20.jar:na]
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>  ~[akka-actor_2.11-2.4.20.jar:na]
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>  ~[akka-actor_2.11-2.4.20.jar:na]
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) 
> ~[scala-library-2.11.8.jar:na]
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) 
> ~[akka-actor_2.11-2.4.20.jar:na]
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) 
> ~[akka-actor_2.11-2.4.20.jar:na]
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>  ~[akk

Job xxx not found exception when starting Flink program in Local

2018-11-12 Thread 徐涛
Hi Experts,
When I start the Flink program locally, the following exception is
thrown. I do not know why it happens, because it appeared suddenly; a few
hours ago the program could still start successfully.
Could anyone help to explain it?
Thanks a lot!

2018-11-13 14:48:45 [flink-akka.actor.default-dispatcher-60] ERROR 
o.a.f.r.r.h.job.JobDetailsHandler - Exception occurred in REST handler.
org.apache.flink.runtime.rest.NotFoundException: Job 
512a21d9f992d4884f836abb82c64f0d not found
at 
org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:90)
 ~[flink-runtime_2.11-1.6.2.jar:1.6.2]
at 
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
 ~[na:1.8.0_172]
at 
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
 ~[na:1.8.0_172]
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[na:1.8.0_172]
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 ~[na:1.8.0_172]
at 
org.apache.flink.runtime.rest.handler.legacy.ExecutionGraphCache.lambda$getExecutionGraph$0(ExecutionGraphCache.java:133)
 ~[flink-runtime_2.11-1.6.2.jar:1.6.2]
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 ~[na:1.8.0_172]
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 [na:1.8.0_172]
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[na:1.8.0_172]
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 ~[na:1.8.0_172]
at 
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
 ~[flink-runtime_2.11-1.6.2.jar:1.6.2]
at akka.dispatch.OnComplete.internal(Future.scala:258) 
~[akka-actor_2.11-2.4.20.jar:na]
at akka.dispatch.OnComplete.internal(Future.scala:256) 
~[akka-actor_2.11-2.4.20.jar:na]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186) 
~[akka-actor_2.11-2.4.20.jar:na]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183) 
~[akka-actor_2.11-2.4.20.jar:na]
at 
scala.concurrent.impl.CallbackRunnable.run$$$capture(Promise.scala:32) 
~[scala-library-2.11.8.jar:na]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala) 
~[scala-library-2.11.8.jar:na]
at 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
 ~[flink-runtime_2.11-1.6.2.jar:1.6.2]
at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
~[scala-library-2.11.8.jar:na]
at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
~[scala-library-2.11.8.jar:na]
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534) 
~[akka-actor_2.11-2.4.20.jar:na]
at 
akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:20)
 ~[akka-actor_2.11-2.4.20.jar:na]
at 
akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:18)
 ~[akka-actor_2.11-2.4.20.jar:na]
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436) 
~[scala-library-2.11.8.jar:na]
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435) 
~[scala-library-2.11.8.jar:na]
at 
scala.concurrent.impl.CallbackRunnable.run$$$capture(Promise.scala:32) 
~[scala-library-2.11.8.jar:na]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala) 
~[scala-library-2.11.8.jar:na]
at 
akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
 ~[akka-actor_2.11-2.4.20.jar:na]
at 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
 ~[akka-actor_2.11-2.4.20.jar:na]
at 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
 ~[akka-actor_2.11-2.4.20.jar:na]
at 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
 ~[akka-actor_2.11-2.4.20.jar:na]
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) 
~[scala-library-2.11.8.jar:na]
at 
akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) 
~[akka-actor_2.11-2.4.20.jar:na]
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) 
~[akka-actor_2.11-2.4.20.jar:na]
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
 ~[akka-actor_2.11-2.4.20.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
~[scala-library-2.11.8.jar:na]
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1

Table To String

2018-11-12 Thread Steve Beischlien
I have created a project to use SQL, but instead of printing the output as
below I need the output as a STRING so I can write it to a DynamoDB table.

How do I convert this "result" to a STRING, or is there a suggestion of some
other way I should sink to DynamoDB?  Any example code would REALLY help.
THANKS!!

Table result = tableEnv.sql("SELECT 'Alert ',t_sKey, t_deviceID,
t_sValue FROM SENSORS WHERE t_sKey='TEMPERATURE' AND t_sValue > " +
TEMPERATURE_THRESHOLD);
tableEnv.toAppendStream(result, Row.class).print();

Any assistance would be very appreciated.

Thanks
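
One possible way to get a plain String stream out of the "result" table above is to convert it to a DataStream<Row> first and map each Row to a String. The snippet below is only a sketch against the Flink 1.6 Table/DataStream APIs; MyDynamoDbSink is a hypothetical placeholder for a custom SinkFunction<String>:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Row;

// Convert the Table into an append-only DataStream<Row>, then turn each Row into a String.
DataStream<String> alerts = tableEnv
    .toAppendStream(result, Row.class)
    .map(new MapFunction<Row, String>() {
        @Override
        public String map(Row row) {
            return row.toString(); // or format the individual fields / build JSON as needed
        }
    });

// Any SinkFunction<String> can then consume the stream, e.g. a custom DynamoDB sink.
alerts.addSink(new MyDynamoDbSink()); // hypothetical custom sink, not part of Flink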


Re: Queryable state when key is UUID - getting Kryo Exception

2018-11-12 Thread bupt_ljy
Hi, Jayant
 The key you specified in the getKvState function should be the key of the keyed
stream, not the key of the map. From what I’ve seen at
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html,
this feature only supports managed keyed state.
 By the way, I think we should improve the error message that Jayant ran into.


Best,
Jiayi Liao
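
To illustrate the point above with the code from the original mail: the key passed to getKvState has to be the key used in keyBy(), while the UUID map key is only used to look up an entry in the returned MapState. A sketch, assuming for the sake of the example that the stream was keyed by a String:

// Query by the keyed-stream key (assumed to be a String here); job id and key are placeholders.
CompletableFuture<MapState<UUID, Rule>> future = client.getKvState(
    JobID.fromHexString("337f4476388fabc6f29bb4fb9107082c"),
    "rules",
    "some-keyed-stream-key",           // key of the keyed stream, NOT the map key
    TypeInformation.of(String.class),  // must match the type used in keyBy()
    descriptor);
// The UUID is then used against the returned MapState, e.g. state.get(uuid).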


Original Message
Sender: Jayant Ameta <wittyam...@gmail.com>
Recipient: Till Rohrmann <trohrm...@apache.org>
Cc: bupt_ljy <bupt_...@163.com>; Tzu-Li (Gordon) Tai <tzuli...@apache.org>;
user <u...@flink.apache.org>
Date: Tuesday, Nov 13, 2018 13:39
Subject: Re: Queryable state when key is UUID - getting Kryo Exception


Hi Till,
Here is the client snippet. Here Rule is a custom POJO that I use.


public static void main(String[] args)
    throws IOException, InterruptedException, ExecutionException {
  UUID uuid = UUID.fromString("2ba14b80-e6ff-11e8-908b-9bd8fd37bffb");

  QueryableStateClient client = new QueryableStateClient("127.0.1.1", 9069);
  ExecutionConfig config = new ExecutionConfig();
  client.setExecutionConfig(config);

  MapStateDescriptor<UUID, Rule> descriptor = new
      MapStateDescriptor<>("rulePatterns", UUID.class, Rule.class);
  CompletableFuture<MapState<UUID, Rule>> resultFuture =
      client.getKvState(JobID.fromHexString("337f4476388fabc6f29bb4fb9107082c"),
          "rules", uuid, TypeInformation.of(UUID.class), descriptor);

  while (!resultFuture.isDone()) {
    Thread.sleep(1000);
  }
  resultFuture.whenComplete((result, throwable) -> {
    if (throwable != null) {
      throwable.printStackTrace();
    } else {
      try {
        System.out.println(result.get(uuid));
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
  });
}


Below is the stack trace:


Caused by: java.lang.RuntimeException: Error while processing request with ID 
12. Caused by: java.io.IOException: Unable to deserialize key and namespace. 
This indicates a mismatch in the key/namespace serializers used by the KvState 
instance and this access.
at 
org.apache.flink.queryablestate.client.state.serialization.KvStateSerializer.deserializeKeyAndNamespace(KvStateSerializer.java:107)
at 
org.apache.flink.runtime.state.heap.AbstractHeapState.getSerializedValue(AbstractHeapState.java:93)
at 
org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:87)
at 
org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:49)
at 
org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:229)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at 
org.apache.flink.core.memory.DataInputDeserializer.readUnsignedByte(DataInputDeserializer.java:307)
at org.apache.flink.types.StringValue.readString(StringValue.java:770)
at 
org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:69)
at 
org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:28)
at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:136)
at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at 
org.apache.flink.queryablestate.client.state.serialization.KvStateSerializer.deserializeKeyAndNamespace(KvStateSerializer.java:94)
... 9 more


at 
org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:98)
at 
org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:49)
at 
org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:229)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)


at 
org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.lambda$run$0(AbstractServerHandler.java:266)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778)
at 
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2140)
at 
org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:229)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 

Re: Queryable state when key is UUID - getting Kryo Exception

2018-11-12 Thread Jayant Ameta
Hi Till,
Here is the client snippet. Here Rule is a custom POJO that I use.

public static void main(String[] args)
throws IOException, InterruptedException, ExecutionException {
  UUID uuid = UUID.fromString("2ba14b80-e6ff-11e8-908b-9bd8fd37bffb");

  QueryableStateClient client = new QueryableStateClient("127.0.1.1", 9069);
  ExecutionConfig config = new ExecutionConfig();
  client.setExecutionConfig(config);

  MapStateDescriptor<UUID, Rule> descriptor = new
MapStateDescriptor<>("rulePatterns", UUID.class,
  Rule.class);
  CompletableFuture<MapState<UUID, Rule>> resultFuture =
  client.getKvState(JobID.fromHexString("337f4476388fabc6f29bb4fb9107082c"),
"rules",
  uuid, TypeInformation.of(UUID.class), descriptor);

  while (!resultFuture.isDone()) {
Thread.sleep(1000);
  }
  resultFuture.whenComplete((result, throwable) -> {
if (throwable != null) {
  throwable.printStackTrace();
} else {
  try {
System.out.println(result.get(uuid));
  } catch (Exception e) {
e.printStackTrace();
  }
}
  });
}


Below is the stack trace:

Caused by: java.lang.RuntimeException: Error while processing request with
ID 12. Caused by: java.io.IOException: Unable to deserialize key and
namespace. This indicates a mismatch in the key/namespace serializers used
by the KvState instance and this access.
at
org.apache.flink.queryablestate.client.state.serialization.KvStateSerializer.deserializeKeyAndNamespace(KvStateSerializer.java:107)
at
org.apache.flink.runtime.state.heap.AbstractHeapState.getSerializedValue(AbstractHeapState.java:93)
at
org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:87)
at
org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:49)
at
org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:229)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at
org.apache.flink.core.memory.DataInputDeserializer.readUnsignedByte(DataInputDeserializer.java:307)
at org.apache.flink.types.StringValue.readString(StringValue.java:770)
at
org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:69)
at
org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:28)
at
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:136)
at
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at
org.apache.flink.queryablestate.client.state.serialization.KvStateSerializer.deserializeKeyAndNamespace(KvStateSerializer.java:94)
... 9 more

at
org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:98)
at
org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:49)
at
org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:229)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

at
org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.lambda$run$0(AbstractServerHandler.java:266)
at
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778)
at
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2140)
at
org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:229)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.complete

What if not to keep containers across attempts in HA setup?

2018-11-12 Thread Paul Lam
Hi,

Recently I found a bug on our YARN cluster that crashes the standby RM during an
RM failover. The bug is triggered by applications that keep containers across
attempts (see [1]; it is a related issue, but the patch there is not exactly the fix,
because the problem is not in the recovery itself but in the attempt after the
recovery).

Since YARN is a fundamental component and maintaining it would affect a lot of
users, as a last resort I wonder if we could modify YarnClusterDescriptor so that
it does not keep containers across attempts.

IMHO, a Flink application’s state does not depend on YARN, so there is no state
that must be recovered from the previous application attempt. In case of an
application master failure, the TaskManagers can be shut down, and the cost is
a longer recovery time.

Please correct me if I’m wrong. Thank you!

[1]https://issues.apache.org/jira/browse/YARN-2823 


Best,
Paul Lam
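
For context, the behavior Paul describes is controlled by a single flag on the YARN application submission context. The change being discussed would roughly amount to the following when the cluster descriptor prepares the application (a sketch against the Hadoop YARN client API, not the actual Flink code):

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

// Hypothetical helper illustrating the flag in question.
void configureAttemptBehavior(ApplicationSubmissionContext appContext) {
    // true (the current behavior) keeps running TaskManager containers alive across
    // ApplicationMaster attempts; false lets YARN tear them down, so a new attempt
    // starts fresh and Flink restores state from checkpoints, at the cost of a
    // longer recovery time.
    appContext.setKeepContainersAcrossApplicationAttempts(false);
}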

Re: Task Manager allocation issue when upgrading 1.6.0 to 1.6.2

2018-11-12 Thread Cliff Resnick
Hi Till,

Yes, it turns out the problem was
having flink-queryable-state-runtime_2.11-1.6.2.jar in flink/lib. I guess
Queryable State bootstraps itself and, in my situation, it brought the task
manager down when it found no available ports. What's a little troubling is
that I had not configured Queryable State at all, so I would not expect it
to get in the way. I haven't looked further into it, but I think that if
Queryable State wants to enable itself then it should at worst take an
unused port by default, especially since many folks will be running in
shared environments like YARN.

But anyway, thanks for that! I'm now up with 1.6.2.

Cliff

On Mon, Nov 12, 2018 at 6:04 AM Till Rohrmann  wrote:

> Hi Cliff,
>
> the TaskManger fail to start with exit code 31 which indicates an
> initialization error on startup. If you check the TaskManager logs via
> `yarn logs -applicationId ` you should see the problem why the TMs
> don't start up.
>
> Cheers,
> Till
>
> On Fri, Nov 9, 2018 at 8:32 PM Cliff Resnick  wrote:
>
>> Hi Till,
>>
>> Here are Job Manager logs, same job in both 1.6.0 and 1.6.2 at DEBUG
>> level. I saw several errors in 1.6.2, hope it's informative!
>>
>> Cliff
>>
>> On Fri, Nov 9, 2018 at 8:34 AM Till Rohrmann 
>> wrote:
>>
>>> Hi Cliff,
>>>
>>> this sounds not right. Could you share the logs of the Yarn cluster
>>> entrypoint with the community for further debugging? Ideally on DEBUG
>>> level. The Yarn logs would also be helpful to fully understand the problem.
>>> Thanks a lot!
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Nov 8, 2018 at 9:59 PM Cliff Resnick  wrote:
>>>
 I'm running a YARN cluster of 8 * 4 core instances = 32 cores, with a
 configuration of 3 slots per TM. The cluster is dedicated to a single job
 that runs at full capacity in "FLIP6" mode. So in this cluster, the
 parallelism is 21 (7 TMs * 3, one container dedicated for Job Manager).

 When I run the job in 1.6.0, seven Task Managers are spun up as
 expected. But if I run with 1.6.2 only four Task Managers spin up and the
 job hangs waiting for more resources.

 Our Flink distribution is set up by script after building from source.
 So aside from flink jars, both 1.6.0 and 1.6.2 directories are identical.
 The job is the same, restarting from savepoint. The problem is repeatable.

 Has something changed in 1.6.2, and if so can it be remedied with a
 config change?








Kinesis Shards and Parallelism

2018-11-12 Thread shkob1
Looking at the doc
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kinesis.html
"When the number of shards is larger than the parallelism of the consumer,
then each consumer subtask can subscribe to multiple shards; otherwise if
the number of shards is smaller than the parallelism of the consumer, then
some consumer subtasks will simply be idle and wait until it gets assigned
new shards"

I have the *same number of shards as the configured parallelism*. It seems,
though, that one task is grabbing multiple shards while others are idle. Is that
expected behavior?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Linkage error when using DropwizardMeterWrapper

2018-11-12 Thread Jayant Ameta
Nevermind. Relocating the dropwizard packages using maven shade plugin
fixed it.
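
For reference, a package relocation of this kind is typically configured in the maven-shade-plugin as below (a sketch; the packages to relocate and the target prefix depend on the actual conflict):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Move the bundled Dropwizard/Codahale metrics classes into a private
               namespace so they cannot clash with the copy on Flink's classpath. -->
          <relocation>
            <pattern>com.codahale.metrics</pattern>
            <shadedPattern>org.myorg.shaded.com.codahale.metrics</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>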


Re: How to run Flink 1.6 job cluster in "standalone" mode?

2018-11-12 Thread Hao Sun
Hi Tim, I am trying to debug this issue
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/java-io-IOException-NSS-is-already-initialized-td24247.html

And in general, how to debug code in a distributed JM/TM architecture is a
very interesting topic to me.
Any hints will be appreciated. Thanks

On Mon, Nov 12, 2018 at 7:12 AM Timo Walther  wrote:

> Hi,
>
> a session cluster does not imply that JM + TM are always executed in the
> same JVM. Debugging a job running on different JVMs might be a bit more
> difficult to debug but it should still be straightforward.
>
> Maybe you can tell us what wrong behavior you observe?
>
> Btw. Flink's metrics can also already be quite helpful.
>
> Regards,
> Timo
>
> On 07.11.18 at 14:15, Hao Sun wrote:
> > "Standalone" here I mean job-mananger + taskmanager on the same JVM. I
> > have an issue to debug on our K8S environment, I can not reproduce it
> > in local docker env or Intellij. If JM and TM are running in different
> > VMs, it makes things harder to debug.
> >
> > Or is there a way to debug a job running on JM + TM on different VMs?
> > Is reverting to session cluster the only way to get JM + TM on the
> > same VM?
>
>
>


Re: Rich variant for Async IO in Scala

2018-11-12 Thread Timo Walther

Hi Bruno,

`org.apache.flink.streaming.api.functions.async.RichAsyncFunction` 
should also work for the Scala API. `RichMapFunction` or 
`RichFilterFunction` are also shared between both APIs.


Is there anything that blocks you from using it?

Regards,
Timo

On 09.11.18 at 01:38, Bruno Aranda wrote:

Hi,

I see that the AsyncFunction for Scala does not seem to have a rich 
variant like the Java one. Is there a particular reason for this? Is 
there any workaround?


Thanks!

Bruno





Re: flink job restarts when flink cluster restarts?

2018-11-12 Thread Timo Walther

Hi,

by default all the metadata is lost when shutting down the JobManager in 
a non high available setup. Flink uses Zookeeper together with a 
distributed filesystem to store the required metadata [1] in a 
persistent and distributed manner.


A single node setup is rather uncommon, but you can also start Zookeeper 
locally as it is done in our end-to-end tests [2].
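
A minimal high-availability configuration of that kind would look roughly like the following in flink-conf.yaml (a sketch; the quorum address and storage directory are placeholders):

high-availability: zookeeper
high-availability.zookeeper.quorum: localhost:2181
high-availability.storageDir: file:///tmp/flink/ha    # use a durable, shared path in production
high-availability.cluster-id: /my-flink-cluster       # optional namespace for this cluster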


I hope this helps.

Regards,
Timo

[1] 
https://ci.apache.org/projects/flink/flink-docs-master/ops/jobmanager_high_availability.html
[2] 
https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh
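
As a side note on the manual procedure described further down in the thread (take a savepoint before stopping, resume from it after restarting), the CLI commands involved look roughly like this (a sketch; job IDs and paths are placeholders):

# before stopping the cluster: trigger a savepoint, or cancel the job with one
bin/flink savepoint <jobId> file:///tmp/flink/savepoints
bin/flink cancel -s file:///tmp/flink/savepoints <jobId>

# after the cluster is running again: resume each job from its savepoint
bin/flink run -s file:///tmp/flink/savepoints/savepoint-<id> myJob.jar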



On 08.11.18 at 14:15, Chang Liu wrote:
In other words, how can I keep the jobs running across system patching, server
restarts, etc.? Is it related to Standalone vs. YARN? Or is it related to
whether Zookeeper is used?


Many thanks!

Best regards/祝好,

Chang Liu 刘畅


On 8 Nov 2018, at 13:38, Chang Liu > wrote:


Thanks!

If I have a cluster with more than one node (standalone or YARN), can I
stop and start any single node among them and keep the job running?


Best regards/祝好,

Chang Liu 刘畅


On 7 Nov 2018, at 16:17, 秦超峰 <18637156...@163.com 
> wrote:


the second



秦超峰
Email: windyqinchaof...@163.com

 





On 11/07/2018 17:14, Chang Liu  wrote:

Hi,

I have a question regarding whether the currently running jobs will
restart if I stop and start the Flink cluster.

1. Let's say I am running a standalone one-node cluster.
2. I have several Flink jobs already running on the cluster.
3. If I do a bin/stop-cluster.sh and then a
bin/start-cluster.sh, will the previously running jobs restart again?

OR

Before I do bin/stop-cluster.sh, do I have to trigger a savepoint for
each of the jobs?
After bin/start-cluster.sh has finished, do I have to start each job
I want to restart from the savepoint triggered before?

Many thanks in advance :)

Best regards/祝好,

Chang Liu 刘畅










Re: Multiple operators to the same sink

2018-11-12 Thread Timo Walther

Hi,

I'm not quite sure if I understand your problem correctly. But your use 
case sounds like a typical application of a union operation.


What do you mean with "knowledge of their destination sink"? The 
operators don't need to be aware of the destination sink. The only thing 
that needs to be coordinated is the result data type of each operation. 
So either you force each operation to have a unified type or you create 
a unified type before declaring the sink.
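
For illustration, the union variant looks roughly like this in the DataStream API (a sketch assuming streams a–e share a common element type and sink1/sink2 are the two SinkFunctions):

// A, B and C share Sink1; D and E share Sink2. union() requires all inputs to have the same type.
a.union(b, c).addSink(sink1);
d.union(e).addSink(sink2);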


Or is every operator an independent Flink job? Maybe you can show us a 
skeleton of your pipeline?


Regards,
Timo


On 08.11.18 at 01:20, burgesschen wrote:

Hi Guys! I'm designing a topology where multiple operators should forward the
messages to the same sink.


For example I have Operator A,B,C,D,E. I want A,B,C to forward to Sink1 and
D, E to forward to Sink2.

My options are

1. Union A, B and C, then add Sink1 to them; similarly for D and E. However,
the current framework our team has built builds each operator individually. There
is nothing outside of the operators
that knows their destination sink. It means we would need to build
something at the job level to union the operators.

2. Have each operator output to a side output tag: A, B, and C output to
tag "sink1", and a singleton sink1 consumes from tag "sink1".
Similarly for sink2. My concern here is that it feels hacky, since those
messages are not really side outputs.

Is this a legitimate use case for output tags or not? Is there a better way
to achieve this? Thank you!






--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/





Flink with parallelism 3 is running locally but not on cluster

2018-11-12 Thread zavalit
Hi,
maybe I am just missing something, but I have no more ideas where to look.

Here is a screenshot of the failed state.

I read messages from 2 sources, make a join based on a common key, and sink
it all into Kafka.

  val env = StreamExecutionEnvironment.getExecutionEnvironment
  env.setParallelism(3)
  ...
  source1
 .keyBy(_.searchId)
 .connect(source2.keyBy(_.searchId))
 .process(new SearchResultsJoinFunction)
 .addSink(KafkaSink.sink)

It works perfectly when I launch it locally. When I deploy it to 1 job
manager and 3 taskmanagers and every task is in the "RUNNING" state, after 2
minutes (during which nothing is coming to the sink) one of the taskmanagers gets
the following in its log:

 Flat Map (1/3) (9598c11996f4b52a2e2f9f532f91ff66) switched from RUNNING to
FAILED.
java.io.IOException: Connecting the channel failed: Connecting to remote
task manager + 'flink-taskmanager-11-dn9cj/10.81.27.84:37708' has failed.
This might indicate that the remote task manager has been lost.
at
org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:196)
at
org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:133)
at
org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:85)
at
org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:60)
at
org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:166)
at
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:494)
at
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:525)
at
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:508)
at
org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:94)
at
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
at
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by:
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
Connecting to remote task manager +
'flink-taskmanager-11-dn9cj/10.81.27.84:37708' has failed. This might
indicate that the remote task manager has been lost.
at
org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:219)
at
org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:133)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:269)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:125)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
... 1 more
Caused by:
org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException:
connection timed out: flink-taskmanager-11-dn9cj/10.81.27.84:37708

Re: Implementation error: Unhandled exception - "Implementation error: Unhandled exception."

2018-11-12 Thread Timo Walther

Hi Richard,

this sounds like a bug to me. I will loop in Till (in CC) who might know 
more about this.


Regards,
Timo


On 07.11.18 at 20:35, Richard Deurwaarder wrote:

Hello,

We have a flink job / cluster running in kubernetes. Flink 1.6.2 (but 
the same happens in 1.6.0 and 1.6.1) To upgrade our job we use the 
REST API.


Every so often the jobmanager seems to be stuck in a crashing state 
and the logs show me this stack trace:


2018-11-07 18:43:05,815 [flink-scheduler-1] ERROR 
org.apache.flink.runtime.rest.handler.cluster.ClusterOverviewHandler - 
Implementation error: Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on 
[Actor[akka://flink/user/dispatcher#1016927511]] after [1 ms]. 
Sender[null] sent message of type 
"org.apache.flink.runtime.rpc.messages.Implementation error: Unhandled 
exception.".
at 
akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)

at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at 
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at 
scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at 
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at 
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)

at java.lang.Thread.run(Thread.java:748)

If I restart the jobmanager everything is fine afterwards, but the 
jobmanager will not restart by itself.


What might've caused this and is this something we can prevent?

Richard





Re: InterruptedException when async function is cancelled

2018-11-12 Thread Timo Walther

Hi Anil,

if I researched correctly we are talking about these changes [1]. I 
don't know if you can back port it, but I hope this helps.


Regards,
Timo

[1] https://issues.apache.org/jira/browse/FLINK-9304


On 07.11.18 at 17:41, Anil wrote:

Hi Till,
 Thanks for the reply. Is there any particular patch I can use as
upgrading to Flink 1.6 is not an option for me at the moment.
Regards,
Anil.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/





Re: flink-1.6.2 in standalone-job mode | Cluster initialization failed.

2018-11-12 Thread zavalit
Yep, that was the issue.
Thanks a lot.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: flink run from savepoint

2018-11-12 Thread Timo Walther

Hi Franck,

as a first hint: paths are hard-coded in the savepoint's metadata so you 
should make sure that the path is still the same and accessible by all 
JobManagers and TaskManagers.
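
In a dockerized setup that usually means the savepoint has to live on a path (or shared volume) that the JobManager container can also read, and -s should point at the concrete savepoint directory that was created when the savepoint was triggered, for example (illustrative path only):

flink run -m jobmanager:8081 -s file:///tmp/test/savepoint/savepoint-1b2c3d-0123456789ab myJar.jar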


Can you share logs with us to figure out what caused the internal server 
error?


Thanks,
Timo


On 07.11.18 at 17:34, Cussac, Franck wrote:


Hi,

I’m working with Flink 1.5.0 and I try to run a job from a savepoint. 
My jobmanager is dockerized and I try to run my flink job in another 
container.


The command :

flink run -m jobmanager:8081 myJar.jar

works fine, but when I try to run a job from a savepoint, I get an
Internal Server Error.


Here my command to run flink job and the stacktrace :

flink run -m jobmanager:8081 -s file:/tmp/test/savepoint/ myJar.jar

Starting execution of program



The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Could not 
retrieve the execution result.


at 
org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)


at 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)


at 
org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)


at 
org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)


at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)


at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)


at java.lang.reflect.Method.invoke(Method.java:498)

at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)


at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)


at 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)


at 
org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:781)


at 
org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:275)


at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)

at 
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1020)


at 
org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1096)


at 
org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)


at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1096)

Caused by: org.apache.flink.runtime.client.JobSubmissionException: 
Failed to submit JobGraph.


at 
org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$5(RestClusterClient.java:357)


at 
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)


at 
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)


at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)


at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)


at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:214)


at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)


at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)


at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)


at 
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)


at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)


at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)


at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)


at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)


at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could 
not complete the operation. Exception is not retryable.


at 
java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)


at 
java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)


at 
java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)


at 
java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)


... 12 more

Caused by: 
org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could 
not complete the operation. Exception is not retryable.


... 10 more

Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.runtime.rest.util.RestClientException: [Internal 
server error.]


at 
java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)


at 
java.util.concurrent.CompletableFuture.completeRelay(Completable

Flink auth against Zookeeper with MD5-Digest

2018-11-12 Thread Laura Uzcátegui
Hi,

 I was wondering if there are any plans in the near future to include
support for an authentication mechanism other than Kerberos, such
as MD5-Digest?

Cheers,


Re: Error with custom `SourceFunction` wrapping `InputFormatSourceFunction` (Flink 1.6)

2018-11-12 Thread Aaron Levin
Hi Aljoscha,

Thanks! I will look into this.

Best,

Aaron Levin

On Fri, Nov 9, 2018 at 5:01 AM, Aljoscha Krettek 
wrote:

> Hi,
>
> I think for this case a model that is similar to how the Streaming File
> Source works should be good. You can have a look at
> ContinuousFileMonitoringFunction and ContinuousFileReaderOperator. The
> idea is that the first emits splits that should be processed and the second
> is responsible for reading those splits. A generic version of that is what
> I'm proposing for the refactoring of our source interface [1] that also
> comes with a prototype implementation [2].
>
> I think something like this should be adaptable to your case. The split
> enumerator would at first only emit file splits downstream, after that it
> would emit Kafka partitions that should be read. The split reader would
> understand both file splits and kafka partitions and can read from both.
> This still has some kinks to be worked out when it comes to watermarks,
> FLIP-27 is not finished.
>
> What do you think?
>
> Best,
> Aljoscha
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-
> 27%3A+Refactor+Source+Interface
> 
> [2] https://github.com/aljoscha/flink/commits/refactor-source-interface
>
>
> On 1. Nov 2018, at 16:50, Aaron Levin  wrote:
>
> Hey,
>
> Thanks for reaching out! I'd love to take a step back and find a better
> solution, so I'll try to be succinct in what I'm trying to accomplish:
>
> We're trying to write a SourceFunction which:
> * reads some Sequence files from S3 in a particular order (each task gets
> files in a specific order).
> * sends a watermark between each sequence file
> * when that's complete, starts reading from Kafka topics.
> * (This is similar to the bootstrap problem which Lyft has talked about
> (see: https://www.slideshare.net/FlinkForward/flink-forward-
> san-francisco-2018-gregory-fee-bootstrapping-state-in-apache-flink))
>
> The current solution I have involves a custom InputFormat, InputSplit, and
> SplitAssignor. It achieves most of these requirements, except I have to
> extend InputFormatSourceFunction. I have a class that looks like:
>
> class MySourceFunction(val s3Archives: CustomInputFormat, val kafka:
> KafkaBase) extends InputFormatSourceFunction(s3Archives, typestuff) {...}
>
> There are lots I don't like about the existing solution:
> * I have to extend InputFormatSourceFunction to ensure the graph is
> initialized properly (the bug I wrote about)
> * I had to replicate most of the implementation of
> InputFormatSourceFunction so I could insert Watermarks between splits.
>
> I'd love any suggestions around improving this!
>
> Best,
>
> Aaron Levin
>
> On Thu, Nov 1, 2018 at 10:41 AM, Aljoscha Krettek 
> wrote:
>
>> Hi Aaron,
>>
>> I'd like to take a step back and understand why you're trying to wrap an
>> InputFormatSourceFunction?
>>
>> In my opinion, InputFormatSourceFunction should not be used because it
>> has some shortcomings, the most prominent among them that it does not
>> support checkpointing, i.e. in case of failure all data will (probably) be
>> read again. I'm saying probably because the interaction of
>> InputFormatSourceFunction with how InputSplits are generated (which relates
>> to that code snippet with the cast you found) could be somewhat "spooky"
>> and lead to weird results in some cases.
>>
>> The interface is a remnant of a very early version of the streaming API
>> and should probably be removed soon. I hope we can find a better solution
>> for your problem that fits better with Flink.
>>
>> Best,
>> Aljoscha
>>
>> On 1. Nov 2018, at 15:30, Aaron Levin  wrote:
>>
>> Hey Friends! Last ping and I'll move this over to a ticket. If anyone can
>> provide any insight or advice, that would be helpful!
>>
>> Thanks again.
>>
>> Best,
>>
>> Aaron Levin
>>
>> On Fri, Oct 26, 2018 at 9:55 AM, Aaron Levin 
>> wrote:
>>
>>> Hey,
>>>
>>> Not sure how convo threading works on this list, so in case the folks
>>> CC'd missed my other response, here's some more info:
>>>
>>> First, I appreciate everyone's help! Thank you!
>>>
>>> I wrote several wrappers to try and debug this, including one which is
>>> an exact copy of `InputFormatSourceFunction`. They all failed with the
>>> same error I detail above. I'll post two of them below.
>>> They all extended `RichParallelSourceFunction` and, as far as I could tell,
>>> were properly initialized (though I may have missed something!).
>>> Additionally, for the two below, if I change `extends
>>> RichParallelSourceFunction` to `extends InputFormatSourceFunction(...,...)`,
>>> I no longer receive the exception. This is what led me to believe the
>>> source of the issue was casting and how I found the line of code where the
>>> stream graph is given the input format.
>>>
>>> Quick explanation of the wrappers:
>>> 1. `WrappedInputFormat` does a basic wrap around
>>> `InputFormatSourceFunction`

Re: Run a Flink job: REST/ binary client

2018-11-12 Thread Timo Walther

I will loop in Chesnay. He might know more about the REST service internals.

Timo

Am 07.11.18 um 16:15 schrieb Flavio Pompermaier:
After a painful migration to Flink 1.6.2 we were able to run one of 
the jobs.
Unfortunately we faced the same behaviour: all the code after the 
first env.execute() is not executed if the job is called from the REST 
services or from the web UI, while everything works fine if running 
the job using 'bin/flink run' from a shell.


Any solution to this?

On Tue, Nov 6, 2018 at 4:55 PM Flavio Pompermaier 
<pomperma...@okkam.it> wrote:


Hi to all,
I'm using Flink 1.3.2. If executing a job using bin/flink run
everything goes well.
If executing it using the REST service of the job manager (/jars/:jarid/run)
the job writes to the sink but fails to return from env.execute(),
and all the code after it is not executed.

Is this a known issue? Was it resolved in Flink 1.6.2?

Best,
Flavio






Re: Report failed job submission

2018-11-12 Thread Timo Walther
I assume you are using the REST API? Flink's RestClusterClient is able 
to deserialize the exception including its cause, which might be more 
helpful in your case as well.


The entire exception should be queryable via the execution result [1]; the 
SQL Client [2], for example, uses it to produce a better error.


Regards,
Timo

[1] 
https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#jobs-jobid-execution-result
[2] 
https://github.com/apache/flink/blob/master/flink-libraries/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/local/ProgramDeployer.java 
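As a rough illustration of [1], a client could fetch the failure cause after submission with a plain HTTP GET against the execution-result endpoint. This is a minimal sketch assuming the default REST port 8081; the job id is passed as the first program argument:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ExecutionResultQuery {
    public static void main(String[] args) throws Exception {
        // GET /jobs/:jobid/execution-result (see [1] above)
        URL url = new URL("http://localhost:8081/jobs/" + args[0] + "/execution-result");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            // for a failed job, the JSON payload should contain the failure cause
            reader.lines().forEach(System.out::println);
        }
    }
}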




Am 12.11.18 um 16:00 schrieb Flavio Pompermaier:
Let's say that my job needs to do some check before running (like 
existence of a file or some other condition): at the moment I can only 
throw an Exception but on the client side you get only something 
like: {"errors":["org.apache.flink.client.program.ProgramInvocationException: 
The main method caused an error."]}


I was wondering if there is any better way to handle this kind of 
problem.


On Mon, Nov 12, 2018 at 3:53 PM Timo Walther wrote:


Hi Flavio,

I'm not entirely sure if I understand your question correctly, but what you
are looking for is more information (like a categorization) about why the
submission failed, right?

Regards,
Timo


Am 06.11.18 um 14:33 schrieb Flavio Pompermaier:

Any idea about how to address this issue?
On Tue, Oct 16, 2018 at 11:32 AM Flavio Pompermaier
<pomperma...@okkam.it> wrote:

Hi to all,
which is the correct way to report back to the user a failure
from a job submission in Flink?
If everything is OK the job run API returns the job id, but
what if there are errors in parameter validation or some other
problem?
Which is the right way to report back to the user the job
error details (apart from throwing an Exception)?

Best,
Flavio










Re: How to run Flink 1.6 job cluster in "standalone" mode?

2018-11-12 Thread Timo Walther

Hi,

a session cluster does not imply that JM + TM are always executed in the 
same JVM. Debugging a job running on different JVMs might be a bit more 
difficult, but it should still be straightforward.


Maybe you can tell us what wrong behavior you observe?

Btw. Flink's metrics can also already be quite helpful.

Regards,
Timo

Am 07.11.18 um 14:15 schrieb Hao Sun:
"Standalone" here I mean job-mananger + taskmanager on the same JVM. I 
have an issue to debug on our K8S environment, I can not reproduce it 
in local docker env or Intellij. If JM and TM are running in different 
VMs, it makes things harder to debug.


Or is there a way to debug a job running with the JM and TM on different VMs?
Is reverting to a session cluster the only way to get JM + TM on the 
same VM?





Re: Report failed job submission

2018-11-12 Thread Flavio Pompermaier
Let's say that my job needs to do some check before running (like existence
of a file or some other condition): at the moment I can only throw an
Exception but on the client side you get only something
like: {"errors":["org.apache.flink.client.program.ProgramInvocationException:
The main method caused an error."]}

I was wondering if there is any better way to handle this kind of problem.

On Mon, Nov 12, 2018 at 3:53 PM Timo Walther  wrote:

> Hi Flavio,
>
> I'm not entirely sure if I understand your question correctly, but what you are
> looking for is more information (like a categorization) about why the submission
> failed, right?
>
> Regards,
> Timo
>
>
> Am 06.11.18 um 14:33 schrieb Flavio Pompermaier:
>
> Any idea about how to address this issue?
> On Tue, Oct 16, 2018 at 11:32 AM Flavio Pompermaier 
> wrote:
>
>> Hi to all,
>> which is the correct way to report back to the user a failure from a job
>> submission in Flink?
>> If everything is OK the job run API returns the job id, but what if there
>> are errors in parameter validation or some other problem?
>> Which is the right way to report back to the user the job error details
>> (apart from throwing an Exception)?
>>
>> Best,
>> Flavio
>>
>
>
>
>


Re: Report failed job submission

2018-11-12 Thread Timo Walther

Hi Flavio,

I'm not entirely sure if I understand your question correctly, but what you are 
looking for is more information (like a categorization) about why the submission 
failed, right?


Regards,
Timo


Am 06.11.18 um 14:33 schrieb Flavio Pompermaier:

Any idea about how to address this issue?
On Tue, Oct 16, 2018 at 11:32 AM Flavio Pompermaier 
<pomperma...@okkam.it> wrote:


Hi to all,
which is the correct way to report back to the user a failure from
a job submission in Flink?
If everything is OK the job run API returns the job id, but what if
there are errors in parameter validation or some other problem?
Which is the right way to report back to the user the job error
details (apart from throwing an Exception)?

Best,
Flavio







Linkage error when using DropwizardMeterWrapper

2018-11-12 Thread Jayant Ameta
java.lang.LinkageError: loader constraint violation: when resolving method
"org.apache.flink.dropwizard.metrics.DropwizardMeterWrapper.(Lcom/codahale/metrics/Meter;)V"
the class loader (instance of
org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoaders$ChildFirstClassLoader)
of the current class, com/test/PatternMatcher, and the class loader
(instance of sun/misc/Launcher$AppClassLoader) for the method's defining
class, org/apache/flink/dropwizard/metrics/DropwizardMeterWrapper, have
different Class objects for the type com/codahale/metrics/Meter used in the
signature

Jayant Ameta


Re: FLINK Kinesis Connector Error - ProvisionedThroughputExceededException

2018-11-12 Thread Rafi Aroch
Hi Steve,

We've encountered this also. We have way more than enough shards, but were
still getting exceptions.
We think we know what the reason is; we would love for someone to
approve/reject it.

What we suspect is happening is as follows:

The KPL's RateLimit parameter tracks the amount of bytes/records
written into a specific shard.
If the parallelism of your Sink is >1 (which is probably the case),
multiple tasks == multiple KPL instances, which may be writing to the same
shard.
So for each individual KPL the RateLimit is not breached, but if multiple
parallel tasks are writing to the same shard the RateLimit gets breached
and a ProvisionedThroughputExceededException is thrown.

What we've tried:

   - Using a random partition key to spread the load evenly between the
   shards. This did not work for us...
   - We tried to make records destined for the same shard be written by the same
   KPL instance, so the RateLimit would get enforced. We did a keyBy before
   the Sink to ensure the same records go to the same task, using the same
   keyBy logic as the Kinesis partitionKey. This did not work for us...

What solved it eventually:

Reducing the parallelism of the FlinkKinesisProducer to 1. We also set a
queueSize so that we'll get back-pressured in case of high load (without
getting ProvisionedThroughputExceededException). This solved the problem
and is currently not a bottleneck for us, but it could become one soon. So
this is not a real solution.
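A rough sketch of that workaround (not a real solution, as said above): run the Kinesis sink with parallelism 1 and cap the producer's in-flight queue so the job gets back-pressured instead of hammering the shards. Stream name, region and the queue limit of 1000 below are placeholder values:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

public class KinesisSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> records = env.fromElements("a", "b", "c");

        Properties producerConfig = new Properties();
        producerConfig.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");

        FlinkKinesisProducer<String> kinesis =
                new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
        kinesis.setDefaultStream("my-output-stream");
        kinesis.setDefaultPartition("0");
        // bound the number of buffered records so the sink back-pressures under load
        kinesis.setQueueLimit(1000);

        // a single KPL instance, as described above
        records.addSink(kinesis).setParallelism(1);
        env.execute("kinesis-sink-sketch");
    }
}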

Can anyone suggest a better solution? Approve/reject our assumption?

Thanks
Rafi


On Sat, Nov 10, 2018, 03:02 shkob1 wrote:

> If it's running in parallel aren't you just adding readers which maxes out
> your provisioned throughput? probably doesn't belong in here but rather a
> Kinesis thing, but i suggest increasing your number of shards?
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Re: Understanding checkpoint behavior

2018-11-12 Thread Timo Walther

Hi,

do you observe such long checkpoint times also without performing 
external calls? If not, I guess the communication to the external system 
is flaky.


Maybe you have to rethink how you perform such calls in order to make 
the pipeline more robust against these latencies. Flink also offers an 
async operator [1] for exactly such cases; it could be worth a look.
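As a rough illustration of [1], here is a minimal sketch (against the Flink 1.6 async I/O API; the 1.2 API uses AsyncCollector instead of ResultFuture) of moving a blocking external call off the main operator thread. The callExternalService method and the thread pool size are placeholders for the real HTTPS call:

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncHttpEnrichment extends RichAsyncFunction<String, String> {

    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) {
        executor = Executors.newFixedThreadPool(10);
    }

    @Override
    public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
        // run the blocking call on a separate pool and complete the future when done
        CompletableFuture
                .supplyAsync(() -> callExternalService(input), executor)
                .whenComplete((result, error) -> {
                    if (error != null) {
                        resultFuture.completeExceptionally(error);
                    } else {
                        resultFuture.complete(Collections.singleton(result));
                    }
                });
    }

    @Override
    public void close() {
        executor.shutdown();
    }

    // placeholder for the real HTTPS call
    private String callExternalService(String input) {
        return input;
    }
}

// wiring it in before the window, e.g.:
// DataStream<String> enriched = AsyncDataStream.unorderedWait(
//         events, new AsyncHttpEnrichment(), 30, TimeUnit.SECONDS, 100);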


Regards,
Timo

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html



Am 05.11.18 um 18:52 schrieb PranjalChauhan:

Hi,

I am a new Flink user and currently using Flink version 1.2.1. I am trying to
understand how checkpoints actually work when the window operator is processing
events.

My pipeline has the following flow where each operator's parallelism is 1.
source -> flatmap -> tumbling window -> sink
In this pipeline, I had configured the window to be evaluated every 1 hour
(3600 seconds) and the checkpoint interval was 5 mins. The checkpoint
timeout was set to 1 hour as I wanted the checkpoints to complete.

In my window function, the job makes HTTPS calls to another service, so the window
function may take some time to evaluate/process all events.

Please refer to the following image. In this case, the window was triggered at
23:00:00. Checkpoint 12 was triggered soon after that, and I notice that
checkpoint 12 takes a long time to complete (compared to other checkpoints
taken when the window function is not processing events).


The following images show the checkpoint 12 details of the window & sink operators.



I see that the time spent for the checkpoint was actually just 5 ms & 8 ms
(checkpoint duration, sync) for the window & sink operators. However, the
end-to-end duration of the checkpoint was 11m 12s for both the window & sink operators.

Is this expected behavior? If yes, do you have any suggestions to reduce the
end-to-end checkpoint duration?

Please let me know if any more information is needed.

Thanks.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/





Re: Task Manager allocation issue when upgrading 1.6.0 to 1.6.2

2018-11-12 Thread Till Rohrmann
Hi Cliff,

the TaskManagers fail to start with exit code 31, which indicates an
initialization error on startup. If you check the TaskManager logs via
`yarn logs -applicationId <applicationId>` you should see why the TMs
don't start up.

Cheers,
Till

On Fri, Nov 9, 2018 at 8:32 PM Cliff Resnick  wrote:

> Hi Till,
>
> Here are Job Manager logs, same job in both 1.6.0 and 1.6.2 at DEBUG
> level. I saw several errors in 1.6.2, hope it's informative!
>
> Cliff
>
> On Fri, Nov 9, 2018 at 8:34 AM Till Rohrmann  wrote:
>
>> Hi Cliff,
>>
>> this does not sound right. Could you share the logs of the Yarn cluster
>> entrypoint with the community for further debugging? Ideally on DEBUG
>> level. The Yarn logs would also be helpful to fully understand the problem.
>> Thanks a lot!
>>
>> Cheers,
>> Till
>>
>> On Thu, Nov 8, 2018 at 9:59 PM Cliff Resnick  wrote:
>>
>>> I'm running a YARN cluster of 8 * 4 core instances = 32 cores, with a
>>> configuration of 3 slots per TM. The cluster is dedicated to a single job
>>> that runs at full capacity in "FLIP6" mode. So in this cluster, the
>>> parallelism is 21 (7 TMs * 3, one container dedicated for Job Manager).
>>>
>>> When I run the job in 1.6.0, seven Task Managers are spun up as
>>> expected. But if I run with 1.6.2 only four Task Managers spin up and the
>>> job hangs waiting for more resources.
>>>
>>> Our Flink distribution is set up by script after building from source.
>>> So aside from flink jars, both 1.6.0 and 1.6.2 directories are identical.
>>> The job is the same, restarting from savepoint. The problem is repeatable.
>>>
>>> Has something changed in 1.6.2, and if so can it be remedied with a
>>> config change?
>>>
>>>
>>>
>>>
>>>
>>>


Re: Any examples on invoke the Flink REST API post method ?

2018-11-12 Thread 远远
hi,
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/cancel-with-savepoint-404-Not-Found-td19227.html
may help you.
Also, when cancelling a job on Flink on YARN, "yarn-cancel" works well, not "cancel".
The Python code below for triggering a savepoint works well:

json = {"cancel-job": False}
r = requests.post(url, json=json)


Gary Yao wrote on Mon, Nov 12, 2018 at 5:33 PM:

> Hi Henry,
>
> What you see in the API documentation is a schema definition and not a
> sample
> request. The request body should be:
>
> {
> "target-directory": "hdfs:///flinkDsl",
> "cancel-job": false
> }
>
> Let me know if that helps.
>
> Best,
> Gary
>
> On Mon, Nov 12, 2018 at 7:15 AM vino yang  wrote:
>
>> Hi Henry,
>>
>> Maybe Gary can help you; I'm pinging him for you.
>>
>> Thanks, vino.
>>
>> 徐涛 wrote on Mon, Nov 12, 2018 at 12:45 PM:
>>
>>> Hi Experts,
>>> I am trying to trigger a savepoint from the Flink REST API on version 1.6;
>>> the documentation shows that I need to pass a JSON as the request body:
>>> {
>>>  "type" : "object”,
>>>   "id" :
>>> "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:savepoints:SavepointTriggerRequestBody”,
>>>  "properties" : {
>>>  "target-directory" : { "type" : "string" },
>>>  "cancel-job" : { "type" : "boolean" }
>>>  }
>>> }
>>> So I send the following json as
>>> {
>>> "type":"object”,
>>>
>>> "id":"urn:jsonschema:org:apache:flink:runtime:rest:messages:job:savepoints:SavepointTriggerRequestBody”,
>>> "properties”:{
>>> "target-directory":"hdfs:///flinkDsl”,
>>> "cancel-job”:false
>>> }
>>> }
>>>
>>> And I use okhttp to send the request:
>>> val MEDIA_TYPE_JSON = MediaType.parse("application/json; charset=utf-8")
>>> val body = RequestBody.create(MEDIA_TYPE_JSON, postBody)
>>> val request = new Request.Builder()
>>>   .url(url)
>>>   .post(body)
>>>   .build()
>>> client.newCall(request).execute()
>>>
>>>
>>> but I get the error {"errors":["Request did not match expected format
>>> SavepointTriggerRequestBody."]}
>>> Would anyone give an example of how to invoke the POST REST API of Flink?
>>> Thanks a lot.
>>>
>>> Best
>>> Henry
>>>
>>


Auto/Dynamic scaling in Flink

2018-11-12 Thread Nauroz Khan Nausherwani
Dear Flink Contributors and users,

I am a PhD student and I was interested to know: using which metrics, and when, 
does Flink perform scaling-in or scaling-out of resources? I did search Flink's 
website, where I could only find information about how dynamic scaling 
is performed in stateless or stateful operators. It would be interesting to 
know which metrics Flink uses, and when Flink actually triggers auto-scaling.

Any link or reference paper with the required information would be appreciated.

best regards,
Nauroz


Re: Any examples on invoke the Flink REST API post method ?

2018-11-12 Thread Gary Yao
Hi Henry,

What you see in the API documentation is a schema definition and not a
sample
request. The request body should be:

{
"target-directory": "hdfs:///flinkDsl",
"cancel-job": false
}

Let me know if that helps.
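For completeness, a minimal sketch of sending that body with OkHttp from Java (the Scala version is analogous); the host in the URL is a placeholder and the job id is passed as the first program argument:

import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class TriggerSavepoint {
    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient();
        MediaType jsonType = MediaType.parse("application/json; charset=utf-8");
        // only the documented fields go into the body, not the schema wrapper
        String body = "{\"target-directory\": \"hdfs:///flinkDsl\", \"cancel-job\": false}";

        Request request = new Request.Builder()
                .url("http://localhost:8081/jobs/" + args[0] + "/savepoints")  // placeholder host
                .post(RequestBody.create(jsonType, body))
                .build();
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}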

Best,
Gary

On Mon, Nov 12, 2018 at 7:15 AM vino yang  wrote:

> Hi Henry,
>
> Maybe Gary can help you; I'm pinging him for you.
>
> Thanks, vino.
>
> 徐涛 wrote on Mon, Nov 12, 2018 at 12:45 PM:
>
>> Hi Experts,
>> I am trying to trigger a savepoint from the Flink REST API on version 1.6;
>> the documentation shows that I need to pass a JSON as the request body:
>> {
>>  "type" : "object”,
>>   "id" :
>> "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:savepoints:SavepointTriggerRequestBody”,
>>  "properties" : {
>>  "target-directory" : { "type" : "string" },
>>  "cancel-job" : { "type" : "boolean" }
>>  }
>> }
>> So I send the following json as
>> {
>> "type":"object”,
>>
>> "id":"urn:jsonschema:org:apache:flink:runtime:rest:messages:job:savepoints:SavepointTriggerRequestBody”,
>> "properties”:{
>> "target-directory":"hdfs:///flinkDsl”,
>> "cancel-job”:false
>> }
>> }
>>
>> And I use okhttp to send the request:
>> val MEDIA_TYPE_JSON = MediaType.parse("application/json; charset=utf-8")
>> val body = RequestBody.create(MEDIA_TYPE_JSON, postBody)
>> val request = new Request.Builder()
>>   .url(url)
>>   .post(body)
>>   .build()
>> client.newCall(request).execute()
>>
>>
>> but I get the error {"errors":["Request did not match expected format
>> SavepointTriggerRequestBody."]}
>> Would anyone give an example of how to invoke the POST REST API of Flink?
>> Thanks a lot.
>>
>> Best
>> Henry
>>
>


Re: How to use multiple sources with multiple sinks

2018-11-12 Thread vino yang
Hi,

If you are expressing a job that contains three source->sink pairs that
are isolated from each other, then Flink supports this form of job.
It is not much different from a single source->sink; you just go from one
DataStream to three DataStreams.

For example,

DataStream ds1 = xxx
ds1.addSink();

DataStream ds2 = xxx
ds2.addSink();

DataStream ds3 = xxx
ds3.addSink();
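Expanding that skeleton into a minimal runnable sketch of the original question (one job, three independent Kafka-topic -> S3 pipelines): the topic names, bucket paths, Kafka properties and the map step standing in for the per-topic "unique processing" are all placeholders, and the consumer/sink classes assumed here are flink-connector-kafka-0.11 and flink-connector-filesystem from Flink 1.6:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class MultiTopicJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(10);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");
        props.setProperty("group.id", "multi-topic-job");

        addPipeline(env, props, "topic_a", "s3://kafka_topic_a/");
        addPipeline(env, props, "topic_b", "s3://kafka_topic_b/");
        addPipeline(env, props, "topic_c", "s3://kafka_topic_c/");

        env.execute("three independent pipelines");
    }

    private static void addPipeline(StreamExecutionEnvironment env, Properties props,
                                    String topic, String bucket) {
        DataStream<String> stream =
                env.addSource(new FlinkKafkaConsumer011<>(topic, new SimpleStringSchema(), props));
        // put the per-topic "unique processing" here, e.g. a map/flatMap chain
        stream.map(value -> value.toUpperCase())
              .addSink(new BucketingSink<String>(bucket));
    }
}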

Thanks, vino.

Flink Developer wrote on Sun, Nov 11, 2018 at 9:24 AM:

> How can I configure 1 Flink job (stream execution environment, parallelism
> set to 10) to have multiple Kafka sources where each has its own sink to
> S3?
>
> For example, let's say the sources are:
>
>1. Kafka Topic A - Consumer (10 partitions)
>2. Kafka Topic B - Consumer (10 partitions)
>3. Kafka Topic C - Consumer (10 partitions)
>
> And let's say the sinks are:
>
>1. BucketingSink to S3 in bucket: s3://kafka_topic_a/
>2. BucketingSink to S3 in bucket: s3://kafka_topic_b/
>3. BucketingSink to S3 in bucket: s3://kafka_topic_c/
>
> And between source 1 and sink 1, I would like to perform unique processing.
> Likewise, source 2 to sink 2 should have its own unique processing, and
> source 3 to sink 3 should also have its own unique processing.
>
> How can this be achieved? Is there an example?
>