Re: fliter and flatMap operation VS only a flatMap operation

2020-01-29 Thread Tzu-Li (Gordon) Tai
Hi, If your filter and flatMap operators are chained, then the performance difference should not be noticeable. If a shuffle (i.e. a keyBy operation) occurs after the filter and before the flatMap, then applying the filter first will be more efficient. Cheers, Gordon On Thu, Jan 30, 2020 at 4:03

Re: TableSource being duplicated

2020-01-29 Thread Benchao Li
Hi Benoît, Do you mean if you register one TableSource, and add two sinks from the same TableSource, the source will duplicate ? If so, maybe you can check *TableEnvironmentImpl.isEagerOperationTranslation*, it's *false* by default. But in *StreamTableEnvironmentImpl*, it's *true* because we need

Re: FsStateBackend vs RocksDBStateBackend

2020-01-29 Thread Tzu-Li (Gordon) Tai
Hi Ran, On Thu, Jan 30, 2020 at 9:39 AM Ran Zhang wrote: > Hi all, > > We have a Flink app that uses a KeyedProcessFunction, and in the function > it requires a ValueState(of TreeSet) and the processElement method needs to > access and update it. We tried to use RocksDB as our stateBackend but t

FsStateBackend vs RocksDBStateBackend

2020-01-29 Thread Ran Zhang
Hi all, We have a Flink app that uses a KeyedProcessFunction, and in the function it requires a ValueState(of TreeSet) and the processElement method needs to access and update it. We tried to use RocksDB as our stateBackend but the performance is not good, and intuitively we think it was because o

Task-manager kubernetes pods take a long time to terminate

2020-01-29 Thread Li Peng
Hey folks, I'm deploying a Flink cluster via kubernetes, and starting each task manager with taskmanager.sh. I noticed that when I tell kubectl to delete the deployment, the job-manager pod usually terminates very quickly, but any task-manager that doesn't get terminated before the job-manager, usu

Using retained checkpoints as savepoints

2020-01-29 Thread Ken Krugler
Hi all, Currently https://ci.apache.org/projects/flink/flink-docs-master/ops/state/checkpoints.html#difference-to-savepoints says checkpoints… "do not support Flink specific features l

Status of FLINK-12692 (Support disk spilling in HeapKeyedStateBackend)

2020-01-29 Thread Ken Krugler
Hi Yu Li, It looks like this stalled out a bit, from May of last year, and won’t make it into 1.10. I’m wondering if there’s a version in Blink (as a completely separate state backend?) that could be tried out? Thanks, — Ken -- Ken Krugler http://www.scaleunlimited.co

Re: Does flink support retries on checkpoint write failures

2020-01-29 Thread wvl
Forgive my lack of knowledge here - I'm a bit out of my league here. But I was wondering if allowing e.g. 1 checkpoint to fail and the reason for which somehow caused a record to be lost (e.g. rocksdb exception / taskmanager crash / etc), there would be no Source rewind to the last successful chec

fliter and flatMap operation VS only a flatMap operation

2020-01-29 Thread Soheil Pourbafrani
Hi, In case we need to filter operation followed by a transformation, which one is more efficient in Flink, applying the filter operation first and then a flatMap operation separately OR using only a flatMap operation that internally includes the filter logic, too? best Soheil

Re: Does flink support retries on checkpoint write failures

2020-01-29 Thread Richard Deurwaarder
Hi Till, I'll see if we can ask google to comment on those issues, perhaps they have a fix in the works that would solve the root problem. In the meanwhile `CheckpointConfig.setTolerableCheckpointFailureNumber` sounds very promising! Thank you for this. I'm going to try this tomorrow to see if tha

Re: Apache Flink Job fails repeatedly due to RemoteTransportException

2020-01-29 Thread Till Rohrmann
Hi M Singh, have you checked the TaskManager logs of ip-xx-xxx-xxx-xxx.ec2.internal/xx.xxx.xxx.xxx:39623 for any suspicious logging statements? This might help to uncover why another node thinks that this TaskManager is no longer reachable. You could also try whether the same problem remains if y

Re: Flink+YARN HDFS replication factor

2020-01-29 Thread Till Rohrmann
Hi Piper, in general, Flink does not store transient data such as event data on HDFS. Event data (data which is sent between the TaskManager's to process it) is only kept in memory and if becoming too big spilled by some operators to local disk. What Flink stores on HDFS (given it is configured t

Re: is streaming outer join sending unnecessary traffic?

2020-01-29 Thread Till Rohrmann
Hi Kant, I am not an expert on Flink's SQL implementation. Hence, I'm pulling in Timo and Jark who might help you with your question. Cheers, Till On Tue, Jan 28, 2020 at 10:46 PM kant kodali wrote: > Sorry. fixed some typos. > > I am doing a streaming outer join from four topics in Kafka lets

Re: Does flink support retries on checkpoint write failures

2020-01-29 Thread Till Rohrmann
Hi Richard, googling a bit indicates that this might actually be a GCS problem [1, 2, 3]. The proposed solution/workaround so far is to retry the whole upload operation as part of the application logic. Since I assume that you are writing to GCS via Hadoop's file system this should actually fall i

Re: Is there anything strictly special about sink functions?

2020-01-29 Thread Till Rohrmann
Yes, checkpointing should behave normally without a sink. If I am not mistaken, then sinks should indeed be isomorphic to FlatMap[A, Nothing]. However, there is no guarantee that this will always stay like this. Cheers, Till On Wed, Jan 29, 2020 at 2:53 PM Andrew Roberts wrote: > Can I expect c

Re: Flink distribution housekeeping for YARN sessions

2020-01-29 Thread Till Rohrmann
Here is the corresponding JIRA ticket: https://issues.apache.org/jira/browse/FLINK-15806 On Wed, Jan 29, 2020 at 3:16 PM Till Rohrmann wrote: > Hi Theo, > > your assumption is correct that Flink won't clean up its files when using > `yarn application -kill ID`. This should also hold true for oth

Re: Flink distribution housekeeping for YARN sessions

2020-01-29 Thread Till Rohrmann
Hi Theo, your assumption is correct that Flink won't clean up its files when using `yarn application -kill ID`. This should also hold true for other temporary files generated by Flink's Blob service, shuffle service and io manager. These files are usually stored under /tmp and should be cleaned up

Re: Is there anything strictly special about sink functions?

2020-01-29 Thread Andrew Roberts
Can I expect checkpointing to behave normally without a sink, or do sink functions Invoke some special behavior? My hope is that sinks are isomorphic to FlatMap[A, Nothing], but it’s a challenge to verify all the bits of behavior observationally. Thanks for all your help! > On Jan 29, 2020, a

Re: Blocking KeyedCoProcessFunction.processElement1

2020-01-29 Thread Till Rohrmann
Yes, using AsyncIO could help in the case of blocking operations. Cheers, Till On Tue, Jan 28, 2020 at 10:45 AM Taher Koitawala wrote: > Would AsyncIO operator not be an option for you to connect to RDBMS? > > On Tue, Jan 28, 2020, 12:45 PM Alexey Trenikhun wrote: > >> Thank you Yun Tang. >> M

Re: Is there anything strictly special about sink functions?

2020-01-29 Thread Till Rohrmann
As far as I know you don't have to define a sink in order to define a valid Flink program (using Flink >= 1.9). Your topology can simply end in a map function and it should be executable once you call env.execute(). Cheers, Till On Tue, Jan 28, 2020 at 10:06 AM Arvid Heise wrote: > As Konstanti

Re: How to debug a job stuck in a deployment/run loop?

2020-01-29 Thread Till Rohrmann
Hi Jason, getting access to the log files would help most to figure out what's going wrong. Cheers, Till On Tue, Jan 28, 2020 at 9:08 AM Arvid Heise wrote: > Hi Jason, > > could you describe your topology? Are you writing to Kafka? Are you using > exactly once? Are you seeing any warning? > If

ActiveMQ connector

2020-01-29 Thread OskarM
Hi all, I am using Flink with Bahir's Apache ActiveMQ connector. However it's quite dated and poses many limitations, most notably the source supports only ByteMessages, does not support parallelism and has a bug that is only fixed in a snapshot version. So I started implementing my own SourceFun

Re: Cypher support for flink graphs?

2020-01-29 Thread Flavio Pompermaier
That would be awesome but I don't know about any support about Cypher. You better go with Morpheus[1] for that. [1] https://github.com/opencypher/morpheus On Wed, Jan 29, 2020 at 10:28 AM kant kodali wrote: > Hi All, > > Can we expect open cypher support for Flink graphs? > > Thanks! >

Cypher support for flink graphs?

2020-01-29 Thread kant kodali
Hi All, Can we expect open cypher support for Flink graphs? Thanks!

Re: Reinterpreting a pre-partitioned data stream as keyed stream

2020-01-29 Thread Guowei Ma
Hi, Krzysztof When you use the *reinterpretAsKeyedStream* you must guarantee that partition is the same as Flink does by yourself. But before going any further I think we should know whether normal DataStream API could satisfy your requirements without using *reinterpretAsKeyedStream.* An opera