Question about the behavior of TM when it lost the zookeeper client session in HA mode

2018-05-13 Thread Tony Wei
Hi all, Recently, my Flink job hit a problem that caused the job to fail and restart. The log is shown in this screen snapshot or this ``` 2018-05-11 13:21:04,582 WARN org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in

Re: Batch writing from Flink streaming job

2018-05-13 Thread Jörn Franke
If you want to write in batches from a streaming source, you will always need some state, i.e. a state database (here a NoSQL database such as a key-value store makes sense). Then you can grab the data at certain points in time and convert it to Avro. You need to make sure that the state is

Batch writing from Flink streaming job

2018-05-13 Thread Padarn Wilson
Hi all, I am writing some jobs intended to run using the DataStream API with a Kafka source. However, we also have a lot of data in Avro archives (of the same Kafka source). I would like to be able to run the processing code over parts of the archive so I can generate some "example output".

Default zookeeper

2018-05-13 Thread miki haiat
When downloading the Flink source in order to run it locally, there is a zookeeper script and a start-zookeeper-quorum script. Is there any difference between the default ZooKeeper installation, let's say on Ubuntu, and the ZooKeeper that comes with Flink? Thanks, Miki

Re: How to broadcast messages to all task manager instances in cluster?

2018-05-13 Thread Di Tang
Thanks Piotr for the response. I have many data streams that depend on the configuration, which they read from static variables in a class. A configuration change works by changing the values of those static variables in the class. Since each task manager only has one JVM process, as long as the

Re: YARN per-job cluster reserves all of remaining memory in YARN

2018-05-13 Thread Till Rohrmann
Hi Dongwon, Fabian is right with his analysis of the problem. Currently, the new ResourceManager implementation will start a new TM per requested slot, independent of the number of configured slots. This behavior can cause the cluster to request too many resources when a job is started.

Re: strange behavior with jobmanager.rpc.address on standalone HA cluster

2018-05-13 Thread Till Rohrmann
Hi Derek, given that you've started the different Flink cluster components all with the same HA enabled configuration, the TMs should be able to connect to jm1 after you've killed jm0. The jobmanager.rpc.address should not be used when HA mode is enabled. In order to get to the bottom of the

Re: A strange exception while consumption using a multi topic Kafka Connector

2018-05-13 Thread Ted Yu
FLINK-9349 was logged. FYI On Sat, May 12, 2018 at 7:52 AM, Ted Yu wrote: > I took a look at ./flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka09Fetcher.java > > It seems the List

Re: Task Manager detached under load

2018-05-13 Thread Steven Wu
Till, thanks for the clarification. Yes, that situation is undesirable too. In our case, restarting the jobmanager could also recover the job from the akka association lock-out. It was actually the issue (high GC pause) on the jobmanager side that caused the akka failure. Do we have something like

Re: Best way to clean-up states in memory

2018-05-13 Thread ashish pok
Hi Till, Thanks for getting back. I am sure that will fix the issue but I feel like that would potentially mask an issue. I have been going back and forth with Fabian on a use case where for some of our highly transient datasets, it might make sense to just use memory based state (except of

Re: Task Manager detached under load

2018-05-13 Thread Till Rohrmann
Hi Steven, the reason why we did not turn on this feature by default was that in case of a true JM failure, all of the TMs will think that they got quarantined, which triggers their shutdown. Depending on how many container restarts you have left on Yarn, for example, this can lead to a

Re: Confusing debug level log output with Flink 1.5

2018-05-13 Thread Till Rohrmann
Hi Ken, sorry for the long response time. You're right that the log output is too verbose and we should not log the full stack traces in the happy case. I've responded on the PR thread [1] and suggested logging only the exception message for information purposes. I hope that this will solve the

Re: Best way to clean-up states in memory

2018-05-13 Thread Till Rohrmann
Hi Ashish, have you tried using Flink's RocksDBStateBackend? If your job accumulates state exceeding the available main memory, then you have to use a state backend which can spill to disk. The RocksDBStateBackend offers you exactly this functionality. Cheers, Till On Mon, Apr 30, 2018 at 3:54

Re: Subtasks - Uneven allocation

2018-05-13 Thread Till Rohrmann
Hi Pedro, currently, Flink does not allow you to explicitly control the scheduling strategy at such a fine-grained level. The idea behind this is to achieve location transparency and to make the scheduling easier. However, there are some tricks you could play depending on the actual job. For