Re: Access logs for a running Flink app in YARN cluster

2018-04-12 Thread 杨力
Maybe you can get them from YARN with the REST API. Tao Xia wrote on Fri, Apr 13, 2018 at 8:09 AM: > Any good way to access container logs from a running Flink app in a YARN > cluster in EMR? > You can view the logs through the YARN UI. But cannot programmatically access > it and send to other
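As a sketch of the REST approach 杨力 suggests: the Hadoop ResourceManager exposes per-application info (including log links) under /ws/v1/cluster/apps/{app-id}. The host, port, and application id below are placeholder assumptions, not values from the thread:

```python
# Sketch: build the standard Hadoop ResourceManager REST URL for one
# application. The RM address and application id are placeholders.
from urllib.parse import urljoin

def app_info_url(rm_address: str, application_id: str) -> str:
    """Return the RM REST endpoint describing a single YARN application."""
    return urljoin(rm_address, f"/ws/v1/cluster/apps/{application_id}")

url = app_info_url("http://rm-host:8088", "application_1523500000000_0001")
print(url)  # http://rm-host:8088/ws/v1/cluster/apps/application_1523500000000_0001
```

The JSON returned there contains the tracking and log URLs for the application's attempts, which a log-shipping service could follow while the job is still running.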

Question about parallelism

2018-04-12 Thread TechnoMage
I am pretty new to Flink. I have a Flink job that has 10 transforms (mostly CoFlatMap, with some simple filters and key extractors as well). I have the config set for 6 slots and a default parallelism of 6, but all my stages show a parallelism of 1. Is that because there is only one task manager?
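For the setup Michael describes, both the slot count and the default parallelism are ordinary flink-conf.yaml settings; a minimal sketch with illustrative values:

```yaml
# flink-conf.yaml (illustrative values)
taskmanager.numberOfTaskSlots: 6   # slots offered by each TaskManager
parallelism.default: 6             # used when a job sets no explicit parallelism
```

If the UI still shows parallelism 1, something more specific is likely overriding the default, e.g. a parallelism passed on the command line or set on the environment or on individual operators in the job code.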

Access logs for a running Flink app in YARN cluster

2018-04-12 Thread Tao Xia
Any good way to access container logs from a running Flink app in a YARN cluster in EMR? You can view the logs through the YARN UI, but cannot programmatically access them and send them to other services. The log aggregator only runs when the application finishes, or copies at a minimum interval of 3600 secs. Any way we can

Re: State management and heap usage

2018-04-12 Thread TechnoMage
Thank you. Michael > On Apr 12, 2018, at 2:45 AM, Gary Yao wrote: > > Hi Michael, > > You can configure the default state backend by setting state.backend in > flink-conf.yaml, or you can configure it per job [1]. The default state > backend > is "jobmanager"

Re: Is Flink able to do real time stock market analysis?

2018-04-12 Thread TechnoMage
Given that the output of a window cannot arrive before any of the data in that window, it will always arrive after the raw data for the same period, and may have some latency relative to the raw data. If your RichFlatMapFunction uses a ListState to hold more than one window's worth of raw and

Re: Kafka Consumers Partition Discovery doesn't work

2018-04-12 Thread Tzu-Li (Gordon) Tai
Hi Juno, Thanks for reporting back, glad to know that it's not an issue :) In general, connector specific configurations should always happen at the connector level, per-connector. The flink-conf.yaml file is usually for cluster wide configurations. And yes, it might be helpful to have a code
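As Gordon notes, the partition-discovery setting belongs in the consumer's own Properties rather than in flink-conf.yaml. A minimal sketch (the interval value is illustrative):

```properties
# Passed in the Properties given to the FlinkKafkaConsumer constructor
# (a per-connector setting, not flink-conf.yaml)
flink.partition-discovery.interval-millis=60000
```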

Re: How to customize triggering of checkpoints?

2018-04-12 Thread Steven Wu
Syed, I am very curious about the motivation if you can share. On Wed, Apr 11, 2018 at 1:35 AM, Chesnay Schepler wrote: > Hello, > > there is no way to manually trigger checkpoints or configure irregular > intervals. > > You will have to modify the CheckpointCoordinator >

Re: keyBy and parallelism

2018-04-12 Thread Ken Krugler
I’m not sure I understand the actual use case, but … Using a rebalance() to distribute records round-robin to operators is what I think you’d need to do to support “even if I have less keys that slots, I wants each slot to take his share in the work” So it sounds like you want to (a) broadcast all
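The effect Ken describes can be seen in a toy round-robin model of rebalance(): every subtask gets work even when there are fewer distinct keys than subtasks. This is a simplified illustration, not Flink's internals:

```python
# Toy model: rebalance() sends records round-robin, so every subtask
# receives data even with fewer distinct keys than subtasks.
def rebalance(records, parallelism):
    """Assign each record to a subtask in round-robin order."""
    return {i: [r for j, r in enumerate(records) if j % parallelism == i]
            for i in range(parallelism)}

records = [f"key-{k}" for k in range(8)] * 4   # 8 distinct keys, 32 records
assignment = rebalance(records, 16)
busy = sum(1 for subtask_records in assignment.values() if subtask_records)
print(busy)  # 16 -- all subtasks receive records
```

The trade-off, as the thread goes on to discuss, is that round-robin distribution loses the per-key locality that keyed state depends on, which is why the rules would need to be broadcast.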

Re: Is Flink able to do real time stock market analysis?

2018-04-12 Thread Ivan Wang
Thanks Michael very much, it helps a lot! I tried what you suggested and now I can compare smoothed data with raw data in the coFlatMap method. However, it cannot ensure that the smoothed data is coming in the expected way. Basically, for every raw event, I’d like to refer to the early but closest
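One way to express "for every raw event, look up the latest smoothed value at or before it" inside a CoFlatMap-style function is a sorted buffer plus binary search. A hypothetical sketch (names and timestamps are illustrative; in a real Flink job the buffer would live in keyed state such as a ListState):

```python
import bisect

class SmoothedLookup:
    """Buffer (timestamp, value) pairs and find the closest earlier one."""
    def __init__(self):
        self.timestamps = []
        self.values = []

    def add_smoothed(self, ts, value):
        # Keep the buffer sorted by timestamp as smoothed results arrive.
        idx = bisect.bisect_left(self.timestamps, ts)
        self.timestamps.insert(idx, ts)
        self.values.insert(idx, value)

    def closest_earlier(self, raw_ts):
        """Latest smoothed value with timestamp <= raw_ts, or None."""
        idx = bisect.bisect_right(self.timestamps, raw_ts)
        return self.values[idx - 1] if idx > 0 else None

lookup = SmoothedLookup()
lookup.add_smoothed(10, 1.0)
lookup.add_smoothed(20, 2.0)
print(lookup.closest_earlier(15))  # 1.0
```

If a raw event can arrive before its matching smoothed value, the raw event itself would also need to be buffered and re-checked when later smoothed values come in.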

Re: keyBy and parallelism

2018-04-12 Thread Christophe Jolif
Sihua, On Thu, Apr 12, 2018 at 10:04 AM, 周思华 wrote: > Hi Christophe, > I think what you want to do is a "stream join", and I'm a bit confused > that if you know there are only 8 keys, why would you still like to > use a parallelism of 16? 8 of them will be idle (a waste

Re: State management and heap usage

2018-04-12 Thread Gary Yao
Hi Michael, You can configure the default state backend by setting state.backend in flink-conf.yaml, or you can configure it per job [1]. The default state backend is "jobmanager" (MemoryStateBackend), which stores state and checkpoints on the Java heap. RocksDB must be explicitly enabled, e.g.,
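A minimal flink-conf.yaml sketch of what Gary describes (the checkpoint path is a placeholder):

```yaml
# flink-conf.yaml -- default is "jobmanager" (MemoryStateBackend, on-heap)
state.backend: rocksdb
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
```

With RocksDB enabled, working state lives off-heap on local disk instead of on the JVM heap, at the cost of serialization on every state access.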

Re: keyBy and parallelism

2018-04-12 Thread 周思华
Hi Christophe, I think what you want to do is a "stream join", and I'm a bit confused: if you know there are only 8 keys, why would you still like to use a parallelism of 16? 8 of them will be idle (a waste of CPU). In a KeyedStream, the tuples with the same key will be sent to the
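Sihua's point, that 8 distinct keys can occupy at most 8 of 16 subtasks, follows from hash-based routing. A simplified model (Flink actually routes keys via murmur-hashed key groups, but the upper bound is the same):

```python
import zlib

def subtask_for(key, parallelism):
    """Deterministically hash a key to one subtask (simplified model)."""
    return zlib.crc32(key.encode()) % parallelism

keys = [f"key-{k}" for k in range(8)]
used = {subtask_for(k, 16) for k in keys}
print(len(used))  # at most 8 of the 16 subtasks can ever receive data
```

Hash collisions can make the number of busy subtasks even smaller than the number of distinct keys, which is why a low-cardinality keyBy cannot fill a high parallelism.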

Re: keyBy and parallelism

2018-04-12 Thread Christophe Jolif
Thanks Chesnay (and others). That's what I was figuring out. Now let's go on to the follow-up with my exact use case. I have two streams, A and B. A basically receives "rules" that the processing of B should observe. There is a "key" that allows me to know that a rule x coming in A is
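Christophe's A/B pattern, keying both streams so that matching rules and data meet in the same operator instance, can be modeled with a tiny keyed-state simulation (hypothetical names; a real job would use connect() on two keyed streams and keep the rule in keyed state):

```python
# Toy model of a keyed CoFlatMap: stream A carries rules, stream B carries data.
class KeyedRuleJoin:
    def __init__(self):
        self.rules = {}   # key -> latest rule (stands in for per-key state)

    def on_rule(self, key, rule):
        # Element from stream A: remember the latest rule for this key.
        self.rules[key] = rule

    def on_data(self, key, value):
        """Element from stream B: apply the latest rule for this key, if any."""
        rule = self.rules.get(key)
        return rule(value) if rule else None

join = KeyedRuleJoin()
join.on_rule("sensor-1", lambda v: v * 2)
print(join.on_data("sensor-1", 21))  # 42
print(join.on_data("sensor-2", 21))  # None (no rule yet for this key)
```

Because keyBy routes both streams by the same key, each parallel instance only ever sees the rules and the data for its own share of the key space.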

Re: Kafka consumer to sync topics by event time?

2018-04-12 Thread Juho Autio
Looks like the bug https://issues.apache.org/jira/browse/FLINK-5479 entirely prevents this feature from being used if there are any idle partitions. It would be nice to mention in the documentation that this currently requires all subscribed partitions to have a constant stream of data with growing
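The behaviour behind FLINK-5479 can be seen in a toy watermark model: an operator's watermark is the minimum over its inputs' watermarks, so one idle partition stuck at the initial minimum holds event time back for everything downstream (a simplified illustration; the real per-partition watermarking lives inside the Kafka consumer):

```python
# Toy model: an operator's event-time watermark is the minimum of the
# watermarks of all its input partitions.
def operator_watermark(partition_watermarks):
    return min(partition_watermarks)

active = [1000, 1200, 1100]   # partitions that are receiving data
idle = float("-inf")          # partition that has produced no records yet
print(operator_watermark(active))           # 1000 -> event time advances
print(operator_watermark(active + [idle]))  # -inf -> event time is stuck
```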

Re: keyBy and parallelism

2018-04-12 Thread Chesnay Schepler
You will get 16 parallel executions since you specify a parallelism of 16; however, 8 of these will not get any data. On 11.04.2018 23:29, Hao Sun wrote: From what I learnt, you have to control parallelism yourself. You can set parallelism on an operator or set a default one through