Re: Access logs for a running Flink app in YARN cluster

2018-04-12 Thread 杨力
Maybe you can get them from YARN with the REST API. Tao Xia wrote on Fri, Apr 13, 2018 at 8:09 AM: > Any good way to access container logs from a running Flink app in a YARN > cluster in EMR? > You can view the logs through the YARN UI. But cannot programmatically access > it and send to other
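As a sketch of the REST approach 杨力 suggests: the Hadoop ResourceManager exposes per-application info (including log links) under /ws/v1/cluster/apps/{app-id}. The host, port, and application id below are placeholder assumptions, not values from the thread:

```python
# Sketch: build the standard Hadoop ResourceManager REST URL for one
# application. The RM address and application id are placeholders.
from urllib.parse import urljoin

def app_info_url(rm_address: str, application_id: str) -> str:
    """Return the RM REST endpoint describing a single YARN application."""
    return urljoin(rm_address, f"/ws/v1/cluster/apps/{application_id}")

url = app_info_url("http://rm-host:8088", "application_1523500000000_0001")
print(url)  # http://rm-host:8088/ws/v1/cluster/apps/application_1523500000000_0001
```

The JSON returned there contains the tracking and log URLs for the application's attempts, which a log-shipping service could follow while the job is still running.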

Question about parallelism

2018-04-12 Thread TechnoMage
I am pretty new to Flink. I have a Flink job that has 10 transforms (mostly CoFlatMap, with some simple filters and key extractors as well). I have the config set for 6 slots and a default parallelism of 6, but all my stages show a parallelism of 1. Is that because there is only one task manager?
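For the setup Michael describes, both the slot count and the default parallelism are ordinary flink-conf.yaml settings; a minimal sketch with illustrative values:

```yaml
# flink-conf.yaml (illustrative values)
taskmanager.numberOfTaskSlots: 6   # slots offered by each TaskManager
parallelism.default: 6             # used when a job sets no explicit parallelism
```

If the UI still shows parallelism 1, something more specific is likely overriding the default, e.g. a parallelism passed on the command line or set on the environment or on individual operators in the job code.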

Access logs for a running Flink app in YARN cluster

2018-04-12 Thread Tao Xia
Any good way to access container logs from a running Flink app in a YARN cluster in EMR? You can view the logs through the YARN UI, but cannot programmatically access them and send them to other services. The log aggregator only runs when the application finishes, or copies at a minimum interval of 3600 secs. Any way we can

Re: State management and heap usage

2018-04-12 Thread TechnoMage
Thank you. Michael > On Apr 12, 2018, at 2:45 AM, Gary Yao wrote: > > Hi Michael, > > You can configure the default state backend by setting state.backend in > flink-conf.yaml, or you can configure it per job [1]. The default state > backend > is "jobmanager"

Re: Is Flink able to do real time stock market analysis?

2018-04-12 Thread TechnoMage
Given that the output of a window cannot arrive before any of the data in that window, it will always arrive after the raw data for the same period, and may have some latency relative to the raw data. If your RichFlatMapFunction uses a ListState to hold more than one window's worth of raw and

Re: Kafka Consumers Partition Discovery doesn't work

2018-04-12 Thread Tzu-Li (Gordon) Tai
Hi Juno, Thanks for reporting back, glad to know that it's not an issue :) In general, connector specific configurations should always happen at the connector level, per-connector. The flink-conf.yaml file is usually for cluster wide configurations. And yes, it might be helpful to have a code
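As Gordon notes, the partition-discovery setting belongs in the consumer's own Properties rather than in flink-conf.yaml. A minimal sketch (the interval value is illustrative):

```properties
# Passed in the Properties given to the FlinkKafkaConsumer constructor
# (a per-connector setting, not flink-conf.yaml)
flink.partition-discovery.interval-millis=60000
```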

Re: How to customize triggering of checkpoints?

2018-04-12 Thread Steven Wu
Syed, I am very curious about the motivation if you can share. On Wed, Apr 11, 2018 at 1:35 AM, Chesnay Schepler wrote: > Hello, > > there is no way to manually trigger checkpoints or configure irregular > intervals. > > You will have to modify the CheckpointCoordinator >

Re: keyBy and parallelism

2018-04-12 Thread Ken Krugler
I’m not sure I understand the actual use case, but … Using a rebalance() to distribute records round-robin to operators is what I think you’d need to do to support “even if I have less keys that slots, I wants each slot to take his share in the work” So it sounds like you want to (a) broadcast all
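The effect Ken describes can be seen in a toy round-robin model of rebalance(): every subtask gets work even when there are fewer distinct keys than subtasks. This is a simplified illustration, not Flink's internals:

```python
# Toy model: rebalance() sends records round-robin, so every subtask
# receives data even with fewer distinct keys than subtasks.
def rebalance(records, parallelism):
    """Assign each record to a subtask in round-robin order."""
    return {i: [r for j, r in enumerate(records) if j % parallelism == i]
            for i in range(parallelism)}

records = [f"key-{k}" for k in range(8)] * 4   # 8 distinct keys, 32 records
assignment = rebalance(records, 16)
busy = sum(1 for subtask_records in assignment.values() if subtask_records)
print(busy)  # 16 -- all subtasks receive records
```

The trade-off, as the thread goes on to discuss, is that round-robin distribution loses the per-key locality that keyed state depends on, which is why the rules would need to be broadcast.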

Re: Is Flink able to do real time stock market analysis?

2018-04-12 Thread Ivan Wang
Thanks Michael very much, it helps a lot! I tried what you suggested and now I can compare smoothed data with raw data in the coFlatMap method. However, it cannot ensure that the smoothed data is coming in the expected way. Basically, for every raw event, I’d like to refer to the early but closest
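One way to express "for every raw event, look up the latest smoothed value at or before it" inside a CoFlatMap-style function is a sorted buffer plus binary search. A hypothetical sketch (names and timestamps are illustrative; in a real Flink job the buffer would live in keyed state such as a ListState):

```python
import bisect

class SmoothedLookup:
    """Buffer (timestamp, value) pairs and find the closest earlier one."""
    def __init__(self):
        self.timestamps = []
        self.values = []

    def add_smoothed(self, ts, value):
        # Keep the buffer sorted by timestamp as smoothed results arrive.
        idx = bisect.bisect_left(self.timestamps, ts)
        self.timestamps.insert(idx, ts)
        self.values.insert(idx, value)

    def closest_earlier(self, raw_ts):
        """Latest smoothed value with timestamp <= raw_ts, or None."""
        idx = bisect.bisect_right(self.timestamps, raw_ts)
        return self.values[idx - 1] if idx > 0 else None

lookup = SmoothedLookup()
lookup.add_smoothed(10, 1.0)
lookup.add_smoothed(20, 2.0)
print(lookup.closest_earlier(15))  # 1.0
```

If a raw event can arrive before its matching smoothed value, the raw event itself would also need to be buffered and re-checked when later smoothed values come in.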

Re: keyBy and parallelism

2018-04-12 Thread Christophe Jolif
Sihua, On Thu, Apr 12, 2018 at 10:04 AM, 周思华 wrote: > Hi Christophe, > I think what you want to do is a "stream join", and I'm a bit confused > that if you know there are only 8 keys, why would you still like to > use a parallelism of 16? 8 of them will be idle (a waste

Re: State management and heap usage

2018-04-12 Thread Gary Yao
Hi Michael, You can configure the default state backend by setting state.backend in flink-conf.yaml, or you can configure it per job [1]. The default state backend is "jobmanager" (MemoryStateBackend), which stores state and checkpoints on the Java heap. RocksDB must be explicitly enabled, e.g.,
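A minimal flink-conf.yaml sketch of what Gary describes (the checkpoint path is a placeholder):

```yaml
# flink-conf.yaml -- default is "jobmanager" (MemoryStateBackend, on-heap)
state.backend: rocksdb
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
```

With RocksDB enabled, working state lives off-heap on local disk instead of on the JVM heap, at the cost of serialization on every state access.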

Re: keyBy and parallelism

2018-04-12 Thread 周思华
Hi Christophe, I think what you want to do is a "stream join", and I'm a bit confused: if you know there are only 8 keys, why would you still like to use a parallelism of 16? 8 of them will be idle (a waste of CPU). In a KeyedStream, the tuples with the same key will be sent to the
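Sihua's point, that 8 distinct keys can occupy at most 8 of 16 subtasks, follows from hash-based routing. A simplified model (Flink actually routes keys via murmur-hashed key groups, but the upper bound is the same):

```python
import zlib

def subtask_for(key, parallelism):
    """Deterministically hash a key to one subtask (simplified model)."""
    return zlib.crc32(key.encode()) % parallelism

keys = [f"key-{k}" for k in range(8)]
used = {subtask_for(k, 16) for k in keys}
print(len(used))  # at most 8 of the 16 subtasks can ever receive data
```

Hash collisions can make the number of busy subtasks even smaller than the number of distinct keys, which is why a low-cardinality keyBy cannot fill a high parallelism.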

Re: keyBy and parallelism

2018-04-12 Thread Christophe Jolif
Thanks Chesnay (and others). That's what I was figuring out. Now let's go on to the follow-up with my exact use case. I have two streams, A and B. A basically receives "rules" that the processing of B should observe. There is a "key" that allows me to know that a rule x coming in A is
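Christophe's A/B pattern, keying both streams so that matching rules and data meet in the same operator instance, can be modeled with a tiny keyed-state simulation (hypothetical names; a real job would use connect() on two keyed streams and keep the rule in keyed state):

```python
# Toy model of a keyed CoFlatMap: stream A carries rules, stream B carries data.
class KeyedRuleJoin:
    def __init__(self):
        self.rules = {}   # key -> latest rule (stands in for per-key state)

    def on_rule(self, key, rule):
        # Element from stream A: remember the latest rule for this key.
        self.rules[key] = rule

    def on_data(self, key, value):
        """Element from stream B: apply the latest rule for this key, if any."""
        rule = self.rules.get(key)
        return rule(value) if rule else None

join = KeyedRuleJoin()
join.on_rule("sensor-1", lambda v: v * 2)
print(join.on_data("sensor-1", 21))  # 42
print(join.on_data("sensor-2", 21))  # None (no rule yet for this key)
```

Because keyBy routes both streams by the same key, each parallel instance only ever sees the rules and the data for its own share of the key space.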

Re: Kafka consumer to sync topics by event time?

2018-04-12 Thread Juho Autio
Looks like the bug https://issues.apache.org/jira/browse/FLINK-5479 entirely prevents this feature from being used if there are any idle partitions. It would be nice to mention in the documentation that this currently requires all subscribed partitions to have a constant stream of data with growing
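The behaviour behind FLINK-5479 can be seen in a toy watermark model: an operator's watermark is the minimum over its inputs' watermarks, so one idle partition stuck at the initial minimum holds event time back for everything downstream (a simplified illustration; the real per-partition watermarking lives inside the Kafka consumer):

```python
# Toy model: an operator's event-time watermark is the minimum of the
# watermarks of all its input partitions.
def operator_watermark(partition_watermarks):
    return min(partition_watermarks)

active = [1000, 1200, 1100]   # partitions that are receiving data
idle = float("-inf")          # partition that has produced no records yet
print(operator_watermark(active))           # 1000 -> event time advances
print(operator_watermark(active + [idle]))  # -inf -> event time is stuck
```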

Re: keyBy and parallelism

2018-04-12 Thread Chesnay Schepler
You will get 16 parallel executions since you specify a parallelism of 16; however, 8 of these will not get any data. On 11.04.2018 23:29, Hao Sun wrote: From what I learnt, you have to control parallelism yourself. You can set parallelism on an operator or set a default one through