Flink 1.5 TooLongFrameException in cluster mode?

2018-06-20 Thread chrisr123
I downloaded Flink 1.5 and I'm trying to run it in standalone mode: 1 job manager, 2 task managers. I can run a Flink job in local mode (one machine as both job manager and task manager), but when I add 2 remote machines as slaves and try to run, I see this error in the log and the

Re: Backpressure from producer with flink connector kinesis 1.4.2

2018-06-20 Thread Liu, Gavin (CAI - Atlanta)
Hi guys, I have another question related to the KPL problem. I wonder what the consequences of overwhelming the KPL internal queue (kinesis) can be. From my observation in experimenting with 1.4.2 (which does not yet have backpressure support, per the open PR stated below), when the flink cluster

How to use broadcast variables in data stream

2018-06-20 Thread zhen li
Hi all: I want to use broadcast resources such as a list or map in the flatMap function or in custom triggers, but I can't find an API for this. Can anyone help? Thanks

Re: Cluster resurrects old job

2018-06-20 Thread Elias Levy
Alas, that error appears to be a red herring. The admin mistyped the cancel command, leading to the error, but immediately corrected it, resulting in the job being canceled next. So it seems unrelated to the job coming back to life later on. On Wed, Jun 20, 2018 at 10:04 AM Elias Levy wrote: > The

Re: Blob Server Removes Failed Jobs Immediately

2018-06-20 Thread Chesnay Schepler
hmm, this indeed looks odd. Looping in Till (cc) who might know more about this. On 20.06.2018 16:43, Dominik Wosiński wrote: Hello, I'm not sure whether the problem is connected with bad configuration or it's some inconsistency in the documentation but according to this

Re: Cluster resurrects old job

2018-06-20 Thread Elias Levy
The source of the issue may be this error that occurred when the job was being canceled on June 5: June 5th 2018, 14:59:59.430 Failure during cancellation of job c59dd3133b1182ce2c05a5e2603a0646 with savepoint. java.io.IOException: Failed to create savepoint directory at --checkpoint-dir at

Cluster resurrects old job

2018-06-20 Thread Elias Levy
We had an unusual situation last night. One of our Flink clusters experienced some connectivity issues, which led to the single job running on the cluster failing and then being restored. And then something odd happened: the cluster decided to also restore an old version of the job. One we

Re: Backpressure from producer with flink connector kinesis 1.4.2

2018-06-20 Thread Liu, Gavin (CAI - Atlanta)
Thanks, Gordon. You are quick and it is very helpful to me. I tried some other alternatives to resolve this, and finally thought about rewriting the FlinkKinesisProducer class for our needs. Glad that I asked before I started. Really appreciate the quick response. From: "Tzu-Li (Gordon) Tai" Date:

Re: Backpressure from producer with flink connector kinesis 1.4.2

2018-06-20 Thread Tzu-Li (Gordon) Tai
Hi Gavin, The problem is that the Kinesis producer currently does not propagate backpressure properly. Records are added to the internally used KPL client's queue without any queue size limit. This is considered a bug, and there is already a pull request for it [1], which we should probably push
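To make the bug concrete, here is a minimal, language-agnostic sketch (in Python, not Flink's actual Java sink code) of what capping the producer-side buffer amounts to: with a size limit, the sink must stop accepting records when the queue is full, which is how backpressure would propagate upstream. The class and method names here are illustrative, not from the Flink or KPL APIs.

```python
from collections import deque

class BoundedBuffer:
    """Sketch of a producer buffer with a size limit. The unpatched
    Kinesis producer behaves like this class with limit = infinity:
    offer() always succeeds and the queue grows without bound."""

    def __init__(self, limit):
        self.limit = limit
        self.queue = deque()

    def offer(self, record):
        # Refuse new records when full; the caller must wait or flush,
        # which is what propagates backpressure to upstream operators.
        if len(self.queue) >= self.limit:
            return False
        self.queue.append(record)
        return True

    def drain(self, n):
        # Simulates the async KPL worker shipping records to Kinesis.
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()
```

Without the limit check, a slow Kinesis endpoint lets the in-memory queue (and heap usage) grow indefinitely, which matches the behavior described in this thread.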

Backpressure from producer with flink connector kinesis 1.4.2

2018-06-20 Thread Liu, Gavin (CAI - Atlanta)
Hi guys, I am new to the Flink framework, and we are building an application that uses a Kinesis stream for both the Flink source and sink. The Flink version we are using is 1.4.2, which is also the version of the flink-connector-kinesis. We built the flink-connector-kinesis jar explicitly with KPL

Re: # of active session windows of a streaming job

2018-06-20 Thread Dongwon Kim
Hi Fabian and Chesnay, As Chesnay pointed out, it seems that I need to write the current counter (which is defined inside the Trigger) into state, which I think should be the operator state of the window operator. However, as I previously said, TriggerContext allows users to access only the

Re: Exception while submitting jobs through Yarn

2018-06-20 Thread Garvit Sharma
So, finally, I have got this working. The issue was a poor library which was using Xerces 2.6 :). In the process, I found a few things missing from the docs and would like to contribute the same. I really appreciate the support provided. Thanks, On Tue, 19 Jun 2018 at 4:05 PM, Ted Yu

Blob Server Removes Failed Jobs Immediately

2018-06-20 Thread Dominik Wosiński
Hello, I'm not sure whether the problem is connected with bad configuration or it's some inconsistency in the documentation, but according to this document: https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture . If a job fails, all non-HA files'

Re: How to get past "bad" Kafka message, restart, keep state

2018-06-20 Thread Tzu-Li (Gordon) Tai
Hi, You can “skip” the corrupted message by returning `null` from the deserialize method of the user-provided DeserializationSchema. This lets the Kafka connector consider the record as processed and advance the offset, but it doesn't emit anything downstream for it. Hope this helps! Cheers,
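The pattern Gordon describes can be sketched as follows. This is illustrative Python mimicking the shape of Flink's Java `DeserializationSchema.deserialize(byte[])`; the function names and the driver loop are assumptions for the sketch, not the connector's actual code.

```python
import json

def deserialize(raw: bytes):
    # Mimics DeserializationSchema.deserialize: return None (Flink: null)
    # for corrupt records instead of throwing, so they can be skipped.
    try:
        return json.loads(raw)
    except (ValueError, UnicodeDecodeError):
        return None

def consume(records):
    # Sketch of the connector loop: every record is considered processed
    # (the offset advances), but nothing is emitted for a None result.
    out = []
    for raw in records:
        event = deserialize(raw)
        if event is not None:
            out.append(event)
    return out
```

The key point is that the bad record does not cause a job failure and restart loop: the offset moves past it, and downstream state is untouched.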

Re: How to get past "bad" Kafka message, restart, keep state

2018-06-20 Thread Kien Truong
Hi, You can use FlatMap instead of Map, and only collect valid elements. Regards, Kien On 6/20/2018 7:57 AM, chrisr123 wrote: First time I'm trying to get this to work so bear with me. I'm trying to learn checkpointing with Kafka and handling "bad" messages, restarting without losing
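Kien's suggestion works because a FlatMap may emit zero or more elements per input, while a Map must emit exactly one. A minimal sketch of the idea (illustrative Python; Flink's actual interface is Java's `FlatMapFunction` with a `Collector`):

```python
def parse_flat_map(line, out):
    # FlatMap-style function: emit 0 or 1 elements per input.
    # A Map would be forced to return something even for bad records.
    try:
        out.append(int(line))
    except ValueError:
        pass  # corrupt record: emit nothing, processing continues

def run(lines):
    out = []
    for line in lines:
        parse_flat_map(line, out)
    return out
```

Invalid records are silently dropped instead of failing the job, so checkpointed state and Kafka offsets keep advancing past them.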

Re: Ordering of stream from different kafka partitions

2018-06-20 Thread Andrey Zagrebin
Hi Amol, > In above code also it will sort the records in specific time window only. All windows will be emitted as the watermark passes the end of the window. The watermark only increases, so the non-overlapping windows should also be sorted by time, and as a consequence the records across windows

Re: Heap Problem with Checkpoints

2018-06-20 Thread Fabian Wollert
To that last one: I'm accessing S3 from one EC2 instance which has an IAM role attached ... I'll get back to you when I have those stack traces printed ... I will have to build the project and package the custom version first; it might take some time, and also some vacation is up next ... Cheers --

Re: Heap Problem with Checkpoints

2018-06-20 Thread Piotr Nowojski
Hi, I was looking into this more, and I have a couple of suspicions, but it's still hard to tell which is correct. Could you, for example, place a breakpoint (or add code there to print a stack trace) in org.apache.log4j.helpers.AppenderAttachableImpl#addAppender and check who is calling it? Since

Re: # of active session windows of a streaming job

2018-06-20 Thread Chesnay Schepler
Checkpointing of metrics is a manual process: the operator must write the current value into state, retrieve it on restore, and restore the counter's count. On 20.06.2018 12:10, Fabian Hueske wrote: Hi Dongwon, You are of course right! We need to decrement the counter when the window is
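The manual snapshot/restore cycle Chesnay describes can be sketched like this. This is illustrative Python, not Flink's `CheckpointedFunction` API; the class name, the state key, and the dict-backed "state" are assumptions standing in for Flink's operator state.

```python
class SessionWindowCounter:
    """Toy operator metric whose value is written into state on
    checkpoint and read back on recovery, as described above."""

    STATE_KEY = "active_windows"  # hypothetical state name

    def __init__(self):
        self.count = 0

    def snapshot(self, state):
        # Called at checkpoint time: write the current value into state.
        state[self.STATE_KEY] = self.count

    def restore(self, state):
        # Called on recovery: restore the counter's count from state.
        self.count = state.get(self.STATE_KEY, 0)
```

Without the explicit snapshot/restore, the metric would silently reset to zero after every failure and recovery, which is exactly the problem discussed in this thread.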

Re: Passing records between two jobs

2018-06-20 Thread Fabian Hueske
Hi Avihai, Rafi pointed out the two common approaches to deal with this situation. Let me expand a bit on those. 1) Transactional producing into queues: There are two approaches to accomplish exactly-once producing into queues: 1) using a system with transactional support such as Kafka, or 2)
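The transactional-producing idea can be illustrated with a toy in-memory queue (Python sketch; real systems would use Kafka 0.11+ transactions via the Java producer API, which this does not reproduce). All names here are invented for the illustration.

```python
class TransactionalQueue:
    """Toy queue with Kafka-style transaction semantics: records sent
    inside an open transaction become visible to consumers only on
    commit, so a failed job can abort and re-produce without the
    reader ever seeing duplicates."""

    def __init__(self):
        self.committed = []   # what a read-committed consumer sees
        self._pending = []    # buffered inside the open transaction

    def begin(self):
        self._pending = []

    def send(self, record):
        self._pending.append(record)  # not yet visible downstream

    def commit(self):
        self.committed.extend(self._pending)
        self._pending = []

    def abort(self):
        self._pending = []  # uncommitted records vanish on failure
```

A producer that commits a transaction once per successful checkpoint, and aborts on failure, gives the downstream job effectively exactly-once delivery.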

Re: # of active session windows of a streaming job

2018-06-20 Thread Fabian Hueske
Hi Dongwon, You are of course right! We need to decrement the counter when the window is closed. The idea of using Trigger.clear() (the clean up method is called clear() instead of onClose()) method is great! It will be called when the window is closed but also when it is merged. So, I think you

Re: Ordering of stream from different kafka partitions

2018-06-20 Thread Andrey Zagrebin
Hi, Good point, sorry for the confusion. BoundedOutOfOrdernessTimestampExtractor of course does not buffer records; you need to apply windowing (e.g. TumblingEventTimeWindows) for that, and then sort the window output by time and emit records in sorted order. You can also use windowAll, which
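The sort-and-emit step Andrey describes is simple once a window has closed: order the buffered records by their event timestamp. A minimal sketch (illustrative Python; in Flink this would live in a Java window function, and the `"ts"` field is an assumed record attribute):

```python
def emit_sorted(window_events):
    # Sort one closed window's records by event time before emitting.
    # Because watermarks only increase, consecutive non-overlapping
    # windows fire in time order, so per-window sorting yields records
    # that are also globally ordered across windows.
    return sorted(window_events, key=lambda e: e["ts"])
```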

Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-20 Thread Stefan Richter
Hi, it is possible for the number of processing-time timers to grow, because internal timers are scoped by time, key, and namespace (typically this means "window", because each key can be part of multiple windows). So if the number of keys in your application is steadily growing, this can

Re: Ordering of stream from different kafka partitions

2018-06-20 Thread sihua zhou
Hi, I think a global ordering is a bit impractical in production, but in theory you can still do it. You need to: - First, fix the operator's parallelism to 1 (except the source node). - If you want to sort the records within a bounded time, then you can keyBy() a constant and window it,

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Till Rohrmann
It will, but it defaults to jobmanager.rpc.address if no rest.address has been specified. On Wed, Jun 20, 2018 at 9:49 AM Chesnay Schepler wrote: > Shouldn't the non-HA case be covered by rest.address? > > On 20.06.2018 09:40, Till Rohrmann wrote: > > Hi Sampath, > > it is no longer possible to

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Chesnay Schepler
Shouldn't the non-HA case be covered by rest.address? On 20.06.2018 09:40, Till Rohrmann wrote: Hi Sampath, it is no longer possible to not start the rest server endpoint by setting rest.port to -1. If you do this, then the cluster won't start. The comment in the flink-conf.yaml holds only

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Till Rohrmann
Hi Sampath, it is no longer possible to not start the rest server endpoint by setting rest.port to -1. If you do this, then the cluster won't start. The comment in the flink-conf.yaml holds only true for the legacy mode. In non-HA setups we need the jobmanager.rpc.address to derive the hostname
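Till's point about the non-HA fallback can be summarized in a config fragment (hedged sketch for Flink 1.5; the hostname `master-1` is a placeholder, and key names are taken from this thread):

```yaml
# flink-conf.yaml (non-HA standalone setup) -- illustrative values
jobmanager.rpc.address: master-1  # also used to derive the REST hostname
# rest.address: master-1          # optional; falls back to jobmanager.rpc.address
rest.port: 8081                   # per this thread, -1 no longer disables the
                                  # REST endpoint in 1.5 -- it prevents startup
```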

Re: Passing records between two jobs

2018-06-20 Thread Rafi Aroch
Hi Avihai, > The problem is that every message queuing sink only provides at-least-once guarantee. From what I see, a possible messaging queue which guarantees exactly-once is Kafka 0.11, while using the Kafka transactional messaging

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Chesnay Schepler
I was worried this might be the case. The rest.port handling was simply copied from the legacy web server, which explicitly allowed shutting it down. It may also not be necessary for all deployment modes (I'm not entirely sure); for example, if the job is baked into the job/taskmanager images.

Re: Questions regarding to Flink 1.5.0 REST API change

2018-06-20 Thread Chesnay Schepler
I think you can set the target-directory to null. But I'm not sure why this particular request requires this; other requests allow optional fields to simply be omitted... On 20.06.2018 06:12, Siew Wai Yow wrote: Hi all, It seems passing in target-directory is a must now for the checkpoints REST API,

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Sampath Bhat
Hi Chesnay, Adding on to the point you made: "the rpc address is still *required* due to some technical implementations; it may be that you can set this to some arbitrary value however." For job submission to happen successfully, we must give a specific rpc address, not an arbitrary value.