Re: Checkpoint takes long with FlinkKafkaConsumer

2016-07-05 Thread Stephan Ewen
I looked into this a bit and I think it is a Flink issue: the blocking is between the poll() and the commitToKafka() calls. The commitToKafka() call is not part of the checkpoint; it comes only after the checkpoint. So even if it is not called, this should not block the checkpoint. What may

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-07-05 Thread Hironori Ogibayashi
Hi, Sorry for my late response. Actually, I received no response on the Kafka mailing list and still cannot find the root cause. But when I use FlinkKafkaConsumer082, I do not encounter this issue, so I will use FlinkKafkaConsumer082. Thanks Hironori 2016-06-17 2:59 GMT+09:00 Ufuk Celebi

Re: JDBC sink in flink

2016-07-05 Thread Stefano Bortoli
The connection will be managed by the splitManager, so there is no need to use a pool. However, if you had to, you should probably look into the establishConnection() method of the JDBCInputFormat. 2016-07-05 10:52 GMT+02:00 Flavio Pompermaier : > why do you need a connection pool? >

Timewindow watermarks specific to keys of stream

2016-07-05 Thread madhu phatak
Hi, I am trying to analyse stock market data. In this, for every symbol I want to find the max stock price in the last 10 minutes. I want to generate watermarks specific to each key rather than across the whole stream. Is this possible in Flink? -- Regards, Madhukara Phatak http://datamantra.io/

Re: Checkpointing very large state in RocksDB?

2016-07-05 Thread Vishnu Viswanath
Hi, Is there any other disadvantage of using fullyAsyncSnapshot, other than being slower? And would the slowness really matter, since it is async anyway? Thanks and Regards, Vishnu Viswanath On Thu, Jun 30, 2016 at 8:07 AM, Aljoscha Krettek wrote: > Hi, > are you talking

Re: Data point goes missing within iteration

2016-07-05 Thread Ufuk Celebi
Sorry Biplob, I didn't have time to look into your code today. I will try to do it tomorrow though. On Mon, Jul 4, 2016 at 2:53 PM, Biplob Biswas wrote: > I have sent you my code in a separate email, I hope you can solve my issue. > > Thanks a lot > Biplob > > > > -- >

Re: Flink stops deploying jobs on normal iteration

2016-07-05 Thread Nguyen Xuan Truong
Hi Vasia, Thank you very much for your explanation :). When running with a small maxIterations, the job graph that Flink executed was optimal. However, when maxIterations was large, Flink took a very long time to generate the job graph. The actual time to execute the jobs was very fast, but the time

Re: Flink stops deploying jobs on normal iteration

2016-07-05 Thread Vasiliki Kalavri
Hi Truong, I'm afraid what you're experiencing is to be expected. Currently, for loops do not perform well in Flink since there is no support for caching intermediate results yet. This feature has been requested quite often lately, so maybe it will be added soon :) Until then, I suggest you try
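Vasia's concrete suggestion is cut off above, but the commonly recommended direction for driver-side for loops is Flink's native bulk iteration, which keeps the job graph a constant size no matter how many iterations run. A minimal sketch, with a placeholder data set and step function standing in for the actual KMeans-style logic:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.IterativeDataSet;

public class BulkIterationSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Double> initial = env.fromElements(0.0); // placeholder input
        int maxIterations = 1000;

        // iterate() opens a bulk iteration; the job graph stays the same
        // size regardless of how large maxIterations is.
        IterativeDataSet<Double> loop = initial.iterate(maxIterations);

        // Placeholder standing in for one pass of the real algorithm.
        DataSet<Double> step = loop.map(new MapFunction<Double, Double>() {
            @Override
            public Double map(Double value) {
                return value + 1.0;
            }
        });

        // closeWith() feeds the step result back into the next iteration.
        loop.closeWith(step).print();
    }
}
```

This only helps if the per-iteration step can be expressed as one dataflow; computing a scalar like topDistance between iterations is exactly where the missing intermediate-result caching bites.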

disable console output

2016-07-05 Thread Andres R. Masegosa
Hi, I'm having problems when working with Flink (local mode) and travis-ci. The console output gives rise to big log files (>4MB). How can I disable, from my Java code (through the Configuration object), the progress messages displayed in the console? Thanks, Andres

Re: JDBC sink in flink

2016-07-05 Thread Harikrishnan S
The basic idea was that I would create a pool of connections in the open() method of a custom sink, and each invoke() call would get one connection from the pool and do the needed upserts. I might have misunderstood how sinks work in Flink, though. On Tue, Jul 5, 2016 at 2:22 PM, Flavio Pompermaier

Re: JDBC sink in flink

2016-07-05 Thread Chesnay Schepler
They serve a similar purpose. OutputFormats originate from the Batch API, whereas SinkFunctions are a Streaming API concept. You can however use OutputFormats in the Streaming API via DataStream#writeUsingOutputFormat. Regards, Chesnay On 05.07.2016 12:51, Harikrishnan S wrote: Ah
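For reference, a sketch of that hand-off; the driver, URL, and query are made-up placeholders, and the record type JDBCOutputFormat accepts has varied across Flink versions (Tuple in older releases, Row later), hence the raw cast:

```java
import org.apache.flink.api.common.io.OutputFormat;
import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;

public class StreamToJdbcSketch {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static void writeToPostgres(DataStream<Tuple2<String, Double>> stream) {
        // Batch-API output format, built with the JDBCOutputFormat builder.
        JDBCOutputFormat format = JDBCOutputFormat.buildJDBCOutputFormat()
                .setDrivername("org.postgresql.Driver")                       // placeholder
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb")            // placeholder
                .setQuery("INSERT INTO quotes (symbol, price) VALUES (?, ?)") // placeholder
                .finish();

        // The bridge mentioned above: a batch OutputFormat used as a stream sink.
        stream.writeUsingOutputFormat((OutputFormat) format);
    }
}
```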

Re: JDBC sink in flink

2016-07-05 Thread Harikrishnan S
Awesome! Thanks a lot! I should probably write a blog post somewhere explaining this. On Tue, Jul 5, 2016 at 4:28 PM, Chesnay Schepler wrote: > They serve a similar purpose. > > OutputFormats originate from the Batch API, whereas SinkFunctions are a > Streaming API

Re: JDBC sink in flink

2016-07-05 Thread Harikrishnan S
Oh. So you mean if I write a custom sink for a DB, I just need to create one connection in the open() method and then the invoke() method will reuse it? Basically I need to do 35k-50k+ upserts in Postgres. Can I reuse JDBCOutputFormat for this purpose? I couldn't find a proper document

Re: JDBC sink in flink

2016-07-05 Thread Chesnay Schepler
Hello, an instance of the JDBCOutputFormat will use a single connection to send all values. Essentially: open(...) is called at the very start to create the connection; then all invoke/writeRecord calls are executed (using the same connection); then close() is called to clean up. The
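The same lifecycle applies to a hand-written sink, which is essentially Harikrishnan's plan minus the pool: one connection per parallel sink instance. A minimal sketch, assuming Tuple2 records of symbol and price, placeholder connection settings, and a Postgres-9.5-style upsert:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class PostgresUpsertSink extends RichSinkFunction<Tuple2<String, Double>> {

    private transient Connection connection;
    private transient PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Called once per parallel instance: create the single connection.
        connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret"); // placeholders
        statement = connection.prepareStatement(
                "INSERT INTO quotes (symbol, price) VALUES (?, ?) "
                + "ON CONFLICT (symbol) DO UPDATE SET price = EXCLUDED.price");
    }

    @Override
    public void invoke(Tuple2<String, Double> value) throws Exception {
        // Called per record, reusing the connection opened above.
        statement.setString(1, value.f0);
        statement.setDouble(2, value.f1);
        statement.executeUpdate();
    }

    @Override
    public void close() throws Exception {
        // Called once at the end to clean up.
        if (statement != null) statement.close();
        if (connection != null) connection.close();
    }
}
```

For tens of thousands of upserts, batching (executeBatch() every N records) is the usual refinement.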

Re: JDBC sink in flink

2016-07-05 Thread Harikrishnan S
Ah, that makes sense. Also, what's the difference between a RichOutputFormat and a RichSinkFunction? Can I use JDBCOutputFormat as a sink in a stream? On Tue, Jul 5, 2016 at 3:53 PM, Chesnay Schepler wrote: > Hello, > > an instance of the JDBCOutputFormat will use a single

Re: Tumbling time window cannot group events properly

2016-07-05 Thread Aljoscha Krettek
The order in which elements are added to internal buffers and the point in time when FoldFunction.fold() is called don't indicate to which window elements are added. Flink will internally keep a buffer for each window and emit the window once the watermark passes the end of the window. In your
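To make those mechanics concrete, a minimal event-time sketch; the tuple layout, the 10-second out-of-orderness bound, and the window size are assumptions, not taken from the original thread:

```java
import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowEmissionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        DataStream<Tuple2<String, Long>> events =
                env.fromElements(Tuple2.of("a", 1000L), Tuple2.of("b", 2000L)); // placeholder input

        events
            .assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<Tuple2<String, Long>>() {
                private long maxSeen = 0L; // assumes non-negative timestamps

                @Override
                public long extractTimestamp(Tuple2<String, Long> element, long previous) {
                    maxSeen = Math.max(maxSeen, element.f1); // f1 assumed to hold the event timestamp
                    return element.f1;
                }

                @Override
                public Watermark getCurrentWatermark() {
                    // Tolerate 10 s of out-of-orderness (an assumption).
                    return new Watermark(maxSeen - 10000L);
                }
            })
            .keyBy(0)
            .timeWindow(Time.minutes(1))
            .fold("", new FoldFunction<Tuple2<String, Long>, String>() {
                @Override
                public String fold(String acc, Tuple2<String, Long> value) {
                    // fold() may run as elements arrive; the result is only
                    // emitted once the watermark passes the window end.
                    return acc + value.f0;
                }
            })
            .print();

        env.execute();
    }
}
```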

Re: JDBC sink in flink

2016-07-05 Thread Stefano Bortoli
As Chesnay said, it is not necessary to use a pool, as the connection is reused across splits. However, if you had to customize it for some reason, you can do so starting from the JDBC input and output formats. Cheers! 2016-07-05 13:27 GMT+02:00 Harikrishnan S : > Awesome!

Kafka Producer Partitioning issue

2016-07-05 Thread Gyula Fóra
Hi, I have run into a strange issue when using the Kafka producer. I got the following exception: Caused by: java.lang.IllegalArgumentException: Invalid partition given with record: 5 is not in the range [0...2]. at

Late arriving events

2016-07-05 Thread Chen Qin
Hi there, I understand Flink currently doesn't support handling late arriving events. In reality, an exactly-once Flink job needs to deal with missing data or backfills from time to time without rewinding to a previous savepoint, which implies the restored job suffers a blackout while it tries to catch up. In

Re: Late arriving events

2016-07-05 Thread Jamie Grier
I put a few comments in-line below... On Tue, Jul 5, 2016 at 4:06 PM, Chen Qin wrote: > Hi there, > > I understand Flink currently doesn't support handling late arriving > events. In reality, an exactly-once Flink job needs to deal with missing data or > backfills from time to time

Re: Timewindow watermarks specific to keys of stream

2016-07-05 Thread Jamie Grier
Hi Madhu, This is not possible right now, but are you sure this is necessary in your application? Would the timestamps for stock data really be radically different for different symbols that occur close together in the input stream? The windows themselves are per key, but event time advances
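A sketch of what that per-key windowing looks like for the stock use case: window state is kept per symbol even though the watermark is shared. The POJO and field names are assumptions:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

public class MaxPriceSketch {

    // Assumed POJO, for illustration only.
    public static class StockQuote {
        public String symbol;
        public double price;
        public long timestamp;
    }

    static void maxPricePerSymbol(DataStream<StockQuote> quotes) {
        quotes
            .keyBy("symbol")              // one window state per symbol
            .timeWindow(Time.minutes(10)) // but event time advances globally
            .maxBy("price")
            .print();
    }
}
```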

Re: Failed job restart - flink on yarn

2016-07-05 Thread vpra...@gmail.com
Thanks for the reply. It would be great to have the feature to restart a failed job from the last checkpoint. Is there a way to pass the initial set of partition offsets to the Kafka client? In that case I can maintain a list of last-processed offsets from within my window operation (possibly

Re: Failed job restart - flink on yarn

2016-07-05 Thread Jamie Grier
The Kafka client can be configured to commit offsets to ZooKeeper periodically even when those offsets are not used in the normal fault-tolerance case. Normally, the Kafka offsets are part of Flink's normal state. However, in the absence of this state the FlinkKafkaConsumer will actually
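A sketch of that configuration for the 0.8-series consumer (the class name and property names vary with the Flink/Kafka versions in use; hosts, topic, and group id are placeholders):

```java
import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaOffsetCommitSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "zk-host:2181");    // placeholder
        props.setProperty("bootstrap.servers", "kafka-host:9092"); // placeholder
        props.setProperty("group.id", "my-flink-job");             // placeholder
        // Periodic offset commits to ZooKeeper, independent of checkpoints.
        props.setProperty("auto.commit.enable", "true");
        props.setProperty("auto.commit.interval.ms", "10000");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props))
           .print();
        env.execute();
    }
}
```

On a restart without Flink state, the consumer then starts from the offsets last committed to ZooKeeper for that group id.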

Re: disable console output

2016-07-05 Thread Jamie Grier
Hi Andres, I believe what you're looking for is to disable the logging. Have a look at the log4j.properties file that exists in your /lib directory. You can configure this to use a NullAppender (or whatever you like). You can also selectively just disable logging for particular parts of the
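If the noise is the client's progress reporting rather than log4j output, there was also a switch on the execution config; a sketch, assuming the messages come from the sysout progress logging:

```java
import org.apache.flink.api.java.ExecutionEnvironment;

public class QuietClientSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Suppresses the per-task progress messages the client prints to stdout.
        env.getConfig().disableSysoutLogging();
        env.fromElements(1, 2, 3).print(); // placeholder job
    }
}
```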

Re: Switch to skip the stream alignment during a checkpoint?

2016-07-05 Thread Daniel Li
Thanks Ufuk. Really appreciated. On Thu, Jun 30, 2016 at 2:07 AM, Ufuk Celebi wrote: > You are right, this is not very well-documented. You can do it like this: > > > env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE); > > With this the operators
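Pulled together, the full setup is just the following (the interval is a placeholder):

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AtLeastOnceSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000); // checkpoint every 5 s (placeholder)
        // Skip barrier alignment: operators keep processing during a checkpoint,
        // trading exactly-once for at-least-once semantics.
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
    }
}
```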

Flink stops deploying jobs on normal iteration

2016-07-05 Thread Nguyen Xuan Truong
Hi, I have a Flink program which is similar to the KMeans algorithm. I use a normal iteration (for loop) because Flink's iteration does not allow computing intermediate results (in this case the topDistance) within one iteration. The problem is that my program only runs when maxIterations is small.

Re: Checkpointing very large state in RocksDB?

2016-07-05 Thread Daniel Li
Thanks Aljoscha. Yes - that is exactly what I am looking for. On Thu, Jun 30, 2016 at 5:07 AM, Aljoscha Krettek wrote: > Hi, > are you talking about *enableFullyAsyncSnapshots()* in the RocksDB > backend. If not, there is this switch that is described in the JavaDoc: > > /**
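The switch in question, as a sketch (the checkpoint URI is a placeholder, and the exact constructor and method signatures should be checked against the RocksDB backend version in use):

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FullyAsyncSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs:///flink/checkpoints"); // placeholder URI
        // Snapshot fully asynchronously instead of the semi-async default.
        backend.enableFullyAsyncSnapshots();
        env.setStateBackend(backend);
    }
}
```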

Parameters inside an iteration?

2016-07-05 Thread Boden, Christoph
Dear Flink Community, is there a compact and efficient way to get parameters that are known at run-time, but not at compile-time, inside an iteration? I tried the following: define an object with the parameters: object iterationVariables { var numDataPoints = 1 var lambda = 0.2 var stepSize =
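The snippet is cut off above, but for getting run-time values into user functions (inside an iteration or not), one standard route in the DataSet API is withParameters(Configuration), read back in open(). A sketch with made-up parameter names and values:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class RuntimeParamsSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Double> data = env.fromElements(1.0, 2.0, 3.0); // placeholder input

        // Values known only at run-time travel to the workers via a Configuration.
        Configuration params = new Configuration();
        params.setDouble("lambda", 0.2);    // illustrative names and values
        params.setDouble("stepSize", 0.01);

        data.map(new RichMapFunction<Double, Double>() {
                private double lambda;
                private double stepSize;

                @Override
                public void open(Configuration parameters) {
                    // Read the values back on the worker side.
                    lambda = parameters.getDouble("lambda", 1.0);
                    stepSize = parameters.getDouble("stepSize", 0.1);
                }

                @Override
                public Double map(Double value) {
                    return value * stepSize + lambda;
                }
            })
            .withParameters(params)
            .print();
    }
}
```

Broadcast variables (withBroadcastSet) are the other common option when the values are themselves computed by a Flink job.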