I looked into this a bit and I think it is a Flink issue:
The blocking is between the poll() and the commitToKafka() calls.
The commitToKafka() call is not part of the checkpoint; it comes only
after the checkpoint. So even if it is not called, this should not block
the checkpoint.
What may
Hi,
Sorry for my late response.
Actually, I received no response on the Kafka mailing list and still
cannot find the root cause.
But when I use FlinkKafkaConsumer082, I do not encounter this issue,
so I will use FlinkKafkaConsumer082.
Thanks
Hironori
2016-06-17 2:59 GMT+09:00 Ufuk Celebi
The connection will be managed by the splitManager, so there is no need to
use a pool. However, if you had to, you should probably look into the
establishConnection() method of the JDBCInputFormat.
2016-07-05 10:52 GMT+02:00 Flavio Pompermaier:
> why do you need a connection pool?
>
Hi,
I am trying to analyse stock market data. For every symbol, I want to find
the max stock price over the last 10 minutes. I want to generate watermarks
specific to each key rather than across the whole stream. Is this possible
in Flink? (A sketch of the per-key windowing part follows below.)
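Per-key watermarks are not supported (see the reply further down in this
digest), but per-key windows with a single global watermark are. A minimal
sketch of a per-symbol 10-minute max, assuming a simple POJO whose field
names (symbol, price) are placeholders:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class StockMaxExample {

    // Hypothetical POJO for illustration; field names are assumptions.
    public static class StockTick {
        public String symbol;
        public double price;
        public StockTick() {}
        public StockTick(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in source; a real job would read from Kafka or similar.
        DataStream<StockTick> ticks = env.fromElements(
                new StockTick("AAPL", 95.0), new StockTick("GOOG", 700.0));

        ticks.keyBy("symbol")               // windows are maintained per symbol,
             .timeWindow(Time.minutes(10))  // but event time/watermarks advance globally
             .max("price")                  // max stock price per symbol and window
             .print();

        env.execute("per-symbol-max");
    }
}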
--
Regards,
Madhukara Phatak
http://datamantra.io/
Hi,
Is there any other disadvantage of using fullyAsyncSnapshot, other than
being slower? And would the slowness really matter, since it is async
anyway?
Thanks and Regards,
Vishnu Viswanath,
On Thu, Jun 30, 2016 at 8:07 AM, Aljoscha Krettek
wrote:
> Hi,
> are you talking
Sorry Biplob, I didn't have time to look into your code today. I will
try to do it tomorrow though.
On Mon, Jul 4, 2016 at 2:53 PM, Biplob Biswas wrote:
> I have sent you my code in a separate email, I hope you can solve my issue.
>
> Thanks a lot
> Biplob
>
>
>
> --
>
Hi Vasia,
Thank you very much for your explanation :). When running with a small
maxIterations, the job graph that Flink executed was optimal. However, when
maxIterations was large, Flink took a very long time to generate the job
graph. The actual time to execute the jobs was very fast but the time
Hi Truong,
I'm afraid what you're experiencing is to be expected. Currently, for loops
do not perform well in Flink since there is no support for caching
intermediate results yet. This has been a frequently requested feature
lately, so maybe it will be added soon :)
Until then, I suggest you try
Hi,
I'm having problems when working with Flink (local mode) and travis-ci.
The console output gives rise to big log files (>4MB).
How can I disable, from my Java code (through the Configuration object),
the progress messages displayed in the console?
Thanks,
Andres
The basic idea was that I would create a pool of connections in the open()
method of a custom sink, and each invoke() call gets one connection from
the pool and does the upserts needed. I might have misunderstood how sinks
work in Flink, though.
On Tue, Jul 5, 2016 at 2:22 PM, Flavio Pompermaier
They serve a similar purpose.
OutputFormats originate from the Batch API, whereas SinkFunctions are a
Streaming API concept.
You can however use OutputFormats in the Streaming API using the
DataStream#writeUsingOutputFormat.
Regards,
Chesnay
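A minimal sketch of using JDBCOutputFormat from the Streaming API via
DataStream#writeUsingOutputFormat. The driver, URL, query, and tuple layout
are placeholders, and the exact record type the format accepts (Tuple vs.
Row) depends on the Flink version; this assumes the Tuple-based format:

import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JdbcStreamSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Each tuple supplies the parameters of one INSERT; DB details are placeholders.
        DataStream<Tuple2<String, Double>> rows =
                env.fromElements(Tuple2.of("AAPL", 95.0), Tuple2.of("GOOG", 700.0));

        JDBCOutputFormat format = JDBCOutputFormat.buildJDBCOutputFormat()
                .setDrivername("org.postgresql.Driver")
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb")
                .setQuery("INSERT INTO prices (symbol, price) VALUES (?, ?)")
                .finish();

        rows.writeUsingOutputFormat(format); // a batch OutputFormat used from the Streaming API
        env.execute("jdbc-sink");
    }
}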
On 05.07.2016 12:51, Harikrishnan S wrote:
Ah
Awesome! Thanks a lot! I should probably write a blog post somewhere
explaining this.
On Tue, Jul 5, 2016 at 4:28 PM, Chesnay Schepler wrote:
> They serve a similar purpose.
>
> OutputFormats originate from the Batch API, whereas SinkFunctions are a
> Streaming API
Oh. So you mean if I write a custom sink for a db, I just need to create
one connection in the open() method and then the invoke() method will reuse
it? Basically I need to do 35k-50k+ upserts in Postgres. Can I reuse
JDBCOutputFormat for this purpose? I couldn't find a proper document
Hello,
an instance of the JDBCOutputFormat will use a single connection to send
all values.
Essentially
- open(...) is called at the very start to create the connection
- then all invoke/writeRecord calls are executed (using the same connection)
- then close() is called to clean up.
The
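A minimal sketch of that lifecycle in a custom sink: one JDBC connection per
parallel sink instance, created in open(), reused by every invoke(), and
released in close(). The table, columns, and credentials are placeholders,
and the ON CONFLICT upsert assumes Postgres 9.5+:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class PostgresUpsertSink extends RichSinkFunction<Tuple2<String, Double>> {

    private transient Connection connection;
    private transient PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Called once per parallel instance, before any records arrive.
        connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
        statement = connection.prepareStatement(
                "INSERT INTO prices (symbol, price) VALUES (?, ?) "
              + "ON CONFLICT (symbol) DO UPDATE SET price = EXCLUDED.price");
    }

    @Override
    public void invoke(Tuple2<String, Double> value) throws Exception {
        statement.setString(1, value.f0);
        statement.setDouble(2, value.f1);
        statement.executeUpdate(); // same connection for all records of this subtask
    }

    @Override
    public void close() throws Exception {
        // Called at the very end to clean up.
        if (statement != null) statement.close();
        if (connection != null) connection.close();
    }
}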
Ah, that makes sense. Also, what's the difference between a RichOutputFormat
and a RichSinkFunction? Can I use JDBCOutputFormat as a sink in a stream?
On Tue, Jul 5, 2016 at 3:53 PM, Chesnay Schepler wrote:
> Hello,
>
> an instance of the JDBCOutputFormat will use a single
The order in which elements are added to internal buffers and the point in
time when FoldFunction.fold() is called don't indicate to which window
elements are added. Flink will internally keep a buffer for each window and
emit the window once the watermark passes the end of the window. In your
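A minimal sketch illustrating this with an incremental window fold: fold()
is invoked as elements arrive, but the folded result only leaves the
operator once the watermark passes the end of the window. The element
values are placeholders:

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowFoldExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> input =
                env.fromElements(Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3));

        input.keyBy(0)
             .timeWindow(Time.minutes(1))
             // fold() runs incrementally per element; the accumulated result is
             // emitted only when the window fires, not when fold() is called.
             .fold(0, new FoldFunction<Tuple2<String, Integer>, Integer>() {
                 @Override
                 public Integer fold(Integer accumulator, Tuple2<String, Integer> value) {
                     return accumulator + value.f1;
                 }
             })
             .print();

        env.execute("window-fold");
    }
}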
As Chesnay said, it is not necessary to use a pool, as the connection is
reused across splits. However, if you had to customize it for some reason,
you can do it starting from the JDBC Input and Output formats.
cheers!
2016-07-05 13:27 GMT+02:00 Harikrishnan S:
> Awesome!
Hi,
I have run into a strange issue when using the Kafka producer.
I got the following exception:
Caused by: java.lang.IllegalArgumentException: Invalid partition given
with record: 5 is not in the range [0...2].
at
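One common cause of this error is a custom partitioner returning an index
beyond the topic's actual partition count. A hedged sketch of a partitioner
that stays in range, assuming the KafkaPartitioner base class of the Flink
Kafka connector (its exact signature varies across connector versions):

import org.apache.flink.streaming.connectors.kafka.partitioner.KafkaPartitioner;

// Wraps any computed partition id into the valid range [0, numPartitions),
// avoiding "Invalid partition given with record" when the topic has fewer
// partitions than the producer assumes.
public class InRangePartitioner<T> extends KafkaPartitioner<T> {

    private static final long serialVersionUID = 1L;

    @Override
    public int partition(T next, byte[] serializedKey, byte[] serializedValue, int numPartitions) {
        int hash = serializedKey == null ? 0 : java.util.Arrays.hashCode(serializedKey);
        // Normalize to a non-negative index within the partition count.
        return ((hash % numPartitions) + numPartitions) % numPartitions;
    }
}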
Hi there,
I understand Flink currently doesn't support handling late arriving events.
In reality, an exactly-once Flink job needs to deal with missing data or
backfill from time to time without rewinding to a previous savepoint, which
implies the restored job suffers a blackout while it tries to catch up.
In
I put a few comments in-line below...
On Tue, Jul 5, 2016 at 4:06 PM, Chen Qin wrote:
> Hi there,
>
> I understand Flink currently doesn't support handling late arriving
> events. In reality, an exactly-once Flink job needs to deal with missing
> data or backfill from time to time
Hi Madhu,
This is not possible right now, but are you sure this is necessary in your
application? Would the timestamps for stock data really be radically
different for different symbols that occur close together in the input
stream? The windows themselves are for each key, but event time advances
Thanks for the reply. It would be great to have the feature to restart a
failed job from the last checkpoint.
Is there a way to pass the initial set of partition offsets to the
Kafka client? In that case I can maintain a list of last processed offsets
from within my window operation (possibly
The Kafka client can be configured to commit offsets to Zookeeper
periodically even when those offsets are not used in the normal
fault-tolerance case. Normally, the Kafka offsets are part of Flink's
normal state. However, in the absence of this state the FlinkKafkaConsumer
will actually
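A minimal sketch of enabling those periodic Zookeeper commits through the
Kafka 0.8 consumer properties, assuming FlinkKafkaConsumer082 as used
earlier in this thread (addresses, group id, and topic are placeholders):

import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class PeriodicOffsetCommitExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181"); // placeholder address
        props.setProperty("group.id", "my-group");                // placeholder group
        // Kafka 0.8 high-level consumer settings: commit offsets to Zookeeper
        // periodically, independent of Flink's checkpointed state.
        props.setProperty("auto.commit.enable", "true");
        props.setProperty("auto.commit.interval.ms", "10000");

        env.addSource(new FlinkKafkaConsumer082<>("my-topic", new SimpleStringSchema(), props))
           .print();

        env.execute("kafka-with-periodic-commits");
    }
}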
Hi Andres,
I believe what you're looking for is to disable the logging. Have a look
at the log4j.properties file that exists in your /lib directory.
You can configure this to use a NullAppender (or whatever you like). You
can also selectively just disable logging for particular parts of the
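A minimal log4j.properties sketch that silences everything by routing the
root logger to log4j 1.x's built-in NullAppender:

# Discard all log output; adjust the level or appender per logger as needed.
log4j.rootLogger=INFO, nullAppender
log4j.appender.nullAppender=org.apache.log4j.varia.NullAppender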
Thanks Ufuk. Really appreciated.
On Thu, Jun 30, 2016 at 2:07 AM, Ufuk Celebi wrote:
> You are right, this is not very well-documented. You can do it like this:
>
>
> env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
>
> With this the operators
Hi,
I have a Flink program which is similar to the KMeans algorithm. I use a
normal iteration (for loop) because Flink's iteration does not allow
computing intermediate results (in this case the topDistance) within one
iteration. The problem is that my program only runs when maxIteration is
small.
Thanks Aljoscha. Yes - that is exactly what I am looking for.
On Thu, Jun 30, 2016 at 5:07 AM, Aljoscha Krettek
wrote:
> Hi,
> are you talking about *enableFullyAsyncSnapshots()* in the RocksDB
> backend. If not, there is this switch that is described in the JavaDoc:
>
> /**
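A minimal sketch of enabling that switch on the RocksDB backend, assuming a
placeholder checkpoint URI (a real job would still need sources and sinks
before execute()):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FullyAsyncExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint URI is a placeholder.
        RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
        backend.enableFullyAsyncSnapshots(); // fully async snapshot mode
        env.setStateBackend(backend);
        env.enableCheckpointing(10000); // checkpoint every 10s

        // ... build the rest of the job here ...
        env.execute("fully-async-snapshots");
    }
}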
Dear Flink Community,
is there a compact and efficient way to get parameters that are known at
run-time, but not at compile-time, inside an iteration? I tried the following:
> define an object with the parameters:
object iterationVariables {
  var numDataPoints = 1
  var lambda = 0.2
  var stepSize =