Am I correct in assuming that the Kafka producer sink can lose messages?
I don't expect exactly-once semantics using Kafka as a sink, given Kafka's
publishing guarantees, but I do expect at least once.
I gather from reading the source that the producer is publishing messages
asynchronously, as
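The message trails off above. For context, here is a minimal sketch of attaching the Kafka producer sink in the Scala API. The 0.9 connector, the topic name, and the commented-out setFlushOnCheckpoint call are assumptions about the connector version, not something confirmed in this thread.

import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object KafkaSinkSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")

    val producer =
      new FlinkKafkaProducer09[String]("output-topic", new SimpleStringSchema(), props)

    // In newer connector versions a flush-on-checkpoint setting asks the sink to wait
    // for pending records at checkpoint time, which narrows the window in which
    // asynchronously published messages can be lost (assumption, version dependent):
    // producer.setFlushOnCheckpoint(true)

    env.fromElements("a", "b", "c").addSink(producer)
    env.execute("kafka-sink-sketch")
  }
}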
Can someone confirm whether
the org.apache.flink.streaming.api.scala.WindowedStream methods other than
"apply" (e.g. "sum") perform pre-aggregation? The API docs are silent on
this.
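For reference, a hedged sketch of the two call styles being compared; the keys, values, and window size are illustrative, and nothing here asserts whether "sum" is in fact pre-aggregated.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object PreAggregationSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val counts: DataStream[(String, Int)] = env.fromElements(("a", 1), ("b", 2), ("a", 3))

    val windowed = counts.keyBy(_._1).timeWindow(Time.minutes(1))

    // "sum" expresses an incremental aggregate; the question is whether it is applied
    // eagerly per element (pre-aggregation) or only when the window fires.
    val summed = windowed.sum(1)
    summed.print()

    // "apply" with a full window function sees every element of the window at once,
    // so by itself it cannot be pre-aggregated.
    val applied = windowed.apply {
      (key: String, window: TimeWindow, values: Iterable[(String, Int)],
       out: Collector[(String, Int)]) =>
        out.collect((key, values.map(_._2).sum))
    }
    applied.print()

    env.execute("pre-aggregation-sketch")
  }
}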
In my data set keys come and go, and many will never be observed again.
That will lead to
continuous state growth over time.
On Mon, May 2, 2016 at 6:06 PM, Elias Levy <fearsome.lucid...@gmail.com>
wrote:
> Thanks for the suggestion. I ended up implementing it a different way.
>
Till,
Thanks again for putting this together. It is certainly along the lines of
what I want to accomplish, but I see a problem with it. In your code
you use a ValueState to store the priority queue. If you are expecting to
store a lot of values in the queue, then you are likely to be using
Looking over the code, I see that Flink creates a TimeWindow object for each
window an element is assigned to by the WindowAssigner. I have not yet tested
this, but I am wondering if this can become problematic if you have a very long
sliding window with a small slide, such as a 24 hour window with a 1 minute slide.
It
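The message trails off above, but a sketch of the configuration it describes follows. With a 24 hour window and a 1 minute slide, every element is assigned to 24 * 60 = 1440 windows, which is why the number of TimeWindow instances (and the associated window state) is a concern. Names and values are illustrative.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object SlidingWindowSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val events: DataStream[(String, Long)] = env.fromElements(("key", 1L))

    events
      .keyBy(_._1)
      // timeWindow(size, slide) uses a sliding window assigner internally;
      // each element lands in size/slide = 1440 windows here.
      .timeWindow(Time.hours(24), Time.minutes(1))
      .sum(1)
      .print()

    env.execute("sliding-window-sketch")
  }
}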
Thanks for the suggestion. I ended up implementing it a different way.
What is needed is a mechanism to give each stream a different window
assigner, and then let Flink perform the join normally given the assigned
windows.
Specifically, for my use case what I need is a sliding window for one
tState. As I
> understand the drawback is that the list state is not maintained in the
> managed memory.
> I'm interested to hear about the right way to implement this.
>
> On Wed, Apr 13, 2016 at 3:53 PM, Elias Levy <fearsome.lucid...@gmail.com>
> wrote:
>
>> I am wondering h
On Wed, Apr 13, 2016 at 2:10 AM, Stephan Ewen wrote:
> If you want to use Flink's internal key/value state, however, you need to
> let Flink re-partition the data by using "keyBy()". That is because Flink's
> internal sharding of state (including the re-sharding to adjust
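The quote is cut off above, but a minimal sketch of the pattern it describes, keyBy() followed by a rich function that uses Flink's per-key ValueState, might look like the following. All names and the running-count logic are illustrative, not taken from the thread.

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

class RunningCount extends RichFlatMapFunction[(String, Int), (String, Long)] {
  @transient private var count: ValueState[Long] = _

  override def open(parameters: Configuration): Unit = {
    // Per-key state; Flink shards it by the key established via keyBy().
    count = getRuntimeContext.getState(
      new ValueStateDescriptor("count", createTypeInformation[Long], 0L))
  }

  override def flatMap(in: (String, Int), out: Collector[(String, Long)]): Unit = {
    val updated = count.value() + 1
    count.update(updated)
    out.collect((in._1, updated))
  }
}

object KeyedStateSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromElements(("a", 1), ("a", 2), ("b", 3))
      .keyBy(_._1) // re-partitions the stream so state is sharded by key
      .flatMap(new RunningCount)
      .print()
    env.execute("keyed-state-sketch")
  }
}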
I am wondering if the Kafka connectors leverage Kafka message keys at all?
Looking through the docs, my impression is that they do not. E.g. if I use
the connector to consume from a partitioned Kafka topic, what I will get
back is a DataStream, rather than a KeyedStream. And if I want access to
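This message is also truncated. As a sketch of the situation it describes, the consumer below yields a plain DataStream, so any keying has to be re-established with keyBy() on the deserialized records. The 0.9 consumer, SimpleStringSchema, and parsing the key back out of the value are assumptions, not confirmed connector behavior.

import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object KafkaKeyingSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "sketch")

    // A DataStream, not a KeyedStream: the Kafka message key is not carried over.
    val lines: DataStream[String] =
      env.addSource(new FlinkKafkaConsumer09[String]("input-topic", new SimpleStringSchema(), props))

    // Re-key inside Flink, e.g. on a field parsed out of the record value.
    val keyed = lines.keyBy(line => line.split(',')(0))
    keyed.print()

    env.execute("kafka-keying-sketch")
  }
}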
Is there an API to access an event's time window? When you are computing
aggregates over a time window, you usually want to output the window along
with the aggregate. I could compute the Window on my own, but this seems
like an API that should exist.
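For context, a sketch of one place the window is reachable in the Scala API today: the "apply" overload whose function receives the TimeWindow, so its start and end can be emitted alongside the aggregate. Field names and the aggregation are illustrative.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object WindowedAggregateSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val counts: DataStream[(String, Int)] = env.fromElements(("a", 1), ("a", 2), ("b", 3))

    counts
      .keyBy(_._1)
      .timeWindow(Time.minutes(1))
      .apply { (key: String, window: TimeWindow, values: Iterable[(String, Int)],
                out: Collector[(String, Long, Long, Int)]) =>
        // Emit the window boundaries together with the per-key sum.
        out.collect((key, window.getStart, window.getEnd, values.map(_._2).sum))
      }
      .print()

    env.execute("windowed-aggregate-sketch")
  }
}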
I am wondering about the expected behavior of the sum method. Obviously it
sums a specific field in a tuple or POJO. But what should one expect in the
other fields? Does sum keep the values from the first record, the last record,
or are there no guarantees?
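A small sketch that makes the question concrete; the input records and the window size are illustrative.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object SumFieldsSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Tuples of (key, value, label). sum(1) adds up the middle field per key, but
    // which "label" ends up in the output ("first", "last", or something else) is
    // exactly the question being asked.
    env.fromElements(("a", 1, "first"), ("a", 2, "last"))
      .keyBy(_._1)
      .timeWindow(Time.seconds(10))
      .sum(1)
      .print()

    env.execute("sum-fields-sketch")
  }
}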
An observation and a couple of questions from a novice.
The observation: The Flink web site makes available ScalaDocs for
org.apache.flink.api.scala but not for org.apache.flink.streaming.api.scala.
Now for the questions:
Why can't you use map to transform a data stream, say convert all the
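The question above is cut off, so the intended conversion is not visible; the following is only a generic map over a DataStream in the Scala API, as an illustration of the operation being asked about.

import org.apache.flink.streaming.api.scala._

object MapSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env.fromElements("1", "2", "3")
      .map(s => s.toInt) // transform each element of the stream
      .print()

    env.execute("map-sketch")
  }
}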