Henry,
Thanks for the great feedback. I'm drafting a proposal to add a control
mechanism for the latency vs. data volume tradeoff, which I will put up on
the wiki once it is done:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
We can continue the discussion from there.
My use case is actually myTable.aggregate().to("output_topic"), so I need a
way to reduce the number of output records.
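For later readers of this thread, here is a minimal sketch of that shape of
topology. It is written against the current Streams DSL rather than the
0.10.0 API under discussion, and the topic names, serdes, and grouping
logic are placeholder assumptions:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class TableAggregateTopology {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Every upstream update to the table re-runs the aggregation, so
        // without caching/suppression each input record can yield an
        // output record.
        builder.table("input_topic", Consumed.with(Serdes.String(), Serdes.String()))
               // Placeholder grouping: count table rows per value.
               .groupBy((key, value) -> KeyValue.pair(value, value),
                        Grouped.with(Serdes.String(), Serdes.String()))
               .count(Materialized.as("counts-store"))
               .toStream()
               .to("output_topic", Produced.with(Serdes.String(), Serdes.Long()));

        return builder.build();
    }
}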
I don't think correlating the internal cache flush with the output window
emit frequency is ideal. It's too hard for an application developer to
predict when the cache will be flushed.
To summarize a chat session Guozhang and I just had on Slack:
We currently do dedupe the outputs of stateful operations (e.g. join,
aggregate): they hit an in-memory cache and only produce output to
RocksDB/Kafka when either that cache fills up or the commit interval
elapses. So the changelog already carries deduplicated updates.
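For context, the flush behavior described above ended up being tunable
through two ordinary configs, shipped after this thread as part of KIP-63.
A sketch, assuming a reasonably recent Streams release:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dedupe-example");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

// A larger cache dedupes more aggressively: fewer, later output records.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);

// The cache is also flushed on every commit, so a longer commit interval
// further trades latency for output volume.
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30_000L);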
0.10.0.1 is fine for me; I am actually building the streams package from
trunk HEAD.
On Wed, Apr 20, 2016 at 5:06 PM, Guozhang Wang wrote:
> I saw that note, thanks for commenting.
>
> We are cutting the next 0.10.0.0 RC next week, so I am not certain if it
> will make it into 0.10.0.0. But we can push it to be in 0.10.0.1.
I saw that note, thanks for commenting.
We are cutting the next 0.10.0.0 RC next week, so I am not certain if it
will make it into 0.10.0.0. But we can push it to be in 0.10.0.1.
Guozhang
On Wed, Apr 20, 2016 at 4:57 PM, Henry Cai wrote:
> Thanks.
>
> Do you know when KAFKA-3101 will be implemented?
Thanks.
Do you know when KAFKA-3101 will be implemented?
I also added a note to that JIRA for a left outer join use case which also
needs buffering support.
On Wed, Apr 20, 2016 at 4:42 PM, Guozhang Wang wrote:
> Henry,
>
> I thought you were concerned about consumer memory contention. That's a
> valid point, and yes, you need to keep those buffered records in a
> persistent store.
Henry,
I thought you were concerned about consumer memory contention. That's a
valid point, and yes, you need to keep those buffered records in a
persistent store.
As I mentioned, we are trying to optimize the aggregation outputs, as in
https://issues.apache.org/jira/browse/KAFKA-3101
I think this scheme still has problems. If during 'holding' I literally
hold (don't return from the method call), I will starve the thread. If I am
writing the output to an in-memory buffer and let the method return, Kafka
Streams will acknowledge the record to the upstream queue as processed, so
I would lose the buffered records if the process crashes.
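One way out, following Guozhang's earlier point about a persistent store:
buffer the records in a changelogged state store instead of on the heap, so
records that were already acked upstream survive a restart. A sketch using
the 2.x Processor API (newer than what existed when this thread ran); the
store name, types, and 10-minute interval are assumptions, and the store
must be registered and connected to this processor when building the
topology:

import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class PersistentHoldProcessor implements Processor<String, String> {
    private ProcessorContext context;
    private KeyValueStore<String, String> buffer;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        // "hold-buffer" is an assumed name for a persistent (RocksDB,
        // changelogged) store attached to this processor.
        this.buffer = (KeyValueStore<String, String>) context.getStateStore("hold-buffer");
        context.schedule(Duration.ofMinutes(10),
                         PunctuationType.WALL_CLOCK_TIME,
                         timestamp -> emitAll());
    }

    @Override
    public void process(String key, String value) {
        // Latest value wins per key; nothing is forwarded yet, but the
        // record is durable via the store's changelog.
        buffer.put(key, value);
    }

    private void emitAll() {
        // RocksDB iterators are snapshot-based, so deleting while
        // iterating is safe for a persistent store.
        try (KeyValueIterator<String, String> it = buffer.all()) {
            while (it.hasNext()) {
                KeyValue<String, String> entry = it.next();
                context.forward(entry.key, entry.value);
                buffer.delete(entry.key);
            }
        }
        context.commit();
    }

    @Override
    public void close() {}
}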
By "holding the stream", I assume you are still consuming data, but just
that you only write data every 10 minutes instead of upon each received
record right?
Anyways, in either case, consumer should not have severe memory issue as
Kafka Streams will pause its consuming when enough data is buffere
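The bound behind that pausing is exposed as a config, at least in later
releases; a sketch (the value shown is an assumption):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// Once more than this many records are buffered for a partition, Streams
// pauses fetching from that partition until the buffer drains.
props.put(StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG, 1000);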
So holding the stream for 15 minutes wouldn't cause too many performance
problems?
On Wed, Apr 20, 2016 at 3:16 PM, Guozhang Wang wrote:
> The consumer's buffer does not depend on offset committing; once a record
> is returned from the poll() call it is out of the buffer. If offsets are
> not committed, then upon failover it will simply re-consume those records
> again from Kafka.
The consumer's buffer does not depend on offset committing; once a record
is returned from the poll() call it is out of the buffer. If offsets are
not committed, then upon failover it will simply re-consume those records
again from Kafka.
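A plain-consumer illustration of that at-least-once behavior; the topic,
group id, and println handler are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReconsumeDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "hold-demo");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("input_topic"));
            while (true) {
                // poll() hands records out of the fetch buffer regardless
                // of whether earlier offsets were committed.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> System.out.println(r.key() + " -> " + r.value()));
                // Skipping this commit does not block consumption; it only
                // means a restart re-consumes these records from Kafka.
                consumer.commitSync();
            }
        }
    }
}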
Guozhang
On Tue, Apr 19, 2016 at 11:34 PM, Henry Cai wrote:
For the technique of a custom Processor holding the call to
context.forward(): if I hold it for 10 minutes, what does that mean for the
consumer acknowledgement on the source node?
I guess if I hold it for 10 minutes, the consumer is not going to ack to
the upstream queue; will that impact the consumer's performance?
Yes, we are aware of this behavior and are working on optimizing it:
https://issues.apache.org/jira/browse/KAFKA-3101
More generally, we are considering adding a "trigger" interface similar to
the MillWheel model, where users can customize when they want to emit
outputs to the downstream operators.
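To make the idea concrete, a purely hypothetical sketch of such a hook; no
Trigger interface like this exists in Kafka Streams, it only illustrates
the MillWheel-style emission control being discussed:

import java.util.HashMap;
import java.util.Map;

// Hypothetical only: invented for illustration, not a Kafka Streams API.
public interface Trigger<K, V> {
    // Called on every update to the aggregate for `key`; returning true
    // emits the current aggregate downstream, false keeps buffering it.
    boolean shouldEmit(K key, V aggregate, long recordTimestampMs, long wallClockTimeMs);
}

// Example policy: emit each key at most once per interval of wall-clock time.
class IntervalTrigger<K, V> implements Trigger<K, V> {
    private final long intervalMs;
    private final Map<K, Long> lastEmit = new HashMap<>();

    IntervalTrigger(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    @Override
    public boolean shouldEmit(K key, V aggregate, long recordTimestampMs, long wallClockTimeMs) {
        Long last = lastEmit.get(key);
        if (last == null || wallClockTimeMs - last >= intervalMs) {
            lastEmit.put(key, wallClockTimeMs);
            return true;
        }
        return false;
    }
}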
Is it true that the aggregation and reduction methods of KStream will emit
a new output message for each incoming message?
I have an application that's copying a Postgres replication stream to a
Kafka topic, and activity tends to be clustered, with many updates to a
given primary key happening in quick succession.