Kafka provides the ack mechanism, although the acks=all setting hurts
throughput and performance. Users can configure it through a Kafka client
parameter. Kylin does not know, and should not need to know, how to handle
duplicate messages: whether two messages are duplicates is a semantic
question. What Kylin can guarantee is that it does not consume the same
message more than once.
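For illustration, here is a minimal sketch of the client parameters involved. It assumes the standard Java client, whose producer settings are passed as java.util.Properties. It also includes a hypothetical application-side dedup step, showing that only the application knows which field identifies "the same" event. The class name, broker address, retry count and event ids are placeholders. The dedup helper is not part of Kylin or Kafka.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class KafkaAckConfigSketch {

    // Producer settings as they would be handed to the Java client's
    // KafkaProducer constructor. "acks" and "retries" are standard producer
    // config keys; the broker address and retry count are placeholders.
    static Properties producerConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        // Wait for all in-sync replicas to acknowledge each write: the
        // strongest durability setting, paid for in latency and throughput.
        props.put("acks", "all");
        // Retrying failed sends avoids data loss, but a retry after a lost
        // acknowledgement can write the same record twice (at-least-once).
        props.put("retries", "3");
        return props;
    }

    // Hypothetical application-level dedup: the application, not Kylin,
    // decides what counts as a duplicate. Here each record is assumed to
    // carry a unique event id chosen by the producing application.
    static List<String> dropDuplicates(List<String> eventIds) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String id : eventIds) {
            if (seen.add(id)) { // add() returns false for an id already seen
                unique.add(id);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(producerConfig().getProperty("acks"));
        System.out.println(dropDuplicates(Arrays.asList("e1", "e2", "e1", "e3")));
    }
}
```

The dedup step only works if the producer attaches an id that is stable across retries. That is why it belongs in the application that defines the event, not in the cube engine.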

2017-05-16 22:37 GMT+08:00 Tingmao Lin <[email protected]>:

> Hi,
>
> The current version of the Kafka producer provides at-least-once
> semantics. Duplicates may occur in the stream due to producer retries.
>
> (The idempotent producer is still under development; see
> https://issues.apache.org/jira/browse/KAFKA-4815, "Idempotent/transactional
> Producer Checklist (KIP-98)", which tracks implementation progress for
> KIP-98:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging)
> When using a streaming cube, Kylin may receive duplicate messages and
> produce unexpected results.
>
> Does anyone have experience dealing with this problem? I think this is
> more of a Kafka issue, but since no idempotent producer is available yet,
> could anyone suggest a workaround on the Kylin side?
> Thanks.