[
https://issues.apache.org/jira/browse/ATLAS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272262#comment-15272262
]
Hemanth Yamijala commented on ATLAS-629:
Started looking at the approach to fix this problem. With Kafka's (old) high
level consumer, we only have *atmost-once* delivery because the offsets read
from the partitions are auto committed by default. So if a message is read and
offset auto committed, but before the metadata ingest is completed, the server
reboots, then this message could be lost for processing.
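To make the failure window concrete, here is a minimal sketch of the default
auto-commit configuration of the old high-level consumer (the ZooKeeper address
and group id are assumptions used only for illustration):
{code:java}
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class AutoCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "atlas");
        // Defaults shown explicitly: offsets are committed on a timer in the
        // background, regardless of whether the message has been ingested yet.
        // A crash after the commit but before ingest completes drops the message.
        props.put("auto.commit.enable", "true");
        props.put("auto.commit.interval.ms", "60000");
        ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        // ... read messages from ATLAS_HOOK and ingest; nothing ties the ingest
        // to the offset commit, so the commit can race ahead of processing.
    }
}
{code}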
To fix this issue, I am looking at *at-least-once* delivery semantics with
Kafka, under the assumption that *message processing can be idempotent on the
server*. Given we use transactions in Titan and also have create-or-update
semantics, this may be mostly true - but not really sure. Will need to test.
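To make the idempotency assumption concrete, a purely illustrative sketch of
what create-or-update processing on the server would look like - the interface
and method names below are hypothetical and are not the actual Atlas/Titan
APIs:
{code:java}
// Illustrative only: redelivering the same message leaves the store in the
// same state, because the handler keys on a unique attribute and does
// create-or-update instead of blind creates.
interface MetadataStore {
    boolean exists(String qualifiedName);
    void create(String qualifiedName, String entityJson);
    void update(String qualifiedName, String entityJson);
}

class IdempotentHandler {
    private final MetadataStore store;

    IdempotentHandler(MetadataStore store) {
        this.store = store;
    }

    void onMessage(String qualifiedName, String entityJson) {
        if (store.exists(qualifiedName)) {
            store.update(qualifiedName, entityJson); // redelivery: same end state
        } else {
            store.create(qualifiedName, entityJson); // first delivery
        }
    }
}
{code}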
To move to at-least-once processing, the predominant approach people seem to
follow (sketched in code below, after the references) is to:
* disable auto commit
* create one ConsumerConnector per partition of a topic
The latter is because the old high level consumer does not provide for commit
per partition. It can only commit all offsets read by all partitions it is
connected to [(Reference
1)|http://grokbase.com/t/kafka/users/144b80h269/consumerconnector-commitoffsets].
The above suggestion of one consumer connector per partition has been proposed
by Kafka experts in many threads [(Reference
2)|http://mail-archives.apache.org/mod_mbox/kafka-users/201409.mbox/%3CCAHBV8WeYj8ce6G5J0k3a1hGgdNskGv3bsaP8JXSM=kwbnuj...@mail.gmail.com%3E].
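Here is a rough sketch of that first approach, assuming ATLAS_HOOK has a known
partition count; process(...) is a placeholder for the metadata ingest, and the
ZooKeeper address and group id are again just illustrative:
{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class AtLeastOnceConsumerSketch {
    public static void main(String[] args) {
        int numPartitions = 2; // assumed partition count of ATLAS_HOOK
        for (int i = 0; i < numPartitions; i++) {
            new Thread(AtLeastOnceConsumerSketch::consumeOnePartition).start();
        }
    }

    static void consumeOnePartition() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "atlas");
        props.put("auto.commit.enable", "false"); // manual commits only
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream per connector; with as many connectors as partitions in the
        // same group, the rebalance assigns roughly one partition per connector,
        // so commitOffsets() below effectively commits a single partition.
        Map<String, Integer> topicCount = Collections.singletonMap("ATLAS_HOOK", 1);
        KafkaStream<byte[], byte[]> stream =
                connector.createMessageStreams(topicCount).get("ATLAS_HOOK").get(0);
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> msg = it.next();
            process(msg.message());    // ingest into the metadata store first
            connector.commitOffsets(); // commit only after processing succeeds
        }
    }

    static void process(byte[] message) {
        // placeholder for the (idempotent) ingest
    }
}
{code}
If the process dies between process(...) and commitOffsets(), the message is
re-read on restart - which is exactly why the processing needs to be
idempotent.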
The other option could be to move to the newer consumer API in Kafka (0.9+),
which (I think) provides better options for handling a per-partition commit.
However, the new consumer is still marked beta, so not really sure. Can check
with some Kafka committers internally.
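For comparison, the 0.9 consumer does allow committing a specific partition
explicitly via commitSync; a minimal sketch (broker address, group id and
deserializers are assumptions):
{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "atlas");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("ATLAS_HOOK"));
        while (true) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
            for (ConsumerRecord<byte[], byte[]> record : records) {
                process(record.value()); // ingest first
                // commit just this partition, up to the record we processed
                consumer.commitSync(Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)));
            }
        }
    }

    static void process(byte[] message) {
        // placeholder for the (idempotent) ingest
    }
}
{code}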
For now, I will try out the first approach and see. In the meantime, happy to
hear feedback from others.
> Kafka messages in ATLAS_HOOK might be lost in HA mode at the instant of
> failover.
> -
>
> Key: ATLAS-629
> URL: https://issues.apache.org/jira/browse/ATLAS-629
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 0.7-incubating
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Priority: Critical
> Fix For: 0.7-incubating
>
>
> Write data to Kafka continuously from Hive hook - can do this by writing a
> script that constantly creates tables. Bring down the Active instance with
> kill -9. Ensure writes continue after passive becomes active. The expectation
> is the number of tables created and the number of tables in Atlas match.
> In one test, wrote 180 tables and switched over 6 times from one instance to
> another. Found that 1 table out of the lot was lost, i.e. 179 tables were
> created in Atlas and 1 did not get in.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)