Hi,
We are using HDP 2.6.3 with the Atlas version that ships with that release.
We are having a problem with Atlas lagging behind the messages in the
ATLAS_HOOK Kafka topic. I can understand why, as we ingest a large number of
tables into the cluster every day: we create roughly 165,000 entries in the
ATLAS_HOOK topic per day, primarily from Sqoop imports and Hive create/drop
table operations. The problem is that Atlas only processes around
35,000-40,000 entries per day, so the backlog keeps building up.
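To put the gap in numbers, here is a rough sketch of how fast the backlog grows if both rates stay constant. It uses the figures from the post and assumes 37,500 (the midpoint of 35,000-40,000) as the daily consumption rate; the function name is purely illustrative.

```python
# Daily figures from the post; 37,500 is an assumed midpoint of "35-40000".
PRODUCED_PER_DAY = 165_000
CONSUMED_PER_DAY = 37_500


def backlog_after(days: int,
                  produced: int = PRODUCED_PER_DAY,
                  consumed: int = CONSUMED_PER_DAY) -> int:
    """Unprocessed ATLAS_HOOK messages after `days`, assuming constant daily rates."""
    return max(0, (produced - consumed) * days)


print(backlog_after(1))   # 127500
print(backlog_after(30))  # 3825000
```

Under these assumptions the topic falls about 127,500 messages further behind every day, or roughly 3.8 million after a month.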
Many of the tables we import are quite wide, so it is common for the messages
in the Kafka topic to be 600-800 KB each.
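As a rough indication of the load this puts on the consumer, the sketch below estimates the daily payload volume. It assumes "Kb" means kilobytes, about 165,000 messages per day, and decimal units (1 GB = 1,000,000 KB). The helper name is hypothetical.

```python
MESSAGES_PER_DAY = 165_000  # figure from the post


def daily_volume_gb(size_kb: float, messages: int = MESSAGES_PER_DAY) -> float:
    """Approximate daily ATLAS_HOOK payload in GB (decimal units assumed)."""
    return messages * size_kb / 1_000_000


print(daily_volume_gb(600))  # 99.0
print(daily_volume_gb(800))  # 132.0
```

If those assumptions hold, that is roughly 100-130 GB of hook messages per day that Atlas has to deserialize and write to its backing store.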
I have verified that I can consume the messages in the topic with a normal
Kafka client, so the problem is not with Kafka itself. I have also cleared the
two HBase tables and the Kafka topic to start over from the beginning, but the
problem remains.
I would like some help with the kind of performance tuning I can do to make
sure Atlas can consume at least 200,000 entries from the ATLAS_HOOK topic per
day (we are planning to add many more data sources over the next couple of
months). What options do I have to make this happen?
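For reference, the target can also be expressed as a sustained per-second rate. This is only arithmetic on the figures in the post, and it again assumes 37,500 per day as the current rate. It shows that the required throughput is about 5.3 times what Atlas achieves today.

```python
SECONDS_PER_DAY = 86_400
TARGET_PER_DAY = 200_000   # goal stated in the post
CURRENT_PER_DAY = 37_500   # assumed midpoint of the observed 35,000-40,000


def per_second(per_day: float) -> float:
    """Convert a daily message count into an average messages-per-second rate."""
    return per_day / SECONDS_PER_DAY


print(f"target:  {per_second(TARGET_PER_DAY):.2f} msg/s")   # target:  2.31 msg/s
print(f"current: {per_second(CURRENT_PER_DAY):.2f} msg/s")  # current: 0.43 msg/s
print(f"speedup needed: {TARGET_PER_DAY / CURRENT_PER_DAY:.2f}x")  # speedup needed: 5.33x
```

About 2.3 messages per second sounds modest, but at 600-800 KB per message each one is heavy to process, so per-message cost matters more than the raw count.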
//Berry