Berry,

35,000-40,000 messages/day seems too low. For reference: Atlas in my local VM (8 GB 
RAM, 2 GB for the Atlas server) processes more than 10,000 messages/hour, and those 
messages include tables with about 1,000 columns. Even at that rate the VM would 
handle roughly 240,000 messages/day, so a production environment should see higher 
throughput than what you are observing.

The fix in ATLAS-2169 helps improve the performance of notification processing, 
especially delete notifications. You might want to try this patch in your 
deployment.

 

Hope this helps.

Madhan

 

 

From: Österlund Berry <berry.osterlund@ç>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, February 5, 2018 at 11:24 PM
To: "[email protected]" <[email protected]>
Subject: Falling behind the Kafka workload

 

Hi


We are using HDP 2.6.3 with the Atlas version that ships with that release. We have 
a problem with Atlas lagging and falling behind the messages in the ATLAS_HOOK Kafka 
topic. I can understand why, as we ingest a large number of tables into the cluster 
every day: we create roughly 165,000 entries in the ATLAS_HOOK topic per day, 
primarily from Sqoop imports and create/drop table operations in Hive. The problem 
is that Atlas only processes around 35,000-40,000 entries per day, so the backlog 
keeps growing.
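
To quantify the backlog I have been checking the consumer lag on the topic with a 
small Java client along these lines (just a sketch: the broker address is a 
placeholder, and I am assuming Atlas commits its hook offsets under the default 
"atlas" consumer group):

import java.util.*;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtlasHookLag {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-host:6667");  // placeholder for our broker
        props.put("group.id", "atlas");                       // assumed default Atlas hook group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Collect all partitions of the ATLAS_HOOK topic
            List<TopicPartition> partitions = new ArrayList<>();
            consumer.partitionsFor("ATLAS_HOOK").forEach(
                p -> partitions.add(new TopicPartition(p.topic(), p.partition())));

            // Lag = log end offset minus the last offset committed by the Atlas consumer
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
            long totalLag = 0;
            for (TopicPartition tp : partitions) {
                OffsetAndMetadata committed = consumer.committed(tp);
                long consumed = (committed == null) ? 0 : committed.offset();
                long lag = endOffsets.get(tp) - consumed;
                System.out.println(tp + " lag=" + lag);
                totalLag += lag;
            }
            System.out.println("Total ATLAS_HOOK lag: " + totalLag);
        }
    }
}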

Many of the tables we import are quite wide, so it is pretty common for the 
messages in the Kafka topic to be 600-800 KB each. 

I have verified that I can consume the messages in the topic from a normal Kafka 
client, so it is not a problem with Kafka itself. I have also cleared the two HBase 
tables and the Kafka topic just to start over from the beginning, but the problem 
remains. 
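
The check itself was just a throwaway consumer, roughly like this (again a sketch: 
broker address and group id are placeholders, and the printed sizes are only 
approximate since the values are read as strings):

import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtlasHookPeek {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-host:6667");  // placeholder for our broker
        props.put("group.id", "atlas-hook-peek");             // throwaway group so Atlas offsets stay untouched
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("ATLAS_HOOK"));
            ConsumerRecords<String, String> records = consumer.poll(10000);
            for (ConsumerRecord<String, String> record : records) {
                // Print offset and approximate message size; the wide tables show up as ~600-800 KB values
                System.out.println("offset=" + record.offset()
                        + " size=" + record.value().length() + " bytes (approx)");
            }
        }
    }
}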

I would like some help with what kind of performance tuning I can do to make sure 
that Atlas can consume at least 200,000 entries from the ATLAS_HOOK topic per day 
(we are planning to add many more data sources over the next couple of months). 
What options do I have to make this happen?

//Berry
