Ashutosh Mestry created ATLAS-4155:
--------------------------------------

Summary: NotificationHookConsumer: Large Compressed Message Processing Problem
Key: ATLAS-4155
URL: https://issues.apache.org/jira/browse/ATLAS-4155
Project: Atlas
Issue Type: Bug
Reporter: Ashutosh Mestry
Assignee: Ashutosh Mestry
*Background*

Notification messages can be large. To work around Kafka's limit on message size, Atlas compresses and splits messages. If a message exceeds the configured size threshold, it is compressed. If the compressed message still exceeds that threshold, it is split into multiple messages.

*Situation*

Consider a message so large that uncompressing it takes longer than Kafka's timeout for processing a message. The offset of the large message is then not committed in time, so Kafka presents the same message again.

Message description:
Number of splits: 8
Compressed message size: 7,452,640
Uncompressed message size: 520,803,946
Time taken to uncompress and stitch the messages: > 90 seconds

Sequence:
1. 2021-02-10 14:57:24,221: first message received
2. 2021-02-10 14:58:36,052: all splits combined (72 seconds)
3. 2021-02-10 15:01:06,971: message processing completed (90 seconds)
4. 2021-02-10 15:01:17,158: Kafka commit failed. Elapsed time since first message: 197 seconds
5. 2021-02-10 15:01:19,857: attempt #2: first message received
6. 2021-02-10 15:03:01,993: attempt #2: all splits combined (102 seconds)
7. 2021-02-10 15:04:44,896: attempt #2: Kafka commit failed. Elapsed time since first message: 205 seconds
8. Back to step 5: the message is presented again and the cycle repeats.

*Solution*

Maintain the last offset received. If the same offset is presented again, commit that offset and move on to the next message.
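A minimal sketch of the proposed solution, assuming a simplified consumer rather than the real Atlas and Kafka classes: `HookMessage` and `LastOffsetGuard` are hypothetical names, and `commit` only records the offset in memory instead of calling Kafka. The guard remembers the last offset seen per partition; when the same offset is presented again, it commits that offset and skips reprocessing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a Kafka record (not an Atlas or Kafka class).
final class HookMessage {
    final int partition;
    final long offset;
    final String payload;

    HookMessage(int partition, long offset, String payload) {
        this.partition = partition;
        this.offset = offset;
        this.payload = payload;
    }
}

// Hypothetical sketch: track the last offset received per partition and,
// on redelivery of that same offset, commit it and move on without reprocessing.
final class LastOffsetGuard {
    private final Map<Integer, Long> lastOffsetByPartition = new HashMap<>();
    private final Map<Integer, Long> committedByPartition = new HashMap<>();
    private int processedCount = 0;

    /** Returns true if the message was processed, false if skipped as a redelivery. */
    boolean handle(HookMessage msg, Consumer<String> processor) {
        Long last = lastOffsetByPartition.get(msg.partition);
        if (last != null && last.longValue() == msg.offset) {
            // Same offset presented again: commit it and skip the expensive work.
            commit(msg);
            return false;
        }
        lastOffsetByPartition.put(msg.partition, msg.offset);
        processor.accept(msg.payload); // e.g. uncompress, stitch and apply the message
        processedCount++;
        commit(msg);
        return true;
    }

    // Assumption: a real consumer would commit through the Kafka client here.
    // Following Kafka's convention, the committed offset is the next one to read.
    private void commit(HookMessage msg) {
        committedByPartition.put(msg.partition, msg.offset + 1);
    }

    Long committedOffset(int partition) {
        return committedByPartition.get(partition);
    }

    int processedCount() {
        return processedCount;
    }
}
```

Because the last offset is held only in memory, this sketch guards against redelivery within a running consumer; it would not detect a redelivery after the consumer restarts.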