I am running a simple Spark Structured Streaming application that pulls
data from a Kafka topic. The topic has nearly 1000 partitions. I am
running the app on a 6-node EMR cluster with 4 cores and 16 GB RAM. I observed
that Spark tries to pull data from all 1024 Kafka partitions, and after
running successfully for a few iterations it gets stuck with the following log output:
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 101
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 66
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 77
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 78
20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on in memory (size: 4.5 KB, free: 2.7 GB)
20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip- in memory (size: 4.5 KB, free: 2.7 GB)
20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip- in memory (size: 4.5 KB, free: 2.7 GB)

After that, the Spark application still shows as RUNNING, but it is not
processing any data.
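
For anyone trying to reproduce this, a Kafka source of this general shape can be sketched as below. This is not the actual job: the helper names `kafka_source_options` and `build_stream`, the broker address, the topic name, and the cap value are all hypothetical placeholders. The option keys themselves (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`, `maxOffsetsPerTrigger`) are standard Spark Kafka source options, and capping the offsets read per trigger is shown only as one setting that may be worth checking on a topic with this many partitions, not as a confirmed fix.

```python
def kafka_source_options(bootstrap_servers, topic, max_offsets_per_trigger):
    """Build the option map for a Spark Structured Streaming Kafka source.

    All argument values are placeholders chosen by the caller.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
        # Upper bound on the total number of offsets read per micro-batch,
        # split across all topic partitions.
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def build_stream(spark, options):
    """Create the streaming DataFrame from an existing SparkSession.

    Needs a running SparkSession with the spark-sql-kafka package available,
    so it is defined here but not called.
    """
    return spark.readStream.format("kafka").options(**options).load()


# Hypothetical values, for illustration only.
options = kafka_source_options("broker-1:9092", "my_topic", 100000)
```

With a real session, `build_stream(spark, options)` would return the streaming DataFrame to which the query's sink is attached.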