Hello, I am using Samza on YARN and every incoming event is being processed twice. The events are kept on a raw Kafka topic, and a Samza job running on YARN processes them into a processed queue. For every raw message, I see 2 entries in the processed queue.
Some observations: I see 2 running applications and 9998 pending applications. My understanding is that the 2 running applications are what cause the duplication. When I kill a running application, another one from the pending queue takes its place. I have looked at the YARN CapacityScheduler documentation (https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) and tried setting the maximum number of applications in the scheduler config to 2, but the change does not seem to take effect.

What is the best way to handle this scenario? I want to kill the redundant application and ensure that only 1 runs. I appreciate any input.

- Shekar
