Greetings,
I have a general design question I did not see addressed in the docs.
Basically, how does Samza guarantee a single writer for each changelog
partition? Given the strong ordering assumptions on these changelogs, how do
you protect against zombie processes writing to the changelog?
Security wouldn’t stop zombie processes from writing to Kafka. I had this
problem with YARN before, where it thought it was killing jobs but they
never actually died, and in fact continued to write to Kafka.
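The zombie-writer problem raised above is classically solved with epoch-based fencing (Kafka itself later adopted a similar epoch scheme for its transactional producer). Nothing in the sketch below is Samza's actual mechanism; `FencedChangelog` and its methods are hypothetical, purely to illustrate how a stale writer gets rejected:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical fencing sketch: each container that takes over a changelog
// partition acquires a fresh, monotonically increasing epoch. A write is
// accepted only if its epoch is still the latest, so a zombie container
// holding a stale epoch can no longer corrupt the log.
public class FencedChangelog {
    private final AtomicLong latestEpoch = new AtomicLong(0);
    private final List<String> log = new ArrayList<>();

    // Called when a (re)started container takes ownership of the partition.
    public long acquireEpoch() {
        return latestEpoch.incrementAndGet();
    }

    // A write carrying a stale epoch (a zombie writer) is rejected.
    public synchronized boolean write(long epoch, String record) {
        if (epoch != latestEpoch.get()) {
            return false; // zombie fenced off
        }
        log.add(record);
        return true;
    }

    public static void main(String[] args) {
        FencedChangelog changelog = new FencedChangelog();
        long zombie = changelog.acquireEpoch();   // original container
        long current = changelog.acquireEpoch();  // restart fences the old one
        System.out.println(changelog.write(current, "a")); // true
        System.out.println(changelog.write(zombie, "b"));  // false
    }
}
```

The key design choice is that fencing happens at the log, not at the process: even a container that cannot be killed (the YARN failure mode described above) becomes harmless once its epoch is superseded.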
> On Feb 10, 2016, at 4:23 PM, Jagadish Venkatraman
Hi John,
Currently there is no authorization on who writes to Kafka. There is a
Kafka security proposal that the Kafka community is working on:
https://cwiki.apache.org/confluence/display/KAFKA/Security
Building this into Samza may entail expensive coordination (to prevent
other jobs from writing).
Hey Rick,
If I understand your question, the goal is really to make sure there are no
orphaned containers that continue to run "off the books".
The newly added SAMZA-871 describes a heartbeat mechanism to make sure
orphaned containers actually get killed.
Also worth looking at is the YARN Node Manager Restart feature.
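To make the heartbeat idea concrete, here is a minimal sketch of the kind of loop SAMZA-871 proposes. The `ContainerHeartbeat` class and the `coordinatorRecognizes` callback are hypothetical stand-ins for the real coordinator RPC, not the actual implementation:

```java
import java.util.function.Predicate;

// Sketch of an orphan-detecting heartbeat: the container periodically asks
// the coordinator whether it is still the registered owner of its container
// ID. If the coordinator no longer recognizes it (e.g. the AM restarted and
// reassigned containers), the container shuts itself down instead of
// running on as a zombie writer.
public class ContainerHeartbeat {
    private final String containerId;
    private final Predicate<String> coordinatorRecognizes; // stand-in for an HTTP call
    private volatile boolean running = true;

    public ContainerHeartbeat(String containerId,
                              Predicate<String> coordinatorRecognizes) {
        this.containerId = containerId;
        this.coordinatorRecognizes = coordinatorRecognizes;
    }

    // One heartbeat tick: returns true if the container should keep running.
    public boolean tick() {
        if (!coordinatorRecognizes.test(containerId)) {
            running = false; // orphaned: stop processing and exit
        }
        return running;
    }

    public boolean isRunning() {
        return running;
    }

    public static void main(String[] args) {
        // After a restart the coordinator only recognizes "container-2",
        // so the old "container-1" instance stops itself on the next tick.
        ContainerHeartbeat orphan = new ContainerHeartbeat("container-1",
                id -> id.equals("container-2"));
        System.out.println(orphan.tick()); // false
    }
}
```

Note that this closes the detection gap from the container's side: even if YARN loses track of the process, the process itself discovers it has been disowned and exits.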
To second Rick's point: it's less about malicious actors and more about
containers thought to be lost due to a network partition popping up later
and starting to write to the changelog. I assume from Rick's response that
YARN is responsible for ensuring only one version of each container is
running.
Jake, not my question; I was just adding my 2 cents :)
John, it’s not that YARN is responsible for maintaining one instance of each
container; Samza has an abstract management layer that defers this to YARN.
But some people bypass YARN altogether and manage their containers
themselves.
Hi Rick and John,
Thanks for the great discussion! As Jacob said, we also realized the
possible drawbacks of relying solely on YARN for process liveness detection,
and that's why SAMZA-871 was opened. Please help comment on the JIRA so that
we can track the discussion and move the design forward.