Hi folks,

We are looking into some alternative design choices for HDFS Sentry sync that would let us keep the edit logs of both path updates (coming from HMS) and permission updates (coming from Sentry) in persistent storage. The main motivation is to make the HA cases more stable: the more stateless the services are, the more fault tolerant they become, and the easier it is to bring up multiple instances.
Right now, the Sentry service buffers the edit history of path updates from HMS and permission updates from itself in memory, and serves it to the NameNode (NN) so that NN can build ACLs for Hive-based files from this data.

Some options:

1. Support edit history at the source: both HMS and Sentry could implement a WAL in their backend DBs, so that NN can reliably request the most recent updates from persistent storage. Hive's recent replication work added some WAL support; it would be good to explore whether we can build on top of it.
2. Have the source write the edit history to persistent, distributed, fault-tolerant storage such as HDFS or Kafka.

In case 1, NN can either read the edit history directly from Sentry/HMS, or Sentry can act as a liaison that serves the edits to NN. Each approach has advantages and disadvantages.

Let me know your thoughts and I can go into details. Thanks!
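P.S. To make option 1 a little more concrete, here is a rough sketch of what a DB-backed edit log could look like. Everything in it is hypothetical: the `edit_log` table, its columns, and the `append_update`/`updates_since` helpers are illustrative names, not existing HMS or Sentry APIs, and SQLite just stands in for the real backend DB. The idea is that every edit gets a monotonically increasing sequence number, and NN (or Sentry acting as a liaison) asks for everything after the last sequence number it has applied, so a restarted service loses nothing.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple

# Hypothetical schema: a single append-only table holding both kinds of edits.
# The seq column gives each edit a monotonically increasing ID that the
# NN-side client can keep as its resume cursor.
SCHEMA = """
CREATE TABLE IF NOT EXISTS edit_log (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,
    kind    TEXT NOT NULL,  -- 'path' (from HMS) or 'perm' (from Sentry)
    payload TEXT NOT NULL   -- serialized update; the format is an assumption
)
"""


def open_log(db_path: str) -> sqlite3.Connection:
    """Open (or create) the persistent edit log at db_path."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def append_update(conn: sqlite3.Connection, kind: str, payload: str) -> int:
    """Durably append one edit and return its sequence number."""
    # 'with conn' commits on success, so the edit survives a service restart.
    with conn:
        cur = conn.execute(
            "INSERT INTO edit_log (kind, payload) VALUES (?, ?)", (kind, payload)
        )
    return cur.lastrowid


def updates_since(conn: sqlite3.Connection, last_seq: int) -> List[Tuple[int, str, str]]:
    """Return all edits with seq > last_seq, oldest first."""
    cur = conn.execute(
        "SELECT seq, kind, payload FROM edit_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "edits.db")

    writer = open_log(path)
    append_update(writer, "path", "illustrative path update for db1.tbl1")
    append_update(writer, "perm", "illustrative permission update for db1.tbl1")
    writer.close()  # simulate the serving process going away

    reader = open_log(path)  # a fresh process sees the same history
    print(updates_since(reader, 0))  # full history, e.g. for a new NN
    print(updates_since(reader, 1))  # only what an NN at seq 1 is missing
    reader.close()
```

Since the cursor lives on the NN side and the history lives in the DB, any Sentry instance could serve the request, which is the stateless property we're after for HA. The sketch ignores log trimming and the full-snapshot fallback for readers that fall too far behind.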