Hi Sethukumar,

Thanks for your input. My responses are inline.

Regards
Bosco

From: Sethukumar Ramachandran <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, April 14, 2015 at 2:48 AM
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Some Apache Ranger queries/thoughts

> Hello all,
>
> We are using HDP 2.2 and setup Apache Ranger along with it in Ubuntu 12.04. We
> are not able to fulfill our audit related requirement through Ranger. At
> present we have the following items which we were not able to get through
> Ranger. Please let us know whether we are missing something or ways to
> improve.
>
>
> 1. As part of our audit requirements we are required to capture
> PermissionDenied type of exceptions (or any exceptions for that matter) in
> HDFS and GRANT related issues in Hive. At present we are not able to capture
> these in Ranger. But HDFS audit logs and hiverserver logs have some relevant
> information on this. As a single point of information on audit related stuff
> we would like to have these in Ranger than looking around in those logs. How
> Can we do this with Ranger?

Bosco: This is our ultimate goal. With Hive, we might be auditing all
user-level activities. With HDFS, we are auditing all file-access-related
actions. Would you be able to list the actions you want to audit? This will
help us scope the work. Please create a JIRA to track this.

> 2. Both HDFS and Hive plugins for Ranger actually captures multiple audit
> entries for the same event and this is bit an overhead from auditing
> perspective. Is it possible to have a single and clear audit entry in Ranger
> for a particular auditable event? Is there some configuration available for
> this to work?

Bosco: In the release under development (Apache Ranger 0.5), the HDFS audit
has been optimized to only one call per request. For Hive, we are capturing
just one action per request. I am not sure whether you are referring to the
"USE" action.
Anyway, for Hive, it would be good if you could let us know which entries are
duplicates. We can look into it.

> 3. If we have an HDFS read, write or delete operation we get multiple
> entries in Ranger audit. But we are not able to figure about the exact nature
> of change happened in HDFS by looking through the Ranger Audit trail records.
> Similar is the case for Hive related operations. The resource name that Ranger
> captures is sometimes vague and point to /tmp folder and all

Bosco: Hopefully, eliminating the multiple entries will ease some of your
pain. Regarding Hive access to HDFS, since Hive creates a lot of temporary
intermediate files, there is a lot of noise. Your concerns are valid. I feel
we should make our UI search smarter so that admin users can suppress (filter
out) accesses to /tmp folders and similar transient resources. Can you help us
document and track the requirement by creating a JIRA?

FYI, we are moving our audits to Solr. This gives a lot more search and filter
capabilities, and you can also use Banana (or other BI tools) to write your own
custom audit dashboard. That might be of interest to you.

> 4. If there is a change in HDFS or Hive (grants, data delete/update), as a
> requirement we need to store the old value and new value along with who made
> the change, when the change was made and whether it was successful or not. But
> this is not happening now. How can we achieve this with Ranger?

Bosco: Assuming you are referring to policy changes, all Hive-related policy
changes (Ranger UI, Ranger REST or Hive GRANT/REVOKE) are logged in Ranger.
You can check them from Ranger -> Audit -> Admin tab. For HDFS, all policy
changes done via Ranger UI and Ranger REST are logged in Ranger.

>
>
> Thanks & Regards,
> Sethukumar Ramachandran
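
As an illustration of the /tmp suppression discussed under item 3, here is a minimal sketch of how such a filter could be prototyped outside Ranger. It assumes audit records have already been exported as Python dictionaries; the `resource`, `user` and `access` field names, the prefix list, and both function names are hypothetical and are not Ranger's schema or API.

```python
from posixpath import normpath

# Hypothetical list of transient path prefixes to suppress; adjust as needed.
TRANSIENT_PREFIXES = ("/tmp",)


def is_transient(path, prefixes=TRANSIENT_PREFIXES):
    """Return True if path equals one of the prefixes or lies beneath one."""
    clean = normpath(path)
    return any(clean == p or clean.startswith(p + "/") for p in prefixes)


def suppress_transient(records, prefixes=TRANSIENT_PREFIXES):
    """Keep only records whose 'resource' path is not under a transient prefix."""
    return [r for r in records if not is_transient(r["resource"], prefixes)]


# Made-up sample records, for illustration only.
sample = [
    {"user": "hive", "resource": "/tmp/hive/scratch/part-0", "access": "write"},
    {"user": "alice", "resource": "/data/sales/2015.csv", "access": "read"},
    {"user": "bob", "resource": "/tmpfiles/report.txt", "access": "read"},
]
kept = suppress_transient(sample)
```

The prefix is matched on whole path components, so a path such as /tmpfiles/report.txt is not mistaken for something under /tmp.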
