Hi Sethukumar

Your requests are reasonable. Let's start by creating the JIRAs. Also, if you are planning to make a specific contribution, please let us know.
Thanks
Bosco

From: Sethukumar Ramachandran <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, April 14, 2015 at 9:01 PM
To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Subject: RE: Some Apache Ranger queries/thoughts

> Thanks, Durai, for the responses. I'm happy to contribute to Ranger in whatever
> way I can. I shall create JIRAs with detailed descriptions/requirements for
> these items: (1) eliminating multiple entries for a single event; (2) auditable
> actions in HDFS and Hive (it would be really nice if these were based on
> configurable patterns); (3) Ranger capturing the exact nature of each event
> (update, create, delete, permission modified, ACL created, etc.).
>
> On the fourth item, I do not mean policy changes exactly (policy changes in
> Ranger already keep track of the old and new values for any kind of change),
> but any changes happening in HDFS and Hive that can be defined in some
> fashion. For example, in HDFS we need to audit file/folder creation,
> modification and deletion, user creation, user permission changes and ACL
> changes; in Hive, grants and revokes; just to list some of them (I can go
> into detail in the JIRA with exact requirements). For these kinds of changes
> we need to track what changed, from what value to what value, by whom and
> when. If such a change attempt fails, that also needs to be audited.
>
> I hope this outlines the requirements. I shall start creating JIRAs for these;
> please let me know in what ways I can contribute.
>
> Thanks
> Sethukumar
>
> From: Don Bosco Durai [mailto:[email protected]] On Behalf Of Don Bosco Durai
> Sent: Wednesday, April 15, 2015 6:44 AM
> To: [email protected]; [email protected]
> Subject: Re: Some Apache Ranger queries/thoughts
>
> Hi Sethukumar
>
> Thanks for your input. My responses are inline.
>
> Regards
>
> Bosco
>
> From: Sethukumar Ramachandran <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, April 14, 2015 at 2:48 AM
> To: "[email protected]" <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Some Apache Ranger queries/thoughts
>
>> Hello all,
>>
>> We are using HDP 2.2 and have set up Apache Ranger along with it on Ubuntu
>> 12.04. We are not able to fulfill our audit-related requirements through
>> Ranger. At present, there are the following items we have not been able to
>> get through Ranger. Please let us know whether we are missing something, or
>> of ways to improve.
>>
>> 1. As part of our audit requirements, we need to capture
>> PermissionDenied-type exceptions (or any exceptions, for that matter) in
>> HDFS and GRANT-related issues in Hive. At present we are not able to capture
>> these in Ranger, although the HDFS audit logs and HiveServer logs have some
>> relevant information on this. To have a single point of information for
>> audit-related matters, we would like to have these in Ranger rather than
>> looking around in those logs. How can we do this with Ranger?
>
> Bosco: This is our ultimate goal. With Hive, we might be auditing all
> user-level activities. With HDFS, we are auditing all file-access-related
> actions. Would you be able to list the actions you want to audit? This will
> help us scope the work. Please create a JIRA to track this.
>
>> 2. Both the HDFS and Hive plugins for Ranger capture multiple audit
>> entries for the same event, which is a bit of an overhead from an auditing
>> perspective. Is it possible to have a single, clear audit entry in Ranger
>> for a particular auditable event? Is there some configuration available for
>> this?
>
> Bosco: In the release under development (Apache Ranger 0.5), the HDFS audit
> has been optimized to only one call per request. For Hive, we are just
> capturing one action per request.
> I am not sure whether you are referring to the "USE" action. Anyway, for
> Hive, it would be good if you could let us know which entries are duplicates.
> We can look into it.
>
>> 3. If we have an HDFS read, write or delete operation, we get multiple
>> entries in the Ranger audit, but we are not able to figure out the exact
>> nature of the change that happened in HDFS by looking through the Ranger
>> audit trail records. The same is true for Hive-related operations. The
>> resource name that Ranger captures is sometimes vague and points to the /tmp
>> folder and the like.
>
> Bosco: Hopefully, eliminating the multiple entries will ease some of your
> pain. Regarding Hive access to HDFS, since Hive creates a lot of temporary
> intermediate files, there is a lot of noise. Your concerns are valid. I feel
> we should make our UI search smarter and help admin users suppress (filter
> out) accesses to /tmp folders and similar transient resources. Can you help
> us document and track the requirement by creating a JIRA? FYI, we are moving
> our audits to Solr. This gives a lot more search and filter capabilities, and
> you can also use Banana (or other BI tools) to build your own custom audit
> dashboard. Something that might be interesting to you.
>
>> 4. If there is a change in HDFS or Hive (grants, data delete/update), our
>> requirement is to store the old value and the new value, along with who
>> made the change, when the change was made, and whether it was successful.
>> This is not happening now. How can we achieve this with Ranger?
>
> Bosco: Assuming you are referring to policy changes, all Hive-related policy
> changes (Ranger UI, Ranger REST or Hive GRANT/REVOKE) are logged in Ranger.
> You can check them under Ranger -> Audit -> Admin tab. For HDFS, all policy
> changes made via the Ranger UI and Ranger REST are logged in Ranger.
>
>> Thanks & Regards,
>> Sethukumar Ramachandran
