Re: Review Request 68926: SENTRY-2372: SentryStore should not implement grantOptionCheck

2018-10-19 Thread Sergio Pena via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68926/
---

(Updated Oct. 20, 2018, 1:58 a.m.)


Review request for sentry and kalyan kumar kalvagadda.


Changes
---

Addressed comments and add unit tests.


Bugs: SENTRY-2372
https://issues.apache.org/jira/browse/SENTRY-2372


Repository: sentry


Description
---

Make the grant option check on the SentryPolicyStoreProcessor so that it gets 
the group information from it. The SentryStore has the new method to check the 
grant option in the DB only.


Diffs (updated)
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/core/common/utils/SentryUtils.java
 b225af0bc4e8c472e2ee17677a74525b882e214c 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
 b8f5ce7dbc8cd2dd6317f02cc09a225ef460acf4 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryPolicyStoreProcessor.java
 709434c388689b63d5db7d71cd6cc47952d647bf 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/model/MSentryPrivilege.java
 eba40d2ebd9778ddf74df0d35ad8b22500717f73 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 b387a22e40b8958395e1c12af12272b89a778726 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStoreInterface.java
 6217719b8cf0c120b47a1612f6b3f0bbba98f724 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/api/service/thrift/SentryPolicyStoreProcessorTestUtils.java
 PRE-CREATION 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/api/service/thrift/TestSentryPolicyStoreProcessor.java
 16ff79e2e4258c20a46ffcf162872b5b20d97aba 


Diff: https://reviews.apache.org/r/68926/diff/3/

Changes: https://reviews.apache.org/r/68926/diff/2-3/


Testing
---


Thanks,

Sergio Pena



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209810
---




sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
Lines 2946 (patched)


can you update authzPaths map with db paths too?



sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
Lines 2984 (patched)


can you update authzPaths map with db paths too?


- Arjun Mishra


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209809
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3944 (patched)


Can you change the comment to deleting all MAuthzPathsMapping and 
corresponding MAuthzPath objects



sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
Lines 2997 (patched)


I think a more clear test is to check counts on MAuthzPathsMapping and 
MAuthzPaths. You agree?


- Arjun Mishra


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 19, 2018, 7:12 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 187 (patched)
> > 
> >
> > should this variable be local to persistFullPathsImage, and be atomic 
> > as it is set by more than one thread.

If it is local variable it would be not avaiable in SnapshotUpdater. member 
variables of SentryStore will be available to instances of SnapshotUpdater as 
it is a inner class for sentrystore.


> On Oct. 19, 2018, 7:12 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 294 (patched)
> > 
> >
> > When full snapshot fetch is done, should these counters be reset? 
> > Otherwise, they will keep on increasing, more than 
> > totalNumberOfObjectsToPersist

You are right. I will fix it. So yo have any other comments on this patch?


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209798
---


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 69076: SENTRY-2301: Log where sentry stands in the snapshot fetching process, periodically

2018-10-19 Thread Arjun Mishra via Review Board


> On Oct. 19, 2018, 7:56 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
> > Lines 127 (patched)
> > 
> >
> > We already have a timer for this purpose, name 
> > "org.apache.sentry.service.thrift.FullUpdateInitializer.snapshot"
> > 
> > It is defined in SentryHMSClient
> >   private final Timer updateTimer = SentryMetrics.getInstance()
> >   .getTimer(name(FullUpdateInitializer.class, SNAPSHOT));
> >   
> > It is called in 
> > SentryHMSClient.fetchFullUpdate()
> > 
> > Do we need another timer for the same purpose?
> 
> Na Li wrote:
> The existing timer is removed to be the timer specified here.

I didn't create a new timer. I simply moved it to SentryMetrics and renamed it


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69076/#review209799
---


On Oct. 18, 2018, 9:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69076/
> ---
> 
> (Updated Oct. 18, 2018, 9:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When sentry is fetching snapshot from HMS, it should log periodically on 
> where it stands in the snapshot process. This will help person debugging it 
> and help him understand the progress.
> 
>  
> 
> This is important as this process could take magnitude of minutes when the 
> HMS data is huge.
> 
> 
> Diffs
> -
> 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
>  214d78c53 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
>  3e27d1bbe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryHMSClient.java
>  4baeb6725 
> 
> 
> Diff: https://reviews.apache.org/r/69076/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 69076: SENTRY-2301: Log where sentry stands in the snapshot fetching process, periodically

2018-10-19 Thread Arjun Mishra via Review Board


> On Oct. 19, 2018, 7:56 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
> > Line 307 (original), 313 (patched)
> > 
> >
> > You can change it to check if part.getSd() is null or not

Lina that is fixed in SENTRY-2428


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69076/#review209799
---


On Oct. 18, 2018, 9:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69076/
> ---
> 
> (Updated Oct. 18, 2018, 9:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When sentry is fetching snapshot from HMS, it should log periodically on 
> where it stands in the snapshot process. This will help person debugging it 
> and help him understand the progress.
> 
>  
> 
> This is important as this process could take magnitude of minutes when the 
> HMS data is huge.
> 
> 
> Diffs
> -
> 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
>  214d78c53 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
>  3e27d1bbe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryHMSClient.java
>  4baeb6725 
> 
> 
> Diff: https://reviews.apache.org/r/69076/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 69076: SENTRY-2301: Log where sentry stands in the snapshot fetching process, periodically

2018-10-19 Thread Arjun Mishra via Review Board


> On Oct. 19, 2018, 6:46 p.m., Sergio Pena wrote:
> > I did a quick look and I feel this logging will be too verbose and may slow 
> > the fetch process as there is one message per task executed. There can be 
> > thousands or even millions of tasks (one per databse, one per table, one 
> > per 100 partitions).
> > 
> > Is there another way to print a progress instead of two messages per task?

Sergio we don't do one per table. We do 1 per 100 tables. If a customer had 100 
database, each database had 100 tables (1) tables and each table had 100 
partitions (1*1000 = 1000 10 million) we would have a total of 100 + 
100 + 100 = 300 tasks. So its not that many. Also we need to know what task is 
taking how much time. I will post a snippet of log messages


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69076/#review209797
---


On Oct. 18, 2018, 9:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69076/
> ---
> 
> (Updated Oct. 18, 2018, 9:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When sentry is fetching snapshot from HMS, it should log periodically on 
> where it stands in the snapshot process. This will help person debugging it 
> and help him understand the progress.
> 
>  
> 
> This is important as this process could take magnitude of minutes when the 
> HMS data is huge.
> 
> 
> Diffs
> -
> 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
>  214d78c53 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
>  3e27d1bbe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryHMSClient.java
>  4baeb6725 
> 
> 
> Diff: https://reviews.apache.org/r/69076/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 69076: SENTRY-2301: Log where sentry stands in the snapshot fetching process, periodically

2018-10-19 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69076/#review209799
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
Lines 127 (patched)


We already have a timer for this purpose, name 
"org.apache.sentry.service.thrift.FullUpdateInitializer.snapshot"

It is defined in SentryHMSClient
  private final Timer updateTimer = SentryMetrics.getInstance()
  .getTimer(name(FullUpdateInitializer.class, SNAPSHOT));
  
It is called in 
SentryHMSClient.fetchFullUpdate()

Do we need another timer for the same purpose?



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
Line 307 (original), 313 (patched)


You can change it to check if part.getSd() is null or not


- Na Li


On Oct. 18, 2018, 9:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69076/
> ---
> 
> (Updated Oct. 18, 2018, 9:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When sentry is fetching snapshot from HMS, it should log periodically on 
> where it stands in the snapshot process. This will help person debugging it 
> and help him understand the progress.
> 
>  
> 
> This is important as this process could take magnitude of minutes when the 
> HMS data is huge.
> 
> 
> Diffs
> -
> 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
>  214d78c53 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
>  3e27d1bbe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryHMSClient.java
>  4baeb6725 
> 
> 
> Diff: https://reviews.apache.org/r/69076/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209798
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 187 (patched)


should this variable be local to persistFullPathsImage, and be atomic as it 
is set by more than one thread.



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 294 (patched)


When full snapshot fetch is done, should these counters be reset? 
Otherwise, they will keep on increasing, more than totalNumberOfObjectsToPersist


- Na Li


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Review Request 69088: SENTRY-2428: Skip null partitions or partitions with null sds entries

2018-10-19 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69088/
---

Review request for sentry and Na Li.


Bugs: SENTRY-2428
https://issues.apache.org/jira/browse/SENTRY-2428


Repository: sentry


Description
---

This issue is a regression of SENTRY-2309


Diffs
-

  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
 3e27d1bbe 


Diff: https://reviews.apache.org/r/69088/diff/1/


Testing
---


Thanks,

Arjun Mishra



Re: Review Request 69076: SENTRY-2301: Log where sentry stands in the snapshot fetching process, periodically

2018-10-19 Thread Sergio Pena via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69076/#review209797
---



I did a quick look and I feel this logging will be too verbose and may slow the 
fetch process as there is one message per task executed. There can be thousands 
or even millions of tasks (one per databse, one per table, one per 100 
partitions).

Is there another way to print a progress instead of two messages per task?

- Sergio Pena


On Oct. 18, 2018, 9:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69076/
> ---
> 
> (Updated Oct. 18, 2018, 9:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When sentry is fetching snapshot from HMS, it should log periodically on 
> where it stands in the snapshot process. This will help person debugging it 
> and help him understand the progress.
> 
>  
> 
> This is important as this process could take magnitude of minutes when the 
> HMS data is huge.
> 
> 
> Diffs
> -
> 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
>  214d78c53 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
>  3e27d1bbe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryHMSClient.java
>  4baeb6725 
> 
> 
> Diff: https://reviews.apache.org/r/69076/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Review Request 69087: SENTRY-2249: Enable batch insert of HMS paths in Full Snapshot.

2018-10-19 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69087/
---

Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Bugs: SENTRY-2249
https://issues.apache.org/jira/browse/SENTRY-2249


Repository: sentry


Description
---

Currently each entry in full snapshot of HMS is persisted one entry at a time. 
Instead it could be optimized by persisting the path entries in batches. DB 
operations are expensive, reducing the number of database operations and around 
trip time will help. This would decrease the time to persist the snapshot in to 
database significantly.

Size of the batch could be configurable.

This patch will change with SENTRY-2305. This review request is just to give 
you a sence of the change.


Diffs
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/core/common/utils/SentryConstants.java
 d8c1061d36ed0b92116f0b2dd5ec820ccb166818 
  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/model/MAuthzPathsMapping.java
 c51f25a0393b482814afcd3b7a19e547b689ac6e 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/model/MPathToPersist.java
 PRE-CREATION 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/model/package.jdo
 20ec0deab6b97065cfe99beea3d14a6c7268aac3 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 b387a22e40b8958395e1c12af12272b89a778726 


Diff: https://reviews.apache.org/r/69087/diff/1/


Testing
---


Thanks,

kalyan kumar kalvagadda



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209795
---



Had a discussion with Kalyan about commiting a snapshot Id to 
AUTHZ_PATHS_MAPPING table that doesn't yet exist in AUTH_PATHS_SNAPSHOT_ID 
table. As of now since there is no foreign key constraint between 
AUTHZ_PATHS_MAPPING and AUTH_PATHS_SNAPSHOT_ID table it is fine. In future if 
we decide to add one code may need to be modified

- Arjun Mishra


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

(Updated Oct. 19, 2018, 4:53 p.m.)


Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Changes
---

Addressed review comments.


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 was 5 times faster when tested with below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs (updated)
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 b387a22e40b8958395e1c12af12272b89a778726 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
 0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/4/

Changes: https://reviews.apache.org/r/68973/diff/3-4/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda