Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209430
---



do you have measurement on how much time saved on saving big snapshot in 
parallel vs the number of threads? When multiple threads accessing the same 
table, the improvement may not be linear.

- Na Li


On Oct. 10, 2018, 11:04 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 11:04 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/3/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68968: SENTRY-2425: Add metric to track the time taken to update the owner privilege

2018-10-10 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68968/#review209429
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
Lines 124 (patched)


We should use different metric for "granting" vs "update" since update is 
much more complicated and would take more time. Once we support "revoke" owner 
privilege, we can add a metric for that.


- Na Li


On Oct. 9, 2018, 6:01 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68968/
> ---
> 
> (Updated Oct. 9, 2018, 6:01 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2425
> https://issues.apache.org/jira/browse/SENTRY-2425
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> New metric should be added to track the time taken by sentry server in 
> granting/updating owner privileges.
> 
> 
> Diffs
> -
> 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
>  d34340cd6a41ba01943bd3b2b42c9bddc9fa722f 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryPolicyStoreProcessor.java
>  709434c388689b63d5db7d71cd6cc47952d647bf 
> 
> 
> Diff: https://reviews.apache.org/r/68968/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68958: SENTRY-2419: Log where sentry stands in the process of persisting the snpashot

2018-10-10 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68958/#review209427
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 487 (patched)


log message is not right. Fix it.



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 501 (patched)


log message is not right. Fix it.


- Na Li


On Oct. 8, 2018, 10:16 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68958/
> ---
> 
> (Updated Oct. 8, 2018, 10:16 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2419
> https://issues.apache.org/jira/browse/SENTRY-2419
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> Process of persisting the snapshot might take longer based on the snapshot 
> size. It would be helpfull to to be able to understand where the sentry 
> sentry stands in the process of persisting.
> 
>  
> 
> Adding a Metric for this will be helpfull in debugging.
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  e90fe2df4 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/api/service/thrift/SentryMetrics.java
>  d34340cd6 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/HMSFollower.java
>  b6dca7aa8 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  1722109e3 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStoreInterface.java
>  3a68eb6bf 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/api/service/thrift/TestSentryPolicyStoreProcessor.java
>  059621a68 
> 
> 
> Diff: https://reviews.apache.org/r/68958/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68982: SENTRY-2426: Update the version in pom file at 2.1 branch

2018-10-10 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68982/#review209426
---


Ship it!




Ship It!

- Na Li


On Oct. 10, 2018, 9:42 p.m., Na Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68982/
> ---
> 
> (Updated Oct. 10, 2018, 9:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, kalyan kumar kalvagadda, and Sergio 
> Pena.
> 
> 
> Bugs: sentry-2426
> https://issues.apache.org/jira/browse/sentry-2426
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> update the version from 2.1.0 to 2.1.1-SNAPSHOT in the pom files of 
> branch-2.1 for maintenance release
> 
> 
> Diffs
> -
> 
>   pom.xml 73112e8 
>   sentry-binding/pom.xml 8784794 
>   sentry-binding/sentry-binding-hbase-indexer/pom.xml e0cb32c 
>   sentry-binding/sentry-binding-hive-common/pom.xml cfa5081 
>   sentry-binding/sentry-binding-hive-conf/pom.xml c9f6fcc 
>   sentry-binding/sentry-binding-hive-follower/pom.xml 5e51cc2 
>   sentry-binding/sentry-binding-hive/pom.xml 6f4d69e 
>   sentry-binding/sentry-binding-kafka/pom.xml 27b0ad9 
>   sentry-binding/sentry-binding-solr/pom.xml cba7d59 
>   sentry-binding/sentry-binding-sqoop/pom.xml 37c02f9 
>   sentry-core/pom.xml e536805 
>   sentry-core/sentry-core-common/pom.xml 07b6bcb 
>   sentry-core/sentry-core-model-db/pom.xml 1e0b02d 
>   sentry-core/sentry-core-model-indexer/pom.xml 1ccc8e8 
>   sentry-core/sentry-core-model-kafka/pom.xml 6a7a989 
>   sentry-core/sentry-core-model-solr/pom.xml defeaba 
>   sentry-core/sentry-core-model-sqoop/pom.xml 9ddfd1b 
>   sentry-dist/pom.xml c795809 
>   sentry-hdfs/pom.xml 358350e 
>   sentry-hdfs/sentry-hdfs-common/pom.xml f8fa887 
>   sentry-hdfs/sentry-hdfs-dist/pom.xml 123f631 
>   sentry-hdfs/sentry-hdfs-namenode-plugin/pom.xml 716789b 
>   sentry-hdfs/sentry-hdfs-service/pom.xml bcdf8c3 
>   sentry-policy/pom.xml 89cf602 
>   sentry-policy/sentry-policy-common/pom.xml c2f0075 
>   sentry-policy/sentry-policy-engine/pom.xml 6c1576a 
>   sentry-provider/pom.xml 814fe45 
>   sentry-provider/sentry-provider-cache/pom.xml ec43c05 
>   sentry-provider/sentry-provider-common/pom.xml a733eaf 
>   sentry-provider/sentry-provider-db/pom.xml 6d2590e 
>   sentry-provider/sentry-provider-file/pom.xml 78e603d 
>   sentry-service/pom.xml 565e86b 
>   sentry-service/sentry-service-api/pom.xml 465ccf2 
>   sentry-service/sentry-service-client/pom.xml 8249b20 
>   sentry-service/sentry-service-server/pom.xml 7729656 
>   sentry-solr/pom.xml 218c73e 
>   sentry-solr/solr-sentry-handlers/pom.xml 4f9e4ba 
>   sentry-spi/pom.xml eb9e668 
>   sentry-tests/pom.xml 1b6bd0d 
>   sentry-tests/sentry-tests-hive/pom.xml d9e535d 
>   sentry-tests/sentry-tests-kafka/pom.xml c6c0565 
>   sentry-tests/sentry-tests-solr/pom.xml cdf2924 
>   sentry-tests/sentry-tests-sqoop/pom.xml 2a999f5 
>   sentry-thirdparty/pom.xml 7c0fc9d 
>   sentry-thirdparty/sentry-shaded/pom.xml 99a9217 
>   sentry-tools/pom.xml 4bf374c 
> 
> 
> Diff: https://reviews.apache.org/r/68982/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Na Li
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 10, 2018, 8:04 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3430 (patched)
> > 
> >
> > to reduce overhead of reading when persisting snapshot fails, can we 
> > just delete all path mapping with nextSnapshotID?
> 
> kalyan kumar kalvagadda wrote:
> I will look into it

I have updated the patch. I think this is the best we could do as the 
datanucleus operates on DAO objects, this might internally fetch the data and 
then delete it.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209419
---


On Oct. 10, 2018, 11:04 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 11:04 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/3/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

(Updated Oct. 10, 2018, 11:04 p.m.)


Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Changes
---

addressed review comments.


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 was 5 times faster when tested with below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs (updated)
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 7a736ca9604eb0bb182a159b5a2aed274275c16e 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
 0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/3/

Changes: https://reviews.apache.org/r/68973/diff/2-3/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda



Review Request 68982: SENTRY-2426: Update the version in pom file at 2.1 branch

2018-10-10 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68982/
---

Review request for sentry, Arjun Mishra, kalyan kumar kalvagadda, and Sergio 
Pena.


Bugs: sentry-2426
https://issues.apache.org/jira/browse/sentry-2426


Repository: sentry


Description
---

update the version from 2.1.0 to 2.1.1-SNAPSHOT in the pom files of branch-2.1 
for maintenance release


Diffs
-

  pom.xml 73112e8 
  sentry-binding/pom.xml 8784794 
  sentry-binding/sentry-binding-hbase-indexer/pom.xml e0cb32c 
  sentry-binding/sentry-binding-hive-common/pom.xml cfa5081 
  sentry-binding/sentry-binding-hive-conf/pom.xml c9f6fcc 
  sentry-binding/sentry-binding-hive-follower/pom.xml 5e51cc2 
  sentry-binding/sentry-binding-hive/pom.xml 6f4d69e 
  sentry-binding/sentry-binding-kafka/pom.xml 27b0ad9 
  sentry-binding/sentry-binding-solr/pom.xml cba7d59 
  sentry-binding/sentry-binding-sqoop/pom.xml 37c02f9 
  sentry-core/pom.xml e536805 
  sentry-core/sentry-core-common/pom.xml 07b6bcb 
  sentry-core/sentry-core-model-db/pom.xml 1e0b02d 
  sentry-core/sentry-core-model-indexer/pom.xml 1ccc8e8 
  sentry-core/sentry-core-model-kafka/pom.xml 6a7a989 
  sentry-core/sentry-core-model-solr/pom.xml defeaba 
  sentry-core/sentry-core-model-sqoop/pom.xml 9ddfd1b 
  sentry-dist/pom.xml c795809 
  sentry-hdfs/pom.xml 358350e 
  sentry-hdfs/sentry-hdfs-common/pom.xml f8fa887 
  sentry-hdfs/sentry-hdfs-dist/pom.xml 123f631 
  sentry-hdfs/sentry-hdfs-namenode-plugin/pom.xml 716789b 
  sentry-hdfs/sentry-hdfs-service/pom.xml bcdf8c3 
  sentry-policy/pom.xml 89cf602 
  sentry-policy/sentry-policy-common/pom.xml c2f0075 
  sentry-policy/sentry-policy-engine/pom.xml 6c1576a 
  sentry-provider/pom.xml 814fe45 
  sentry-provider/sentry-provider-cache/pom.xml ec43c05 
  sentry-provider/sentry-provider-common/pom.xml a733eaf 
  sentry-provider/sentry-provider-db/pom.xml 6d2590e 
  sentry-provider/sentry-provider-file/pom.xml 78e603d 
  sentry-service/pom.xml 565e86b 
  sentry-service/sentry-service-api/pom.xml 465ccf2 
  sentry-service/sentry-service-client/pom.xml 8249b20 
  sentry-service/sentry-service-server/pom.xml 7729656 
  sentry-solr/pom.xml 218c73e 
  sentry-solr/solr-sentry-handlers/pom.xml 4f9e4ba 
  sentry-spi/pom.xml eb9e668 
  sentry-tests/pom.xml 1b6bd0d 
  sentry-tests/sentry-tests-hive/pom.xml d9e535d 
  sentry-tests/sentry-tests-kafka/pom.xml c6c0565 
  sentry-tests/sentry-tests-solr/pom.xml cdf2924 
  sentry-tests/sentry-tests-sqoop/pom.xml 2a999f5 
  sentry-thirdparty/pom.xml 7c0fc9d 
  sentry-thirdparty/sentry-shaded/pom.xml 99a9217 
  sentry-tools/pom.xml 4bf374c 


Diff: https://reviews.apache.org/r/68982/diff/1/


Testing
---


Thanks,

Na Li



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 10, 2018, 8:04 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3421 (patched)
> > 
> >
> > can we wrap the whole processing in try() and wait for threadpool to 
> > shutdown in finally?

We don't want to persist the notificationID and snapshotId untill the 
psersisting the path mapping is completed and verfied.


> On Oct. 10, 2018, 8:04 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3430 (patched)
> > 
> >
> > to reduce overhead of reading when persisting snapshot fails, can we 
> > just delete all path mapping with nextSnapshotID?

I will look into it


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209419
---


On Oct. 10, 2018, 7:05 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 7:05 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/2/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209419
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3421 (patched)


can we wrap the whole processing in try() and wait for threadpool to 
shutdown in finally?



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3430 (patched)


to reduce overhead of reading when persisting snapshot fails, can we just 
delete all path mapping with nextSnapshotID?


- Na Li


On Oct. 10, 2018, 7:05 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 7:05 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/2/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread Arjun Mishra via Review Board


> On Oct. 10, 2018, 3:18 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3404 (patched)
> > 
> >
> > This can be an attribute of the class and initialized in the constructor

Can this be added to the constructor? Will conf.getInt add unnecessary time 
delay to persit call?


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209408
---


On Oct. 10, 2018, 7:05 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 7:05 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/2/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209417
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3400 (patched)


Shouldn't this be fetched in the same transaction? What if there is an 
error thrown here, the retry logic won't be invoked?


- Arjun Mishra


On Oct. 10, 2018, 7:05 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 7:05 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/2/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

(Updated Oct. 10, 2018, 7:05 p.m.)


Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Changes
---

Addressed review comments.


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 gave was 5 times faster when tested with 
below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs (updated)
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 7a736ca9604eb0bb182a159b5a2aed274275c16e 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
 0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/2/

Changes: https://reviews.apache.org/r/68973/diff/1-2/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 10, 2018, 3:18 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3420 (patched)
> > 
> >
> > Can you explain why we need the thread.sleep ?

isTerminated() is not a blocking call. It returns immediatly. isTerminated() 
returns true if all tasks have completed following shut down. I'm blocking the 
thread for snapsot to be persited before performing the validation and 
persisting notification and snapshot id.


> On Oct. 10, 2018, 3:18 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3424 (patched)
> > 
> >
> > I think if we know there is an exception we should handle it right away 
> > instead of comparing the size with what is persisted. According to the 
> > Executors.newFixedThreadPool API comments "If any thread terminates due to 
> > a failure during execution prior to shutdown, a new one will take its place 
> > if needed to execute subsequent tasks." Looks like 
> > Executors.newFixedThreadPool doesn't throw an exception if one of the tasks 
> > failed. If that is true I am not very comfortable with using this approach.

it can be solved. Look for it in the new patch


> On Oct. 10, 2018, 3:18 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3430 (patched)
> > 
> >
> > If one snapshot persistance failed, how do we know the subsequent 
> > retries will pass?
> > 
> > Waiting for part of the snapshot persistence to complete and then 
> > checking is not very efficient in my opinion

We don't know for sure if it succeds when retried. We should perform retries 
based on the failure. That is an enhancement we should make to sentry. This 
trasaction of persisting the snapshot was could be retired before and I 
maintained the same.


I did not understand your 2nd point here. Let me clary one thing. Logic of 
validating the snapshot is invoked at the end, after inserting all the path 
mapping information is complete.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209408
---


On Oct. 10, 2018, 5:38 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 5:38 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 gave was 5 times faster when tested with 
> below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/1/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 10, 2018, 3:29 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3428 (patched)
> > 
> >
> > Our current code doesn't deletePersistentAll if an exception is thrown. 
> > This could cause more delays trying to undo a persistence

Current code doesn't have to delete on failure for two reasons
1. Complete snapshoot is persisted in single transaction.
2. If there is failure all the chnages are rolledback automatically in 
Transaction Manager.

With this change paths for each object is persisted in different transaction, 
If for some reason there were issue inserting in one of the transaction we 
should deleted the rest otherwise snapshot has incomplete data. 

Exceptions are not expected while persting snapshot. If there is failure while 
persisting, something is really wrong.  More over, you see that over ahead with 
the current logic as well. If there is a exception in the middle of persisting 
the snapshot, all the entries that were persisted till then will be rolled back 
by trasaction manager. Even this takes time. Only difference is that with the 
new approach we are doing it explciitly as the path information is persisted in 
multiple transactions.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209410
---


On Oct. 10, 2018, 5:38 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 5:38 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 gave was 5 times faster when tested with 
> below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/1/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

(Updated Oct. 10, 2018, 5:38 p.m.)


Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Changes
---

Changed the summary.


Summary (updated)
-

SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in 
parallel


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 gave was 5 times faster when tested with 
below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 7a736ca9604eb0bb182a159b5a2aed274275c16e 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/1/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot

2018-10-10 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209410
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3428 (patched)


Our current code doesn't deletePersistentAll if an exception is thrown. 
This could cause more delays trying to undo a persistence


- Arjun Mishra


On Oct. 10, 2018, 1:45 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 1:45 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 gave was 5 times faster when tested with 
> below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/1/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot

2018-10-10 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209408
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3404 (patched)


This can be an attribute of the class and initialized in the constructor



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3411 (patched)


Why is this declared outside when the scope is only restricted to the for 
loop?



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3420 (patched)


Can you explain why we need the thread.sleep ?



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3424 (patched)


I think if we know there is an exception we should handle it right away 
instead of comparing the size with what is persisted. According to the 
Executors.newFixedThreadPool API comments "If any thread terminates due to a 
failure during execution prior to shutdown, a new one will take its place if 
needed to execute subsequent tasks." Looks like Executors.newFixedThreadPool 
doesn't throw an exception if one of the tasks failed. If that is true I am not 
very comfortable with using this approach.



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3430 (patched)


If one snapshot persistance failed, how do we know the subsequent retries 
will pass?

Waiting for part of the snapshot persistence to complete and then checking 
is not very efficient in my opinion


- Arjun Mishra


On Oct. 10, 2018, 1:45 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 1:45 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 gave was 5 times faster when tested with 
> below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/1/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot

2018-10-10 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209407
---



Can you change the title of this ticket to persisting in parellel?

- Arjun Mishra


On Oct. 10, 2018, 1:45 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 1:45 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 gave was 5 times faster when tested with 
> below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/1/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot

2018-10-10 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 gave was 5 times faster when tested with 
below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 7a736ca9604eb0bb182a159b5a2aed274275c16e 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/1/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda