Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-31 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
> > Lines 153 (patched)
> > 
> >
> > is there an optimal value higher than 2 to make it faster?
> 
> kalyan kumar kalvagadda wrote:
> I used 2 because it is minimum value. We can increase the number but the 
> optimal number depends on the hardware used and the cpu usage.

any smaller number(2-3) should be good. It's not good idea to have a bigger 
value as we don't have a control on kind of hardware where is sentry is run.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3476-3479 (original), 3491-3494 (patched)
> > 
> >
> > I wonder how much memory do we expect in the worst case scenarios to be 
> > hold in memory for all these objects in the pool?
> > 
> > In the previous patchthe objects were persisted (1 at a time), and GC 
> > could collected them.
> 
> kalyan kumar kalvagadda wrote:
> That is the case even now GC should collect them. Only difference is that 
> there is additional memory usage for short period of time. It is not possible 
> to come up with exact memory usage. In the worst case it could be size of the 
> fullsnapshot.
> I can make code changes to limit the queue length to avaoid additional 
> memory usage.

This appraach reduces memory usage signifiacantly as we are not holding up the 
data in L-1 cache.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Line 3478 (original), 3493 (patched)
> > 
> >
> > This is executed in a different transaction?
> 
> Arjun Mishra wrote:
> I believe its all in a single transaction
> 
> kalyan kumar kalvagadda wrote:
> it is not possible to perform concurrent inserts using a single 
> transaction. SnapshotUpdater is executed as seperate transaction. In the 
> event if failure we need to perform explicit cleanup.

Yes, This is different transaction.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3510 (patched)
> > 
> >
> > Aren't these deleted during the rollback?
> 
> Arjun Mishra wrote:
> Does executeTransactionWithRetry include roll back on failure?
> 
> kalyan kumar kalvagadda wrote:
> it is not possible to perform concurrent inserts using a single 
> transaction. SnapshotUpdater is executed as seperate transaction. In the 
> event if failure we need to perform explicit cleanup.

As we are persisinting the paths for a hive table/database is seperate 
transaction we need to have an explcitiy logic to remove the other paths in the 
event of failure.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209865
---


On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 22, 2018, 2:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  

Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-22 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209892
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 5110 (patched)


We should probably replace "executeTransactionWithRetry" and 
"executeTransaction". We don't want to get into a weird state where we have 
thrown the exception but the transaction retries. However we have kicked off a 
new persist job in the original transaction



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 5111 (patched)


Are we here creating a single transaction for every persist? If so, will 
this have performance impact?


- Arjun Mishra


On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 22, 2018, 2:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/5/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-22 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
> > Lines 152 (patched)
> > 
> >
> > how can we make this variable name to not be confused with threads 
> > fetching an hms snapshot?

I don't think the name is confusing.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
> > Lines 153 (patched)
> > 
> >
> > is there an optimal value higher than 2 to make it faster?

I used 2 because it is minimum value. We can increase the number but the 
optimal number depends on the hardware used and the cpu usage.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3476-3479 (original), 3491-3494 (patched)
> > 
> >
> > I wonder how much memory do we expect in the worst case scenarios to be 
> > hold in memory for all these objects in the pool?
> > 
> > In the previous patchthe objects were persisted (1 at a time), and GC 
> > could collected them.

That is the case even now GC should collect them. Only difference is that there 
is additional memory usage for short period of time. It is not possible to come 
up with exact memory usage. In the worst case it could be size of the 
fullsnapshot.
I can make code changes to limit the queue length to avaoid additional memory 
usage.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Line 3478 (original), 3493 (patched)
> > 
> >
> > This is executed in a different transaction?
> 
> Arjun Mishra wrote:
> I believe its all in a single transaction

it is not possible to perform concurrent inserts using a single transaction. 
SnapshotUpdater is executed as seperate transaction. In the event if failure we 
need to perform explicit cleanup.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Line 3481 (original), 3496 (patched)
> > 
> >
> > Shouldn't we destroy or stop the persisting process of the rest of the 
> > paths if the persistingSnapshotFailed is true?
> 
> Arjun Mishra wrote:
> Yeah we do that. persistingSnapshotFailed is set to true when a single 
> processing fails with exception

we already do that using flag persistingSnapshotFailed.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3510 (patched)
> > 
> >
> > Aren't these deleted during the rollback?
> 
> Arjun Mishra wrote:
> Does executeTransactionWithRetry include roll back on failure?

it is not possible to perform concurrent inserts using a single transaction. 
SnapshotUpdater is executed as seperate transaction. In the event if failure we 
need to perform explicit cleanup.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 5106-5112 (patched)
> > 
> >
> > How often does this code get ID conflicts while persisting in a 
> > threaded process? Isn't this going to make the persisting slow?

As the PATH_ID is auto generated there will not be any conflicts there may be 
some gaps.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209865
---


On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 22, 2018, 2:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. 

Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-22 Thread Arjun Mishra via Review Board


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Line 3478 (original), 3493 (patched)
> > 
> >
> > This is executed in a different transaction?

I believe its all in a single transaction


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209865
---


On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 22, 2018, 2:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/5/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-22 Thread Arjun Mishra via Review Board


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3510 (patched)
> > 
> >
> > Aren't these deleted during the rollback?

Does executeTransactionWithRetry include roll back on failure?


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209865
---


On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 22, 2018, 2:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/5/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-22 Thread Arjun Mishra via Review Board


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Line 3481 (original), 3496 (patched)
> > 
> >
> > Shouldn't we destroy or stop the persisting process of the rest of the 
> > paths if the persistingSnapshotFailed is true?

Yeah we do that. persistingSnapshotFailed is set to true when a single 
processing fails with exception


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209865
---


On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 22, 2018, 2:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/5/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-22 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209879
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 5106-5112 (patched)


Sergio,

Persisting full snapshot does not persist path change, so no changeID 
conflict


- Na Li


On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 22, 2018, 2:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/5/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-22 Thread Sergio Pena via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209865
---




sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
Lines 152 (patched)


how can we make this variable name to not be confused with threads fetching 
an hms snapshot?



sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
Lines 153 (patched)


is there an optimal value higher than 2 to make it faster?



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3476-3479 (original), 3491-3494 (patched)


I wonder how much memory do we expect in the worst case scenarios to be 
hold in memory for all these objects in the pool?

In the previous patchthe objects were persisted (1 at a time), and GC could 
collected them.



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Line 3478 (original), 3493 (patched)


This is executed in a different transaction?



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Line 3481 (original), 3496 (patched)


Shouldn't we destroy or stop the persisting process of the rest of the 
paths if the persistingSnapshotFailed is true?



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3510 (patched)


Aren't these deleted during the rollback?



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 5106-5112 (patched)


How often does this code get ID conflicts while persisting in a threaded 
process? Isn't this going to make the persisting slow?


- Sergio Pena


On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 22, 2018, 2:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/5/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-22 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

(Updated Oct. 22, 2018, 2:42 p.m.)


Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Changes
---

addressed review comments.


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 was 5 times faster when tested with below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs (updated)
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 b387a22e40b8958395e1c12af12272b89a778726 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
 0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/5/

Changes: https://reviews.apache.org/r/68973/diff/4-5/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209810
---




sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
Lines 2946 (patched)


can you update authzPaths map with db paths too?



sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
Lines 2984 (patched)


can you update authzPaths map with db paths too?


- Arjun Mishra


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209809
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3944 (patched)


Can you change the comment to deleting all MAuthzPathsMapping and 
corresponding MAuthzPath objects



sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
Lines 2997 (patched)


I think a more clear test is to check counts on MAuthzPathsMapping and 
MAuthzPaths. You agree?


- Arjun Mishra


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 19, 2018, 7:12 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 187 (patched)
> > 
> >
> > should this variable be local to persistFullPathsImage, and be atomic 
> > as it is set by more than one thread.

If it is local variable it would be not avaiable in SnapshotUpdater. member 
variables of SentryStore will be available to instances of SnapshotUpdater as 
it is a inner class for sentrystore.


> On Oct. 19, 2018, 7:12 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 294 (patched)
> > 
> >
> > When full snapshot fetch is done, should these counters be reset? 
> > Otherwise, they will keep on increasing, more than 
> > totalNumberOfObjectsToPersist

You are right. I will fix it. So yo have any other comments on this patch?


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209798
---


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209798
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 187 (patched)


should this variable be local to persistFullPathsImage, and be atomic as it 
is set by more than one thread.



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 294 (patched)


When full snapshot fetch is done, should these counters be reset? 
Otherwise, they will keep on increasing, more than totalNumberOfObjectsToPersist


- Na Li


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209795
---



Had a discussion with Kalyan about commiting a snapshot Id to 
AUTHZ_PATHS_MAPPING table that doesn't yet exist in AUTH_PATHS_SNAPSHOT_ID 
table. As of now since there is no foreign key constraint between 
AUTHZ_PATHS_MAPPING and AUTH_PATHS_SNAPSHOT_ID table it is fine. In future if 
we decide to add one code may need to be modified

- Arjun Mishra


On Oct. 19, 2018, 4:53 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 19, 2018, 4:53 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/4/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

(Updated Oct. 19, 2018, 4:53 p.m.)


Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Changes
---

Addressed review comments.


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 was 5 times faster when tested with below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs (updated)
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 b387a22e40b8958395e1c12af12272b89a778726 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
 0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/4/

Changes: https://reviews.apache.org/r/68973/diff/3-4/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-18 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209760
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3427 (patched)


can you log the value of authzPaths.size() and 
getMAuthzPathsMappingCountCore(pm, nextSnapshotID)? That could give us a sense 
how big a difference they are.



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 5001 (patched)


why do you have this field? I don't see it is used in this class.



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 5020 (patched)


can you log the value of snapshotID as well?


- Na Li


On Oct. 10, 2018, 11:04 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 11:04 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/3/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-18 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 11, 2018, 2:55 a.m., Na Li wrote:
> > do you have measurement on how much time saved on saving big snapshot in 
> > parallel vs the number of threads? When multiple threads accessing the same 
> > table, the improvement may not be linear.
> 
> kalyan kumar kalvagadda wrote:
> Lina I agree. Performence would not increase linearly with the increare 
> in number of threads. That is why limited the thread count to 2. 
> I tests reults depend on the hardware where it is running and other load 
> on the CPU where the tests are run. That is why I did not capture the results 
> with variant thread count. I limited my tests with 1,2,3 threads. Performence 
> was increasing as I increased the threads to 3 but same may not be true if 
> increase furthur.

One side effect of this approach is that there will be out-of-order inserts and 
gaps in PATH_IDin AUTHZ_PATH table. This is not an issue.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209430
---


On Oct. 10, 2018, 11:04 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 11:04 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/3/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-15 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 11, 2018, 2:55 a.m., Na Li wrote:
> > do you have measurement on how much time saved on saving big snapshot in 
> > parallel vs the number of threads? When multiple threads accessing the same 
> > table, the improvement may not be linear.

Lina I agree. Performence would not increase linearly with the increare in 
number of threads. That is why limited the thread count to 2. 
I tests reults depend on the hardware where it is running and other load on the 
CPU where the tests are run. That is why I did not capture the results with 
variant thread count. I limited my tests with 1,2,3 threads. Performence was 
increasing as I increased the threads to 3 but same may not be true if increase 
furthur.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209430
---


On Oct. 10, 2018, 11:04 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 11:04 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/3/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209430
---



do you have measurement on how much time saved on saving big snapshot in 
parallel vs the number of threads? When multiple threads accessing the same 
table, the improvement may not be linear.

- Na Li


On Oct. 10, 2018, 11:04 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 11:04 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/3/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 10, 2018, 8:04 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3430 (patched)
> > 
> >
> > to reduce overhead of reading when persisting snapshot fails, can we 
> > just delete all path mapping with nextSnapshotID?
> 
> kalyan kumar kalvagadda wrote:
> I will look into it

I have updated the patch. I think this is the best we could do as the 
datanucleus operates on DAO objects, this might internally fetch the data and 
then delete it.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209419
---


On Oct. 10, 2018, 11:04 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 11:04 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/3/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

(Updated Oct. 10, 2018, 11:04 p.m.)


Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Changes
---

addressed review comments.


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 was 5 times faster when tested with below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs (updated)
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 7a736ca9604eb0bb182a159b5a2aed274275c16e 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
 0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/3/

Changes: https://reviews.apache.org/r/68973/diff/2-3/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 10, 2018, 8:04 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3421 (patched)
> > 
> >
> > can we wrap the whole processing in try() and wait for threadpool to 
> > shutdown in finally?

We don't want to persist the notificationID and snapshotId untill the 
psersisting the path mapping is completed and verfied.


> On Oct. 10, 2018, 8:04 p.m., Na Li wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3430 (patched)
> > 
> >
> > to reduce overhead of reading when persisting snapshot fails, can we 
> > just delete all path mapping with nextSnapshotID?

I will look into it


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209419
---


On Oct. 10, 2018, 7:05 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 7:05 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/2/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209419
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3421 (patched)


can we wrap the whole processing in try() and wait for threadpool to 
shutdown in finally?



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3430 (patched)


to reduce overhead of reading when persisting snapshot fails, can we just 
delete all path mapping with nextSnapshotID?


- Na Li


On Oct. 10, 2018, 7:05 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 7:05 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/2/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread Arjun Mishra via Review Board


> On Oct. 10, 2018, 3:18 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3404 (patched)
> > 
> >
> > This can be an attribute of the class and initialized in the constructor

Can this be added to the constructor? Will conf.getInt add unnecessary time 
delay to persit call?


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209408
---


On Oct. 10, 2018, 7:05 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 7:05 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/2/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209417
---




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 3400 (patched)


Shouldn't this be fetched in the same transaction? What if there is an 
error thrown here, the retry logic won't be invoked?


- Arjun Mishra


On Oct. 10, 2018, 7:05 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 7:05 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/2/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

(Updated Oct. 10, 2018, 7:05 p.m.)


Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Changes
---

Addressed review comments.


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 gave was 5 times faster when tested with 
below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs (updated)
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 7a736ca9604eb0bb182a159b5a2aed274275c16e 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
 0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/2/

Changes: https://reviews.apache.org/r/68973/diff/1-2/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 10, 2018, 3:18 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3420 (patched)
> > 
> >
> > Can you explain why we need the thread.sleep ?

isTerminated() is not a blocking call. It returns immediatly. isTerminated() 
returns true if all tasks have completed following shut down. I'm blocking the 
thread for snapsot to be persited before performing the validation and 
persisting notification and snapshot id.


> On Oct. 10, 2018, 3:18 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3424 (patched)
> > 
> >
> > I think if we know there is an exception we should handle it right away 
> > instead of comparing the size with what is persisted. According to the 
> > Executors.newFixedThreadPool API comments "If any thread terminates due to 
> > a failure during execution prior to shutdown, a new one will take its place 
> > if needed to execute subsequent tasks." Looks like 
> > Executors.newFixedThreadPool doesn't throw an exception if one of the tasks 
> > failed. If that is true I am not very comfortable with using this approach.

it can be solved. Look for it in the new patch


> On Oct. 10, 2018, 3:18 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3430 (patched)
> > 
> >
> > If one snapshot persistance failed, how do we know the subsequent 
> > retries will pass?
> > 
> > Waiting for part of the snapshot persistence to complete and then 
> > checking is not very efficient in my opinion

We don't know for sure if it succeds when retried. We should perform retries 
based on the failure. That is an enhancement we should make to sentry. This 
trasaction of persisting the snapshot was could be retired before and I 
maintained the same.


I did not understand your 2nd point here. Let me clary one thing. Logic of 
validating the snapshot is invoked at the end, after inserting all the path 
mapping information is complete.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209408
---


On Oct. 10, 2018, 5:38 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 5:38 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 gave was 5 times faster when tested with 
> below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/1/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board


> On Oct. 10, 2018, 3:29 p.m., Arjun Mishra wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3428 (patched)
> > 
> >
> > Our current code doesn't deletePersistentAll if an exception is thrown. 
> > This could cause more delays trying to undo a persistence

Current code doesn't have to delete on failure for two reasons
1. Complete snapshoot is persisted in single transaction.
2. If there is failure all the chnages are rolledback automatically in 
Transaction Manager.

With this change paths for each object is persisted in different transaction, 
If for some reason there were issue inserting in one of the transaction we 
should deleted the rest otherwise snapshot has incomplete data. 

Exceptions are not expected while persting snapshot. If there is failure while 
persisting, something is really wrong.  More over, you see that over ahead with 
the current logic as well. If there is a exception in the middle of persisting 
the snapshot, all the entries that were persisted till then will be rolled back 
by trasaction manager. Even this takes time. Only difference is that with the 
new approach we are doing it explciitly as the path information is persisted in 
multiple transactions.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209410
---


On Oct. 10, 2018, 5:38 p.m., kalyan kumar kalvagadda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> ---
> 
> (Updated Oct. 10, 2018, 5:38 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
> https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 gave was 5 times faster when tested with 
> below.
> 
>  
> 
> Object Type   Count
> Databases 209
> Tables2100
> Partitions24
> 
> 
> Diffs
> -
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  7a736ca9604eb0bb182a159b5a2aed274275c16e 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/1/
> 
> 
> Testing
> ---
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>



Re: Review Request 68973: SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
---

(Updated Oct. 10, 2018, 5:38 p.m.)


Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.


Changes
---

Changed the summary.


Summary (updated)
-

SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in 
parallel


Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305


Repository: sentry


Description
---

I have considered multiple options. Persisting in batches is not an option with 
out changing the schema as the data nucleus does not persist row in batches for 
tables which have foreign key on other tables.

I see that best option is to persist the paths in parallel. It gave good 
results.

Solution Approach:

I have used a thread pool to persist the snapshot. Size of this thread pool is 
configurable. Paths for each object database/table are submitted to this thread 
pool. If for reason some of the paths are not pesisted, snapshot is removed and 
exception is throw back.

  This patch along with SENTRY-2423 gave was 5 times faster when tested with 
below.

 

Object Type Count
Databases   209
Tables  2100
Partitions  24


Diffs
-

  
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
 092060c450c6a906850630cb10454737157af5fe 
  
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
 7a736ca9604eb0bb182a159b5a2aed274275c16e 
  
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
 4a9afe303672baff39be01d4f190034b2bfb75fe 


Diff: https://reviews.apache.org/r/68973/diff/1/


Testing
---

Added new test and also made sure that existing tests passed.


Thanks,

kalyan kumar kalvagadda