[jira] [Comment Edited] (SENTRY-2305) Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-19 Thread kalyan kumar kalvagadda (JIRA)


[ 
https://issues.apache.org/jira/browse/SENTRY-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657221#comment-16657221
 ] 

kalyan kumar kalvagadda edited comment on SENTRY-2305 at 10/19/18 6:02 PM:
---

There are a couple of advantages with this approach:
 # Persisting the paths in multiple threads increases the throughput.
 # Breaking the snapshot into a bunch of smaller commits makes each individual commit faster (see the sketch after this list).
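
As a rough illustration of the second point, the sketch below splits the
snapshot entries into fixed-size groups so that each group can be persisted
and committed on its own. This is only a sketch under assumed types; the class
name, the entry types, and the batch size are hypothetical and not taken from
Sentry's code.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: split a full HMS snapshot (object name -> paths) into
// fixed-size batches so that each batch can be persisted and committed in its
// own, smaller transaction instead of one huge commit.
public class SnapshotBatcher {

  static List<List<Map.Entry<String, List<String>>>> toBatches(
      Map<String, List<String>> snapshot, int batchSize) {
    List<List<Map.Entry<String, List<String>>>> batches = new ArrayList<>();
    List<Map.Entry<String, List<String>>> current = new ArrayList<>(batchSize);
    for (Map.Entry<String, List<String>> entry : snapshot.entrySet()) {
      current.add(entry);
      if (current.size() == batchSize) {
        batches.add(current);                 // one future commit per batch
        current = new ArrayList<>(batchSize);
      }
    }
    if (!current.isEmpty()) {
      batches.add(current);                   // remainder batch
    }
    return batches;
  }
}
{code}

As the issue description notes, transactions that commit huge amounts of data
spend a lot of CPU cycles keeping the rollback log up to date, so many small
commits keep that cost bounded per transaction.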


was (Author: kkalyan):
There are a couple of advantages with this approach:
 # Persisting the paths in multiple threads increases the throughput.
 # Breaking the snapshot into a bunch of smaller commits makes each individual commit faster.

> Optimize time taken for persistence HMS snapshot by persisting in parallel
> --
>
> Key: SENTRY-2305
> URL: https://issues.apache.org/jira/browse/SENTRY-2305
> Project: Sentry
>  Issue Type: Sub-task
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
>Priority: Major
> Attachments: SENTRY-2305.001.patch, SENTRY-2305.002.patch, 
> SENTRY-2305.003.patch, SENTRY-2305.004.patch, SENTRY-2305.005.patch
>
>
> There are a couple of options:
> # Break the total snapshot into batches and persist all of them in parallel
> in different transactions. As Sentry uses the repeatable_read isolation
> level, we should be able to have parallel writes on the same table. This
> raises an issue if there is a failure in persisting any of the batches: the
> approach needs additional logic to clean up the partially persisted snapshot.
> I'm evaluating this option.
> ** *Result:* Initial results are promising. Time to persist the snapshot came
> down by 60%.
> # Try disabling the L1 cache while persisting the snapshot.
> # Try persisting the snapshot entries sequentially in separate transactions,
> since transactions that commit huge amounts of data can take longer because
> they spend a lot of CPU cycles keeping the rollback log up to date.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (SENTRY-2305) Optimize time taken for persistence HMS snapshot by persisting in parallel

2018-10-10 Thread kalyan kumar kalvagadda (JIRA)


[ 
https://issues.apache.org/jira/browse/SENTRY-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642479#comment-16642479
 ] 

kalyan kumar kalvagadda edited comment on SENTRY-2305 at 10/10/18 7:06 PM:
---

I have considered multiple options. Persisting in batches is not an option
without changing the schema, as DataNucleus does not persist rows in batches
for tables that have foreign keys to other tables.

I see that the best option is to persist the paths in parallel. It gave good
results.

*Solution Approach:*

I have used a thread pool to persist the snapshot. The size of this thread pool
is configurable. The paths for each object (database/table) are submitted to
this thread pool. If for some reason some of the paths are not persisted, the
snapshot is removed and an exception is thrown back to the caller.
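
Roughly, the approach looks like the sketch below. This is only an illustrative
outline, not the code from the patch; {{persistPaths}}, {{clearSnapshot}}, and
the class name are hypothetical placeholders for the real persistence logic.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: persist the paths of each HMS object (database/table)
// as a separate task on a configurable thread pool. Each task would commit in
// its own transaction; if any task fails, the partially persisted snapshot is
// removed and the failure is thrown back to the caller.
public class ParallelSnapshotPersister {

  private final ExecutorService pool;

  public ParallelSnapshotPersister(int poolSize) {
    // The pool size is configurable, mirroring the configurable thread pool
    // described above.
    this.pool = Executors.newFixedThreadPool(poolSize);
  }

  public void persistSnapshot(Map<String, List<String>> pathsByObject) throws Exception {
    List<Future<?>> futures = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : pathsByObject.entrySet()) {
      // One task per object; each task persists that object's paths
      // (persistPaths is a hypothetical helper, not Sentry's API).
      futures.add(pool.submit(() -> persistPaths(e.getKey(), e.getValue())));
    }
    try {
      for (Future<?> f : futures) {
        f.get(); // Propagates the first failure, if any.
      }
    } catch (Exception failure) {
      clearSnapshot();  // Remove the partially persisted snapshot.
      throw failure;    // Throw the exception back to the caller.
    } finally {
      pool.shutdown();
    }
  }

  private Void persistPaths(String objectName, List<String> paths) {
    // Placeholder: a real implementation would open a transaction here and
    // persist the path entries for the given object.
    return null;
  }

  private void clearSnapshot() {
    // Placeholder: delete whatever part of the snapshot was already persisted.
  }
}
{code}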

This patch, along with SENTRY-2423, was 5 times faster when tested with the
data below.

 
||Object Type||Count||
|Databases|209|
|Tables|2100|
|Partitions|24|

 

 


was (Author: kkalyan):
I have considered multiple options. Persisting in batches is not an option
without changing the schema, as DataNucleus does not persist rows in batches
for tables that have foreign keys to other tables.

I see that the best option is to persist the paths in parallel. It gave good
results.

*Solution Approach:*

I have used a thread pool to persist the snapshot. The size of this thread pool
is configurable. The paths for each object (database/table) are submitted to
this thread pool. If for some reason some of the paths are not persisted, the
snapshot is removed and an exception is thrown back to the caller.

This patch, along with SENTRY-2423, was 5 times faster when tested with the
data below.

 
||Object Type||Count||
|Databases|209|
|Tables|2100|
|Partitions|24|

 

 

> Optimize time taken for persistence HMS snapshot by persisting in parallel
> --
>
> Key: SENTRY-2305
> URL: https://issues.apache.org/jira/browse/SENTRY-2305
> Project: Sentry
>  Issue Type: Sub-task
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
>Priority: Major
> Attachments: SENTRY-2305.001.patch, SENTRY-2305.002.patch, 
> SENTRY-2305.003.patch
>
>
> There are a couple of options:
> # Break the total snapshot into batches and persist all of them in parallel
> in different transactions. As Sentry uses the repeatable_read isolation
> level, we should be able to have parallel writes on the same table. This
> raises an issue if there is a failure in persisting any of the batches: the
> approach needs additional logic to clean up the partially persisted snapshot.
> I'm evaluating this option.
> ** *Result:* Initial results are promising. Time to persist the snapshot came
> down by 60%.
> # Try disabling the L1 cache while persisting the snapshot.
> # Try persisting the snapshot entries sequentially in separate transactions,
> since transactions that commit huge amounts of data can take longer because
> they spend a lot of CPU cycles keeping the rollback log up to date.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (SENTRY-2305) Optimize time taken for persistence HMS snapshot

2018-10-10 Thread kalyan kumar kalvagadda (JIRA)


[ 
https://issues.apache.org/jira/browse/SENTRY-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642479#comment-16642479
 ] 

kalyan kumar kalvagadda edited comment on SENTRY-2305 at 10/10/18 1:41 PM:
---

I have considered multiple options. Persisting in batches is not an option
without changing the schema, as DataNucleus does not persist rows in batches
for tables that have foreign keys to other tables.

I see that the best option is to persist the paths in parallel. It gave good
results.

*Solution Approach:*

I have used a thread pool to persist the snapshot. The size of this thread pool
is configurable. The paths for each object (database/table) are submitted to
this thread pool. If for some reason some of the paths are not persisted, the
snapshot is removed and an exception is thrown back to the caller.

This patch, along with SENTRY-2423, was 5 times faster when tested with the
data below.

 
||Object Type||Count||
|Databases|209|
|Tables|2100|
|Partitions|24|

 

 


was (Author: kkalyan):
I have considered multiple options. Persisting in batches is not an option
without changing the schema, as DataNucleus does not persist rows in batches
for tables that have foreign keys to other tables.

I see that the best option is to persist the paths in parallel. It gave good
results.

*Solution Approach:*

I have used a thread pool to persist the snapshot. The paths for each object
(database/table) are submitted to this thread pool. If for some reason some of
the paths are not persisted, the snapshot is removed and an exception is thrown
back to the caller.

This patch, along with SENTRY-2423, was 5 times faster when tested with the
data below.

 
||Object Type||Count||
|Databases|209|
|Tables|2100|
|Partitions|24|

 

 

> Optimize time taken for persistence HMS snapshot 
> -
>
> Key: SENTRY-2305
> URL: https://issues.apache.org/jira/browse/SENTRY-2305
> Project: Sentry
>  Issue Type: Sub-task
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
>Priority: Major
> Attachments: SENTRY-2305.001.patch
>
>
> There are a couple of options:
> # Break the total snapshot into batches and persist all of them in parallel
> in different transactions. As Sentry uses the repeatable_read isolation
> level, we should be able to have parallel writes on the same table. This
> raises an issue if there is a failure in persisting any of the batches: the
> approach needs additional logic to clean up the partially persisted snapshot.
> I'm evaluating this option.
> ** *Result:* Initial results are promising. Time to persist the snapshot came
> down by 60%.
> # Try disabling the L1 cache while persisting the snapshot.
> # Try persisting the snapshot entries sequentially in separate transactions,
> since transactions that commit huge amounts of data can take longer because
> they spend a lot of CPU cycles keeping the rollback log up to date.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (SENTRY-2305) Optimize time taken for persistence HMS snapshot

2018-10-10 Thread kalyan kumar kalvagadda (JIRA)


[ 
https://issues.apache.org/jira/browse/SENTRY-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642479#comment-16642479
 ] 

kalyan kumar kalvagadda edited comment on SENTRY-2305 at 10/10/18 1:39 PM:
---

I have considered multiple options. Persisting in batches is not an option
without changing the schema, as DataNucleus does not persist rows in batches
for tables that have foreign keys to other tables.

I see that the best option is to persist the paths in parallel. It gave good
results.

*Solution Approach:*

I have used a thread pool to persist the snapshot. The paths for each object
(database/table) are submitted to this thread pool. If for some reason some of
the paths are not persisted, the snapshot is removed and an exception is thrown
back to the caller.

This patch, along with SENTRY-2423, was 5 times faster when tested with the
data below.

 
||Object Type||Count||
|Databases|209|
|Tables|2100|
|Partitions|24|

 

 


was (Author: kkalyan):
I have considered multiple options. Persisting in batches is not an option
without changing the schema, as DataNucleus does not persist rows in batches
for tables that have foreign keys to other tables.

I see that the best option is to persist the paths in parallel. It gave good
results. I will be updating the results from the tests in a day.

> Optimize time taken for persistence HMS snapshot 
> -
>
> Key: SENTRY-2305
> URL: https://issues.apache.org/jira/browse/SENTRY-2305
> Project: Sentry
>  Issue Type: Sub-task
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
>Priority: Major
> Attachments: SENTRY-2305.001.patch
>
>
> There are a couple of options:
> # Break the total snapshot into batches and persist all of them in parallel
> in different transactions. As Sentry uses the repeatable_read isolation
> level, we should be able to have parallel writes on the same table. This
> raises an issue if there is a failure in persisting any of the batches: the
> approach needs additional logic to clean up the partially persisted snapshot.
> I'm evaluating this option.
> ** *Result:* Initial results are promising. Time to persist the snapshot came
> down by 60%.
> # Try disabling the L1 cache while persisting the snapshot.
> # Try persisting the snapshot entries sequentially in separate transactions,
> since transactions that commit huge amounts of data can take longer because
> they spend a lot of CPU cycles keeping the rollback log up to date.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (SENTRY-2305) Optimize time taken for persistence HMS snapshot

2018-07-11 Thread kalyan kumar kalvagadda (JIRA)


[ 
https://issues.apache.org/jira/browse/SENTRY-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540113#comment-16540113
 ] 

kalyan kumar kalvagadda edited comment on SENTRY-2305 at 7/11/18 9:53 PM:
--

*Test Data:*
*Databases:* 2
*Tables in each database:* 100
*Partitions in each table:* 500

*Note:* The times measured in my tests should not be considered standard, as
I'm running them on my local machine. What we should be looking at is the
relative difference.

Work in progress.
||Option||Description||Time Taken||
|1|No change|n sec|
|2|Persist batches in parallel|n min|
|3|Persist batches sequentially|5 min|
|4|Disable L1 cache|3 sec|

 


was (Author: kkalyan):
*Test Data:*
*Databases:* 2
*Tables in each database:* 100
*Partitions in each table:* 500

*Note:* The times measured in my tests should not be considered standard, as
I'm running them on my local machine. What we should be looking at is the
relative difference.

Work in progress.
||Option||Description||Time Taken||
|1|No change|n sec|
|2|Persist batches in parallel|n min|
|3|Persist batches sequentially|5 min|
|4|Disable L1 cache|n sec|

 

> Optimize time taken for persistence HMS snapshot 
> -
>
> Key: SENTRY-2305
> URL: https://issues.apache.org/jira/browse/SENTRY-2305
> Project: Sentry
>  Issue Type: Sub-task
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
>Priority: Major
>
> There are a couple of options:
> # Break the total snapshot into batches and persist all of them in parallel
> in different transactions. As Sentry uses the repeatable_read isolation
> level, we should be able to have parallel writes on the same table. This
> raises an issue if there is a failure in persisting any of the batches: the
> approach needs additional logic to clean up the partially persisted snapshot.
> I'm evaluating this option.
> ** *Result:* Initial results are promising. Time to persist the snapshot came
> down by 60%.
> # Try disabling the L1 cache while persisting the snapshot.
> # Try persisting the snapshot entries sequentially in separate transactions,
> since transactions that commit huge amounts of data can take longer because
> they spend a lot of CPU cycles keeping the rollback log up to date.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (SENTRY-2305) Optimize time taken for persistence HMS snapshot

2018-07-11 Thread kalyan kumar kalvagadda (JIRA)


[ 
https://issues.apache.org/jira/browse/SENTRY-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540113#comment-16540113
 ] 

kalyan kumar kalvagadda edited comment on SENTRY-2305 at 7/11/18 1:50 PM:
--

*Test Data:*
*Databases:* 2
*Tables in each database:* 100
*Partitions in each table:* 500

*Note:* The times measured in my tests should not be considered standard, as
I'm running them on my local machine. What we should be looking at is the
relative difference.

Work in progress.
||Option||Description||Time Taken||
|1|No change|n sec|
|2|Persist batches in parallel|n sec|
|3|Persist batches sequentially|n sec|
|4|Disable L1 cache|n sec|

 


was (Author: kkalyan):
*Test Data:*
*Databases:* 2
*Tables in each database:* 100
*Partitions in each table:* 500

*Note:* The times measured in my tests should not be considered standard, as
I'm running them on my local machine. What we should be looking at is the
relative difference.

Work in progress.
||Option||Description||Time Taken||
|1|No change|n sec|
|2|No change|n sec|
|3|No change|n sec|
|4|No change|n sec|

 

> Optimize time taken for persistence HMS snapshot 
> -
>
> Key: SENTRY-2305
> URL: https://issues.apache.org/jira/browse/SENTRY-2305
> Project: Sentry
>  Issue Type: Sub-task
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
>Priority: Major
>
> There are a couple of options:
> # Break the total snapshot into batches and persist all of them in parallel
> in different transactions. As Sentry uses the repeatable_read isolation
> level, we should be able to have parallel writes on the same table. This
> raises an issue if there is a failure in persisting any of the batches: the
> approach needs additional logic to clean up the partially persisted snapshot.
> I'm evaluating this option.
> ** *Result:* Initial results are promising. Time to persist the snapshot came
> down by 60%.
> # Try disabling the L1 cache while persisting the snapshot.
> # Try persisting the snapshot entries sequentially in separate transactions,
> since transactions that commit huge amounts of data can take longer because
> they spend a lot of CPU cycles keeping the rollback log up to date.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (SENTRY-2305) Optimize time taken for persistence HMS snapshot

2018-07-11 Thread kalyan kumar kalvagadda (JIRA)


[ 
https://issues.apache.org/jira/browse/SENTRY-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540113#comment-16540113
 ] 

kalyan kumar kalvagadda edited comment on SENTRY-2305 at 7/11/18 1:50 PM:
--

*Test Data:*
*Databases:* 2
*Tables in each database:* 100
*Partitions in each table:* 500

*Note:* The times measured in my tests should not be considered standard, as
I'm running them on my local machine. What we should be looking at is the
relative difference.

Work in progress.
||Option||Description||Time Taken||
|1|No change|n sec|
|2|Persist batches in parallel|n min|
|3|Persist batches sequentially|5 min|
|4|Disable L1 cache|n sec|

 


was (Author: kkalyan):
*Test Data:*
*Databases:* 2
*Tables in each database:* 100
*Partitions in each table:* 500

*Note:* The times measured in my tests should not be considered standard, as
I'm running them on my local machine. What we should be looking at is the
relative difference.

Work in progress.
||Option||Description||Time Taken||
|1|No change|n sec|
|2|Persist batches in parallel|n sec|
|3|Persist batches sequentially|n sec|
|4|Disable L1 cache|n sec|

 

> Optimize time taken for persistence HMS snapshot 
> -
>
> Key: SENTRY-2305
> URL: https://issues.apache.org/jira/browse/SENTRY-2305
> Project: Sentry
>  Issue Type: Sub-task
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
>Priority: Major
>
> There are a couple of options:
> # Break the total snapshot into batches and persist all of them in parallel
> in different transactions. As Sentry uses the repeatable_read isolation
> level, we should be able to have parallel writes on the same table. This
> raises an issue if there is a failure in persisting any of the batches: the
> approach needs additional logic to clean up the partially persisted snapshot.
> I'm evaluating this option.
> ** *Result:* Initial results are promising. Time to persist the snapshot came
> down by 60%.
> # Try disabling the L1 cache while persisting the snapshot.
> # Try persisting the snapshot entries sequentially in separate transactions,
> since transactions that commit huge amounts of data can take longer because
> they spend a lot of CPU cycles keeping the rollback log up to date.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)