[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed

2022-06-27 Thread Thomas D'Silva (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17559203#comment-17559203
 ] 

Thomas D'Silva commented on KUDU-1994:
--

I have not been able to work on this. Feel free to assign it to yourself if 
you want to work on it. 

> Automatically Create New Range Partitions When Needed
> -
>
> Key: KUDU-1994
> URL: https://issues.apache.org/jira/browse/KUDU-1994
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Alan Jackoway
> Assignee: Thomas D'Silva
> Priority: Major
> Labels: roadmap-candidate
>
> We have a few Kudu tables where we use a range-partitioned timestamp as part 
> of the key. The intention of this is to keep data locality for data that is 
> likely to be scanned together, such as events in a timeseries.
> Currently we create these with partitions that look like this:
> {noformat}
> RANGE (ts) (
> PARTITION 0 <= VALUES < 142008840,
> PARTITION 142008840 <= VALUES < 142786080,
> PARTITION 142786080 <= VALUES < 143572320,
> PARTITION 143572320 <= VALUES < 144367200,
> PARTITION 144367200 <= VALUES < 145162440,
> PARTITION 145162440 <= VALUES < 145948320,
> PARTITION 145948320 <= VALUES < 146734560,
> PARTITION 146734560 <= VALUES < 147529440,
> PARTITION 147529440 <= VALUES < 148324680,
> PARTITION 148324680 <= VALUES < 149103360,
> PARTITION 149103360 <= VALUES < 149889600,
> PARTITION 149889600 <= VALUES < 150684480
> )
> {noformat}
> The problem is that as time goes on we have to choose to either create empty 
> partitions in advance of when we are writing data or risk forgetting to 
> create a partition and having writes of new data fail.
> Ideally, Kudu would have a way to indicate the size of the partitions (in 
> this example 3 months converted to milliseconds) and then automatically 
> create new partitions when new data comes in that needs the partition.
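
A minimal sketch of the behavior requested above, assuming fixed-width intervals measured from an origin of 0: the bounds of the partition that covers a timestamp follow from floor division. The names `interval_bounds` and `ensure_partition` are hypothetical and are not part of any Kudu API; the set stands in for the table's real partition list.

```python
from __future__ import annotations


def interval_bounds(ts_ms: int, width_ms: int, origin_ms: int = 0) -> tuple[int, int]:
    """Return the [lower, upper) bounds of the fixed-width interval containing ts_ms."""
    if width_ms <= 0:
        raise ValueError("width_ms must be positive")
    # Floor division rounds toward negative infinity, so timestamps
    # before the origin still land in the correct interval.
    index = (ts_ms - origin_ms) // width_ms
    lower = origin_ms + index * width_ms
    return lower, lower + width_ms


def ensure_partition(existing: set[tuple[int, int]], ts_ms: int, width_ms: int) -> tuple[int, int]:
    """Record the partition covering ts_ms if it is not already present."""
    bounds = interval_bounds(ts_ms, width_ms)
    if bounds not in existing:
        # Stand-in for asking the storage engine to add the range partition.
        existing.add(bounds)
    return bounds
```

Because every writer derives the same bounds from the same timestamp, two writers that race to create a partition would request an identical range rather than overlapping ones.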



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed

2022-06-27 Thread Xixu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17559153#comment-17559153
 ] 

Xixu Wang commented on KUDU-1994:
-

How is this feature going?



[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed

2020-04-23 Thread LiFu He (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090400#comment-17090400
 ] 

LiFu He commented on KUDU-1994:
---

Yes, go ahead : )



[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed

2020-04-22 Thread Thomas D'Silva (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090223#comment-17090223
 ] 

Thomas D'Silva commented on KUDU-1994:
--

 

[~helifu]

This would be a nice feature since currently we need a job that runs 
periodically to create new date partitions.

Do you still plan to work on this JIRA? If not, I am interested in picking 
it up.
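
As an illustration of what such a periodic job has to compute, here is a minimal sketch that lists the bounds of the current and upcoming calendar quarters in epoch milliseconds. It assumes quarters aligned to UTC; the function names are hypothetical, and the call that would actually add each range to the table is left out.

```python
from __future__ import annotations

from datetime import datetime, timezone


def quarter_start(year: int, quarter: int) -> datetime:
    """First instant (UTC) of the given calendar quarter, with quarter in 1..4."""
    return datetime(year, 3 * (quarter - 1) + 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Convert a whole-second, timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp()) * 1000


def upcoming_quarter_bounds(now: datetime, count: int) -> list[tuple[int, int]]:
    """Return [lower, upper) bounds for the quarter containing now and the next count - 1."""
    year, quarter = now.year, (now.month - 1) // 3 + 1
    bounds = []
    for _ in range(count):
        lower = quarter_start(year, quarter)
        # Advance to the next quarter, rolling over into the next year after Q4.
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
        upper = quarter_start(year, quarter)
        bounds.append((to_epoch_ms(lower), to_epoch_ms(upper)))
    return bounds
```

A scheduled job could call `upcoming_quarter_bounds` with the current time and add any range that the table does not have yet.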

> Automatically Create New Range Partitions When Needed
> -
>
> Key: KUDU-1994
> URL: https://issues.apache.org/jira/browse/KUDU-1994
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Alan Jackoway
> Assignee: LiFu He
> Priority: Major
> Labels: roadmap-candidate


[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed

2019-08-28 Thread Grant Henke (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917854#comment-16917854
 ] 

Grant Henke commented on KUDU-1994:
---

I think the concept of "interval partitioning" makes sense here, and we can take 
inspiration from other databases that already have a similar feature. See 
Oracle's interval partitioning, for example: 
https://oracle-base.com/articles/11g/partitioning-enhancements-11gr1#interval_partitioning

> Automatically Create New Range Partitions When Needed
> -
>
> Key: KUDU-1994
> URL: https://issues.apache.org/jira/browse/KUDU-1994
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Alan Jackoway
> Assignee: HeLifu
> Priority: Major
> Labels: roadmap-candidate