[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed
[ https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17559203#comment-17559203 ]

Thomas D'Silva commented on KUDU-1994:
--------------------------------------

I have not been able to work on this. Feel free to assign it to yourself if you want to work on it.

> Automatically Create New Range Partitions When Needed
> -----------------------------------------------------
>
>                 Key: KUDU-1994
>                 URL: https://issues.apache.org/jira/browse/KUDU-1994
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.3.0
>            Reporter: Alan Jackoway
>            Assignee: Thomas D'Silva
>            Priority: Major
>              Labels: roadmap-candidate
>
> We have a few Kudu tables where we use a range-partitioned timestamp as part
> of the key. The intention of this is to keep data locality for data that is
> likely to be scanned together, such as events in a time series.
> Currently we create these with partitions that look like this:
> {noformat}
> RANGE (ts) (
>   PARTITION 0 <= VALUES < 142008840,
>   PARTITION 142008840 <= VALUES < 142786080,
>   PARTITION 142786080 <= VALUES < 143572320,
>   PARTITION 143572320 <= VALUES < 144367200,
>   PARTITION 144367200 <= VALUES < 145162440,
>   PARTITION 145162440 <= VALUES < 145948320,
>   PARTITION 145948320 <= VALUES < 146734560,
>   PARTITION 146734560 <= VALUES < 147529440,
>   PARTITION 147529440 <= VALUES < 148324680,
>   PARTITION 148324680 <= VALUES < 149103360,
>   PARTITION 149103360 <= VALUES < 149889600,
>   PARTITION 149889600 <= VALUES < 150684480
> )
> {noformat}
> The problem is that as time goes on we have to choose to either create empty
> partitions in advance of when we are writing data or risk forgetting to
> create a partition and having writes of new data fail.
> Ideally, Kudu would have a way to indicate the size of the partitions (in
> this example 3 months converted to milliseconds) and then automatically
> create new partitions when new data comes in that needs the partition.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
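For illustration only, the request in the quoted description (a partition size plus on-demand creation) boils down to mapping an incoming timestamp to the bounds of the partition that should hold it. The sketch below is not Kudu code or a Kudu API. The function name `quarter_bounds_ms` is hypothetical, and it assumes epoch-millisecond timestamps and calendar quarters aligned to UTC, which may differ from the time zone used for the boundaries in the issue.

```python
from datetime import datetime, timezone


def quarter_bounds_ms(ts_ms: int) -> tuple[int, int]:
    """Return (lower, upper) in epoch milliseconds for the UTC calendar
    quarter containing ts_ms. lower is inclusive, upper is exclusive,
    matching the `lower <= VALUES < upper` form used in the issue.

    Hypothetical helper for illustration; assumes UTC-aligned quarters.
    """
    # Only the year and month matter, so whole seconds are enough here.
    dt = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc)

    # First month of the quarter: 1, 4, 7 or 10.
    first_month = 3 * ((dt.month - 1) // 3) + 1
    lower = datetime(dt.year, first_month, 1, tzinfo=timezone.utc)

    # The quarter starting in October rolls over into the next year.
    if first_month == 10:
        upper = datetime(dt.year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        upper = datetime(dt.year, first_month + 3, 1, tzinfo=timezone.utc)

    return int(lower.timestamp()) * 1000, int(upper.timestamp()) * 1000
```

A write path with automatic partitioning would presumably compute these bounds for a row that falls outside every existing range and create that partition before retrying the write. How Kudu would actually do this is exactly what the issue leaves open.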
[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed
[ https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17559153#comment-17559153 ]

Xixu Wang commented on KUDU-1994:
---------------------------------

How is this feature going?
[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed
[ https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090400#comment-17090400 ]

LiFu He commented on KUDU-1994:
-------------------------------

Yes, go ahead : )
[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed
[ https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090223#comment-17090223 ]

Thomas D'Silva commented on KUDU-1994:
--------------------------------------

[~helifu] This would be a nice feature, since currently we need a job that runs periodically to create new date partitions. Do you still plan on working on this JIRA? If not, I am interested in picking it up.

>            Assignee: LiFu He
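As a rough sketch of the kind of periodic job mentioned in this comment (not the commenter's actual job), the snippet below generates the statements needed to extend range coverage up to a look-ahead horizon. It assumes the table is managed through Impala and uses Impala's `ALTER TABLE ... ADD IF NOT EXISTS RANGE PARTITION` statement. The function name, the table name, and the fixed partition width are all hypothetical.

```python
def missing_partition_ddl(table: str, covered_until: int,
                          width: int, horizon: int) -> list[str]:
    """Build ALTER TABLE statements that extend range coverage from
    covered_until (the exclusive upper bound of the newest existing
    partition) until it reaches at least horizon.

    Hypothetical helper: assumes fixed-width partitions and Impala DDL.
    """
    if width <= 0:
        raise ValueError("width must be positive")

    statements = []
    lower = covered_until
    # Keep adding partitions until the covered range reaches the horizon.
    while lower < horizon:
        upper = lower + width
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS RANGE PARTITION "
            f"{lower} <= VALUES < {upper}"
        )
        lower = upper
    return statements


if __name__ == "__main__":
    # Made-up numbers: coverage ends at 100, partitions are 50 wide,
    # and the job wants coverage through 180.
    for stmt in missing_partition_ddl("events", 100, 50, 180):
        print(stmt)
```

A scheduler would run this with the horizon set to the current time plus some safety margin and submit the resulting statements. The feature requested in this issue would make such a job unnecessary.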
[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed
[ https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917854#comment-16917854 ]

Grant Henke commented on KUDU-1994:
-----------------------------------

I think the concept of "interval partitioning" makes sense here, and we can get inspiration from other databases that already have a similar feature. See Oracle's interval partitioning, for example:
https://oracle-base.com/articles/11g/partitioning-enhancements-11gr1#interval_partitioning

>            Assignee: HeLifu