[jira] [Assigned] (KUDU-2844) Avoid copying strings from dictionary or plain-encoded blocks
[ https://issues.apache.org/jira/browse/KUDU-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned KUDU-2844:
---------------------------------

    Assignee: Todd Lipcon

> Avoid copying strings from dictionary or plain-encoded blocks
> -------------------------------------------------------------
>
>                 Key: KUDU-2844
>                 URL: https://issues.apache.org/jira/browse/KUDU-2844
>             Project: Kudu
>          Issue Type: Improvement
>          Components: cfile, perf
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>         Attachments: fg.svg
>
>
> When scanning a plain or dictionary-encoded binary column, we currently loop
> over each entry and copy the string into the destination RowBlock's arena. In
> TPCH Q1, the scanner threads use a significant percentage of CPU doing this
> copying, and it also increases CPU cache footprint which likely decreases
> performance in downstream operations like predicate evaluation, merging,
> result serialization, etc.
> Instead of doing this, we could "attach" the dictionary block (with
> ref-counting) to the RowBlock and refer directly to the dictionary entry from
> the RowBlock. When the RowBlock eventually is reset, we can drop the
> reference. This should be safe because we never mutate indirect data in-place.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
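The "attach instead of copy" idea described in KUDU-2844 can be illustrated with a small conceptual sketch. This is not Kudu code: the class and function names (DictBlock, RowBlock, scan_dictionary_column) and the explicit refs counter are hypothetical stand-ins chosen for illustration, and Python memoryview slices stand in for pointers into the dictionary block.

```python
class DictBlock:
    """Hypothetical immutable dictionary block: one buffer plus entry offsets."""

    def __init__(self, entries):
        self._buf = b"".join(entries)
        self._offsets = [0]
        for entry in entries:
            self._offsets.append(self._offsets[-1] + len(entry))
        self.refs = 0  # explicit counter, mirroring the ref-counting in the proposal

    def entry(self, index):
        # Slicing a memoryview does not copy the underlying bytes.
        start, end = self._offsets[index], self._offsets[index + 1]
        return memoryview(self._buf)[start:end]


class RowBlock:
    """Hypothetical row block whose cells point into attached dictionary blocks."""

    def __init__(self):
        self.cells = []
        self._attached = []

    def attach(self, block):
        # Take a reference so the block outlives the cells that point into it.
        block.refs += 1
        self._attached.append(block)

    def reset(self):
        # Drop the cells first, then release every attached block.
        self.cells.clear()
        for block in self._attached:
            block.refs -= 1
        self._attached.clear()


def scan_dictionary_column(codewords, block, row_block):
    """Fill row_block with views into block instead of copied strings."""
    row_block.attach(block)
    for codeword in codewords:
        row_block.cells.append(block.entry(codeword))
```

Because the entries are never mutated in place, every cell can safely alias the same buffer; the only bookkeeping is the attach on scan and the release on reset.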
[jira] [Commented] (KUDU-3007) ARM/aarch64 platform support
[ https://issues.apache.org/jira/browse/KUDU-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091118#comment-17091118 ]

liusheng commented on KUDU-3007:
--------------------------------

Thanks a lot for your help, [~adar] [~awong]. Hi [~aserbin], could you please take a look?

> ARM/aarch64 platform support
> ----------------------------
>
>                 Key: KUDU-3007
>                 URL: https://issues.apache.org/jira/browse/KUDU-3007
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: liusheng
>            Priority: Critical
>
> As an important alternative to the x86 architecture, Aarch64 (ARM) is
> currently the dominant architecture in small devices such as phones, IoT
> devices, security cameras, and drones. In addition, more and more hardware
> and cloud vendors are starting to provide ARM resources, such as AWS, Huawei,
> Packet, and Ampere. ARM servers usually cost less than x86 servers, and more
> and more of them now offer performance comparable to x86 servers, or are even
> more efficient in some areas.
> We would like to propose adding an Aarch64 CI for KUDU to promote support for
> KUDU on Aarch64 platforms. We are willing to provide machines for the current
> CI system and manpower to manage the CI and fix problems that occur.
[jira] [Commented] (KUDU-3110) tserver data folder too large
[ https://issues.apache.org/jira/browse/KUDU-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090633#comment-17090633 ]

SeaAndHill commented on KUDU-3110:
----------------------------------

[~helifu] the -tablet_history_max_age_sec default is 900 seconds, but kudu has been running for more than three days, and the tserver data folder grows by 3 G every day.

> tserver data folder too large
> -----------------------------
>
>                 Key: KUDU-3110
>                 URL: https://issues.apache.org/jira/browse/KUDU-3110
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.1
>            Reporter: SeaAndHill
>            Priority: Critical
>         Attachments: kudu use disk.png
>
>
> There are about 100,000 rows in one table; the kudu tserver data directory
> uses 50G.
[jira] [Commented] (KUDU-3110) tserver data folder too large
[ https://issues.apache.org/jira/browse/KUDU-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090420#comment-17090420 ]

LiFu He commented on KUDU-3110:
-------------------------------

Are there many updates? How about '[--tablet_history_max_age_sec|https://kudu.apache.org/releases/1.7.1/docs/configuration_reference.html#kudu-tserver_tablet_history_max_age_sec]'?

In addition, use this command to check the data info if possible:

[https://kudu.apache.org/releases/1.7.1/docs/configuration_reference.html#kudu-tserver_tablet_history_max_age_sec]

> tserver data folder too large
> -----------------------------
>
>                 Key: KUDU-3110
>                 URL: https://issues.apache.org/jira/browse/KUDU-3110
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.1
>            Reporter: SeaAndHill
>            Priority: Critical
>         Attachments: kudu use disk.png
>
>
> There are about 100,000 rows in one table; the kudu tserver data directory
> uses 50G.
[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed
[ https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090400#comment-17090400 ]

LiFu He commented on KUDU-1994:
-------------------------------

Yes, go ahead :)

> Automatically Create New Range Partitions When Needed
> -----------------------------------------------------
>
>                 Key: KUDU-1994
>                 URL: https://issues.apache.org/jira/browse/KUDU-1994
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.3.0
>            Reporter: Alan Jackoway
>            Assignee: Thomas D'Silva
>            Priority: Major
>              Labels: roadmap-candidate
>
> We have a few Kudu tables where we use a range-partitioned timestamp as part
> of the key. The intention of this is to keep data locality for data that is
> likely to be scanned together, such as events in a timeseries.
> Currently we create these with partitions that look like this:
> {noformat}
> RANGE (ts) (
>   PARTITION 0 <= VALUES < 142008840,
>   PARTITION 142008840 <= VALUES < 142786080,
>   PARTITION 142786080 <= VALUES < 143572320,
>   PARTITION 143572320 <= VALUES < 144367200,
>   PARTITION 144367200 <= VALUES < 145162440,
>   PARTITION 145162440 <= VALUES < 145948320,
>   PARTITION 145948320 <= VALUES < 146734560,
>   PARTITION 146734560 <= VALUES < 147529440,
>   PARTITION 147529440 <= VALUES < 148324680,
>   PARTITION 148324680 <= VALUES < 149103360,
>   PARTITION 149103360 <= VALUES < 149889600,
>   PARTITION 149889600 <= VALUES < 150684480
> )
> {noformat}
> The problem is that as time goes on we have to choose to either create empty
> partitions in advance of when we are writing data or risk forgetting to
> create a partition and having writes of new data fail.
> Ideally, Kudu would have a way to indicate the size of the partitions (in
> this example 3 months converted to milliseconds) and then automatically
> create new partitions when new data comes in that needs the partition.
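The behavior requested in KUDU-1994 (declare a partition size, then create the covering range on demand) can be sketched as follows. This is a toy model under stated assumptions, not the Kudu client API: RangePartitionedTable, partition_bounds, and write are hypothetical names, and the sketch assumes fixed-width ranges aligned to an origin of 0, whereas the calendar-quarter ranges in the issue vary slightly in width.

```python
import bisect


def partition_bounds(ts, width, origin=0):
    """Return the [lower, upper) range of the given width that contains ts."""
    index = (ts - origin) // width
    lower = origin + index * width
    return lower, lower + width


class RangePartitionedTable:
    """Toy stand-in for a range-partitioned table (hypothetical, not Kudu)."""

    def __init__(self, width):
        self.width = width
        self.partitions = []  # sorted list of (lower, upper) tuples

    def _covering(self, ts):
        # Find the last partition whose lower bound is <= ts, if any.
        i = bisect.bisect_right(self.partitions, (ts, float("inf"))) - 1
        if i >= 0 and self.partitions[i][0] <= ts < self.partitions[i][1]:
            return self.partitions[i]
        return None

    def write(self, ts):
        """Return the partition for ts, creating it first if it is missing."""
        part = self._covering(ts)
        if part is None:
            part = partition_bounds(ts, self.width)
            bisect.insort(self.partitions, part)
        return part
```

With a scheme like this, a write never fails for lack of a partition, and no empty partitions have to be created ahead of time; the cost is that partition creation moves onto the write path.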
[jira] [Assigned] (KUDU-1994) Automatically Create New Range Partitions When Needed
[ https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiFu He reassigned KUDU-1994:
-----------------------------

    Assignee: Thomas D'Silva  (was: LiFu He)

> Automatically Create New Range Partitions When Needed
> -----------------------------------------------------
>
>                 Key: KUDU-1994
>                 URL: https://issues.apache.org/jira/browse/KUDU-1994
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.3.0
>            Reporter: Alan Jackoway
>            Assignee: Thomas D'Silva
>            Priority: Major
>              Labels: roadmap-candidate
>
> We have a few Kudu tables where we use a range-partitioned timestamp as part
> of the key. The intention of this is to keep data locality for data that is
> likely to be scanned together, such as events in a timeseries.
> Currently we create these with partitions that look like this:
> {noformat}
> RANGE (ts) (
>   PARTITION 0 <= VALUES < 142008840,
>   PARTITION 142008840 <= VALUES < 142786080,
>   PARTITION 142786080 <= VALUES < 143572320,
>   PARTITION 143572320 <= VALUES < 144367200,
>   PARTITION 144367200 <= VALUES < 145162440,
>   PARTITION 145162440 <= VALUES < 145948320,
>   PARTITION 145948320 <= VALUES < 146734560,
>   PARTITION 146734560 <= VALUES < 147529440,
>   PARTITION 147529440 <= VALUES < 148324680,
>   PARTITION 148324680 <= VALUES < 149103360,
>   PARTITION 149103360 <= VALUES < 149889600,
>   PARTITION 149889600 <= VALUES < 150684480
> )
> {noformat}
> The problem is that as time goes on we have to choose to either create empty
> partitions in advance of when we are writing data or risk forgetting to
> create a partition and having writes of new data fail.
> Ideally, Kudu would have a way to indicate the size of the partitions (in
> this example 3 months converted to milliseconds) and then automatically
> create new partitions when new data comes in that needs the partition.