[jira] [Assigned] (KUDU-2844) Avoid copying strings from dictionary or plain-encoded blocks

2020-04-23 Thread Todd Lipcon (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned KUDU-2844:
-

Assignee: Todd Lipcon

> Avoid copying strings from dictionary or plain-encoded blocks
> -
>
> Key: KUDU-2844
> URL: https://issues.apache.org/jira/browse/KUDU-2844
> Project: Kudu
> Issue Type: Improvement
> Components: cfile, perf
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
> Attachments: fg.svg
>
>
> When scanning a plain or dictionary-encoded binary column, we currently loop 
> over each entry and copy the string into the destination RowBlock's arena. In 
> TPCH Q1, the scanner threads use a significant percentage of CPU doing this 
> copying, and it also increases CPU cache footprint which likely decreases 
> performance in downstream operations like predicate evaluation, merging, 
> result serialization, etc.
> Instead of doing this, we could "attach" the dictionary block (with 
> ref-counting) to the RowBlock and refer directly to the dictionary entry from 
> the RowBlock. When the RowBlock eventually is reset, we can drop the 
> reference. This should be safe because we never mutate indirect data in-place.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3007) ARM/aarch64 platform support

2020-04-23 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091118#comment-17091118
 ] 

liusheng commented on KUDU-3007:


Thanks a lot for your help, [~adar] and [~awong]. Hi [~aserbin], could you please take 
a look?

> ARM/aarch64 platform support
> 
>
> Key: KUDU-3007
> URL: https://issues.apache.org/jira/browse/KUDU-3007
> Project: Kudu
> Issue Type: Improvement
> Reporter: liusheng
> Priority: Critical
>
> As an important alternative to the x86 architecture, the Aarch64 (ARM) architecture is 
> currently the dominant architecture in small devices such as phones, IoT devices, 
> security cameras, and drones. In addition, more and more hardware and 
> cloud vendors are starting to provide ARM resources, such as AWS, Huawei, Packet, 
> and Ampere. ARM servers usually cost less than x86 
> servers, and more and more ARM servers now offer performance comparable to 
> x86 servers, or are even more efficient in some areas.
> We propose adding an Aarch64 CI for Kudu to promote support for 
> Kudu on Aarch64 platforms. We are willing to provide machines to the current 
> CI system and manpower to manage the CI and fix problems that occur.





[jira] [Commented] (KUDU-3110) tserver data folder too large

2020-04-23 Thread SeaAndHill (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090633#comment-17090633
 ] 

SeaAndHill commented on KUDU-3110:
--

[~helifu] The default for --tablet_history_max_age_sec is 900 seconds, but Kudu has been 
running for more than three days, and the tserver data folder grows by 3 GB every day.

> tserver data folder too large
> -
>
> Key: KUDU-3110
> URL: https://issues.apache.org/jira/browse/KUDU-3110
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 1.7.1
> Reporter: SeaAndHill
> Priority: Critical
> Attachments: kudu use disk.png
>
>
> There are about 100,000 rows in one table, and the Kudu tserver data directory 
> uses 50 GB.





[jira] [Commented] (KUDU-3110) tserver data folder too large

2020-04-23 Thread LiFu He (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090420#comment-17090420
 ] 

LiFu He commented on KUDU-3110:
---

Are there many updates? How about 
[--tablet_history_max_age_sec|https://kudu.apache.org/releases/1.7.1/docs/configuration_reference.html#kudu-tserver_tablet_history_max_age_sec]? 
In addition, if possible, use this command to check the data info: 
[https://kudu.apache.org/releases/1.7.1/docs/configuration_reference.html#kudu-tserver_tablet_history_max_age_sec]

> tserver data folder too large
> -
>
> Key: KUDU-3110
> URL: https://issues.apache.org/jira/browse/KUDU-3110
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 1.7.1
> Reporter: SeaAndHill
> Priority: Critical
> Attachments: kudu use disk.png
>
>
> There are about 100,000 rows in one table, and the Kudu tserver data directory 
> uses 50 GB.





[jira] [Commented] (KUDU-1994) Automatically Create New Range Partitions When Needed

2020-04-23 Thread LiFu He (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090400#comment-17090400
 ] 

LiFu He commented on KUDU-1994:
---

Yes, go ahead : )

> Automatically Create New Range Partitions When Needed
> -
>
> Key: KUDU-1994
> URL: https://issues.apache.org/jira/browse/KUDU-1994
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Alan Jackoway
> Assignee: Thomas D'Silva
> Priority: Major
> Labels: roadmap-candidate
>
> We have a few Kudu tables where we use a range-partitioned timestamp as part 
> of the key. The intention of this is to keep data locality for data that is 
> likely to be scanned together, such as events in a timeseries.
> Currently we create these with partitions that look like this:
> {noformat}
> RANGE (ts) (
> PARTITION 0 <= VALUES < 142008840,
> PARTITION 142008840 <= VALUES < 142786080,
> PARTITION 142786080 <= VALUES < 143572320,
> PARTITION 143572320 <= VALUES < 144367200,
> PARTITION 144367200 <= VALUES < 145162440,
> PARTITION 145162440 <= VALUES < 145948320,
> PARTITION 145948320 <= VALUES < 146734560,
> PARTITION 146734560 <= VALUES < 147529440,
> PARTITION 147529440 <= VALUES < 148324680,
> PARTITION 148324680 <= VALUES < 149103360,
> PARTITION 149103360 <= VALUES < 149889600,
> PARTITION 149889600 <= VALUES < 150684480
> )
> {noformat}
> The problem is that as time goes on we have to choose to either create empty 
> partitions in advance of when we are writing data or risk forgetting to 
> create a partition and having writes of new data fail.
> Ideally, Kudu would have a way to indicate the size of the partitions (in 
> this example 3 months converted to milliseconds) and then automatically 
> create new partitions when new data comes in that needs the partition.
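
The requested behavior can be sketched in Python as follows. This is a toy in-memory model, not Kudu's API: `partition_bounds` and `AutoRangeTable` are hypothetical names, and the sketch assumes a fixed partition width in the same units as the key. Calendar months vary in length, so a real "3 months" policy would need calendar-aware bounds rather than a constant width.

```python
def partition_bounds(ts, width):
    """Return the [lower, upper) range of fixed width that contains ts."""
    lower = (ts // width) * width
    return lower, lower + width


class AutoRangeTable:
    """Hypothetical in-memory stand-in for a range-partitioned table."""

    def __init__(self, width):
        self.width = width
        self.partitions = set()   # set of (lower, upper) tuples
        self.rows = []

    def insert(self, ts, value):
        bounds = partition_bounds(ts, self.width)
        if bounds not in self.partitions:
            # Create the covering partition on demand instead of failing the write.
            self.partitions.add(bounds)
        self.rows.append((ts, value))
        return bounds


table = AutoRangeTable(width=100)
table.insert(250, "a")
table.insert(299, "b")   # falls in the same partition as 250
table.insert(300, "c")   # needs a new partition
print(sorted(table.partitions))
```

Deriving the bounds from the key alone means no empty partitions have to be created ahead of time, and a write outside every existing range does not fail.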





[jira] [Assigned] (KUDU-1994) Automatically Create New Range Partitions When Needed

2020-04-23 Thread LiFu He (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiFu He reassigned KUDU-1994:
-

Assignee: Thomas D'Silva  (was: LiFu He)

> Automatically Create New Range Partitions When Needed
> -
>
> Key: KUDU-1994
> URL: https://issues.apache.org/jira/browse/KUDU-1994
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Alan Jackoway
> Assignee: Thomas D'Silva
> Priority: Major
> Labels: roadmap-candidate
>
> We have a few Kudu tables where we use a range-partitioned timestamp as part 
> of the key. The intention of this is to keep data locality for data that is 
> likely to be scanned together, such as events in a timeseries.
> Currently we create these with partitions that look like this:
> {noformat}
> RANGE (ts) (
> PARTITION 0 <= VALUES < 142008840,
> PARTITION 142008840 <= VALUES < 142786080,
> PARTITION 142786080 <= VALUES < 143572320,
> PARTITION 143572320 <= VALUES < 144367200,
> PARTITION 144367200 <= VALUES < 145162440,
> PARTITION 145162440 <= VALUES < 145948320,
> PARTITION 145948320 <= VALUES < 146734560,
> PARTITION 146734560 <= VALUES < 147529440,
> PARTITION 147529440 <= VALUES < 148324680,
> PARTITION 148324680 <= VALUES < 149103360,
> PARTITION 149103360 <= VALUES < 149889600,
> PARTITION 149889600 <= VALUES < 150684480
> )
> {noformat}
> The problem is that as time goes on we have to choose to either create empty 
> partitions in advance of when we are writing data or risk forgetting to 
> create a partition and having writes of new data fail.
> Ideally, Kudu would have a way to indicate the size of the partitions (in 
> this example 3 months converted to milliseconds) and then automatically 
> create new partitions when new data comes in that needs the partition.


