[jira] [Updated] (HIVE-24580) Add support for combiner in hash mode group aggregation (Support for distinct)

2021-01-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24580:
---
Labels:   (was: pull-request-available)

> Add support for combiner in hash mode group aggregation (Support for distinct)
> --
>
> Key: HIVE-24580
> URL: https://issues.apache.org/jira/browse/HIVE-24580
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
>
> For distinct, the number of aggregation functions does not match the number
> of value columns, so the combiner logic needs special handling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24580) Add support for combiner in hash mode group aggregation (Support for distinct)

2021-01-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24580:
---
Description: For distinct, the number of aggregation functions does not match 
the number of value columns, so the combiner logic needs special handling.  
(was: In map-side group aggregation, partial grouped aggregation is calculated 
to reduce the data written to disk by the map task. In hash aggregation, where 
the input data is not sorted, a hash table is used (with sorting also performed 
before flushing). If the hash table size grows beyond a configurable limit, the 
data is flushed to disk and a new hash table is created. If the reduction 
achieved by the hash table is less than the minimum hash aggregation reduction 
calculated at compile time, map-side aggregation is converted to streaming 
mode. So if the first few batches of records do not yield a significant 
reduction, the mode is switched to streaming. This may hurt performance if 
subsequent batches of records have fewer distinct values. 

To improve performance in both hash and streaming mode, a combiner can be added 
to the map task after the keys are sorted. This ensures that aggregation is 
done where possible and reduces the data written to disk.)

> Add support for combiner in hash mode group aggregation (Support for distinct)
> --
>
> Key: HIVE-24580
> URL: https://issues.apache.org/jira/browse/HIVE-24580
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
>
> For distinct, the number of aggregation functions does not match the number
> of value columns, so the combiner logic needs special handling.





[jira] [Updated] (HIVE-24580) Add support for combiner in hash mode group aggregation (Support for distinct)

2021-01-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24580:
---
Parent: HIVE-24471
Issue Type: Sub-task  (was: Bug)

> Add support for combiner in hash mode group aggregation (Support for distinct)
> --
>
> Key: HIVE-24580
> URL: https://issues.apache.org/jira/browse/HIVE-24580
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
>
> In map-side group aggregation, partial grouped aggregation is calculated to 
> reduce the data written to disk by the map task. In hash aggregation, where 
> the input data is not sorted, a hash table is used (with sorting also 
> performed before flushing). If the hash table size grows beyond a 
> configurable limit, the data is flushed to disk and a new hash table is 
> created. If the reduction achieved by the hash table is less than the minimum 
> hash aggregation reduction calculated at compile time, map-side aggregation 
> is converted to streaming mode. So if the first few batches of records do not 
> yield a significant reduction, the mode is switched to streaming. This may 
> hurt performance if subsequent batches of records have fewer distinct values. 
> To improve performance in both hash and streaming mode, a combiner can be 
> added to the map task after the keys are sorted. This ensures that 
> aggregation is done where possible and reduces the data written to disk.
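
The behaviour described above can be illustrated with a small toy model. This is only a sketch of the idea, not Hive's implementation: the function names, the parameters (max_entries, min_reduction, check_after) and their defaults are hypothetical, the aggregate is assumed to be a plain SUM, and the reduction check is assumed to compare hash table size against rows consumed after the first batch.

```python
from itertools import groupby
from operator import itemgetter


def map_side_aggregate(records, max_entries=4, min_reduction=0.5, check_after=4):
    """Partial SUM per key: hash mode first, streaming mode if reduction is poor.

    All parameter names and defaults are hypothetical, not Hive settings.
    """
    table = {}         # key -> partial sum (the in-memory hash table)
    out = []           # rows the map task would write out
    streaming = False
    seen = 0

    for key, value in records:
        seen += 1
        if streaming:
            out.append((key, value))  # streaming mode: pass through unaggregated
            continue
        table[key] = table.get(key, 0) + value
        # Assumed check: after the first batch, compare table size to rows seen.
        if seen == check_after and len(table) / seen > min_reduction:
            streaming = True
            out.extend(sorted(table.items()))
            table = {}
        elif len(table) >= max_entries:
            out.extend(sorted(table.items()))  # flush, then start a new table
            table = {}
    out.extend(sorted(table.items()))
    return out


def combine(rows):
    """Combiner: sort the emitted rows by key and merge rows sharing a key."""
    rows = sorted(rows, key=itemgetter(0))
    return [(k, sum(v for _, v in grp)) for k, grp in groupby(rows, key=itemgetter(0))]


# First batch has four distinct keys (no reduction), later rows repeat one key.
records = [("a", 1), ("b", 1), ("c", 1), ("d", 1)] + [("a", 1)] * 4
emitted = map_side_aggregate(records)
combined = combine(emitted)
```

In this toy run the first batch shows no reduction, so the sketch switches to streaming and the four later rows for key "a" are emitted unaggregated. The combiner step, applied after sorting, merges them back into one row per key, which is the saving the description is after.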


