[jira] [Updated] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization

2019-09-12 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2138:
--
Attachment: performance_result.txt

> Get rid of unused columns by upstream operators at points of materialization
> 
>
> Key: IMPALA-2138
> URL: https://issues.apache.org/jira/browse/IMPALA-2138
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2
> Reporter: Ippokratis Pandis
> Assignee: Tim Armstrong
> Priority: Major
> Labels: performance
> Attachments: 0001-Projection-prototype.patch, performance_result.txt
>
>
> It would be a significant performance improvement if we could drop columns 
> as soon as we know that no operator upstream is going to use them. The 
> amount of data we handle would shrink, making network transfers and I/O 
> (spilling) more efficient. It would also improve cache performance. 
> The current row-wise in-memory format does not make it easy to drop such 
> unused columns. However, there are points of materialization where we copy 
> out the tuples, and at those points we can perform these projections. There 
> are multiple points of materialization, notably:
> * The exchange operator
> * The build side of hash join
> * The probe side of hash join when we have spilling
> * The aggregation
> * Sorts and analytic function evaluation
> To do these projections we need to modify the FE so that it knows, at each 
> operator, the minimum set of columns referenced by that operator and all 
> the upstream ones. (That minimum set is easy to calculate during an 
> additional top-down traversal of the plan.) We also need to modify the BE 
> to make the copy-out operation aware of such projections.
> Assigning first to Alex, because of the needed FE changes. Happy to take care 
> of the needed BE changes. Perhaps we could split this issue into 2 sub-tasks, 
> the FE and the BE changes.
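
The top-down traversal and the projection at copy-out described above can be sketched roughly as follows. This is only an illustration in Python, not Impala code: PlanNode, assign_required_columns, output_columns, copy_out and the example table and column names are all hypothetical, and the sketch assumes that every operator passes its children's columns through unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan node for illustration; not Impala's PlanNode class."""
    name: str
    referenced: frozenset = frozenset()   # columns this operator reads itself
    produced: frozenset = frozenset()     # columns this operator creates
    children: list = field(default_factory=list)
    required_output: frozenset = frozenset()  # filled in by the traversal


def output_columns(node):
    """Columns a node can emit: its own plus everything its children emit."""
    cols = set(node.produced)
    for child in node.children:
        cols |= output_columns(child)
    return cols


def assign_required_columns(node, needed_by_consumers):
    """Top-down pass: record which output columns of `node` are still needed
    by the operators that consume its rows."""
    node.required_output = frozenset(needed_by_consumers) & output_columns(node)
    # Children must supply what the consumers need (minus what this node
    # creates itself) plus whatever this node reads.
    needed_from_children = (node.required_output - node.produced) | node.referenced
    for child in node.children:
        assign_required_columns(child, needed_from_children)


def copy_out(row, required):
    """Projection at a point of materialization: keep only required columns.
    A row is modeled as a dict here, purely for illustration."""
    return {col: val for col, val in row.items() if col in required}


# Example plan: aggregate(join(scan orders, scan customers)).
scan_orders = PlanNode(
    "scan orders",
    produced=frozenset({"o_id", "o_cust", "o_total", "o_comment"}))
scan_cust = PlanNode(
    "scan customers",
    produced=frozenset({"c_id", "c_name", "c_address"}))
join = PlanNode(
    "hash join",
    referenced=frozenset({"o_cust", "c_id"}),
    children=[scan_orders, scan_cust])
agg = PlanNode(
    "aggregate",
    referenced=frozenset({"c_name", "o_total"}),
    produced=frozenset({"sum_total"}),
    children=[join])

# The query result needs only c_name and sum_total.
assign_required_columns(agg, {"c_name", "sum_total"})

for node in (agg, join, scan_orders, scan_cust):
    print(node.name, sorted(node.required_output))

row = {"o_id": 1, "o_cust": 7, "o_total": 9.5, "o_comment": "x"}
print(copy_out(row, scan_orders.required_output))
```

In this example the scan of orders only has to keep o_cust and o_total, so o_id and o_comment are never copied into the build or probe side of the join. After the join, the join keys are no longer required either.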



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization

2018-10-23 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2138:
--
Priority: Major  (was: Critical)

> Get rid of unused columns by upstream operators at points of materialization
> 
>
> Key: IMPALA-2138
> URL: https://issues.apache.org/jira/browse/IMPALA-2138
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2
> Reporter: Ippokratis Pandis
> Priority: Major
> Labels: performance
> Attachments: 0001-Projection-prototype.patch
>
>






[jira] [Updated] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization

2018-07-11 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2138:
--
Attachment: 0001-Projection-prototype.patch

> Get rid of unused columns by upstream operators at points of materialization
> 
>
> Key: IMPALA-2138
> URL: https://issues.apache.org/jira/browse/IMPALA-2138
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2
> Reporter: Ippokratis Pandis
> Priority: Critical
> Labels: performance
> Attachments: 0001-Projection-prototype.patch
>
>


