[jira] [Assigned] (HIVE-4160) Vectorized Query Execution in Hive

2018-02-25 Thread Mandy Martin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mandy Martin reassigned HIVE-4160:
--

Assignee: Jitendra Nath Pandey  (was: Mandy Martin)

> Vectorized Query Execution in Hive
> --
>
> Key: HIVE-4160
> URL: https://issues.apache.org/jira/browse/HIVE-4160
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: Hive-Vectorized-Query-Execution-Design-rev10.docx, 
> Hive-Vectorized-Query-Execution-Design-rev10.docx, 
> Hive-Vectorized-Query-Execution-Design-rev10.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev11.docx, 
> Hive-Vectorized-Query-Execution-Design-rev11.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev2.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev4.docx, 
> Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev5.docx, 
> Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev6.docx, 
> Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev7.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev9.docx, 
> Hive-Vectorized-Query-Execution-Design-rev9.pdf, 
> Hive-Vectorized-Query-Execution-Design.docx
>
>
> The Hive query execution engine currently processes one row at a time. A 
> single row of data goes through all the operators before the next row can be 
> processed. This mode of processing is very inefficient in terms of CPU usage. 
> Research has demonstrated that this yields very low instructions per cycle 
> [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization 
> and data columns go through a layer of object inspectors that identify column 
> type, deserialize data and determine appropriate expression routines in the 
> inner loop. These layers of virtual method calls further slow down the 
> processing. 
> This work will add support for vectorized query execution to Hive, where, 
> instead of individual rows, batches of about a thousand rows at a time are 
> processed. Each column in the batch is represented as a vector of a primitive 
> data type. The inner loop of execution scans these vectors very fast, 
> avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
> substantially reduces CPU time used, and gives excellent instructions per 
> cycle (i.e. improved processor pipeline utilization). See the attached design 
> specification for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-4160) Vectorized Query Execution in Hive

2018-02-25 Thread Mandy Martin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mandy Martin reassigned HIVE-4160:
--

Assignee: Mandy Martin  (was: Jitendra Nath Pandey)

> Vectorized Query Execution in Hive
> --
>
> Key: HIVE-4160
> URL: https://issues.apache.org/jira/browse/HIVE-4160
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Mandy Martin
>Priority: Major
> Attachments: Hive-Vectorized-Query-Execution-Design-rev10.docx, 
> Hive-Vectorized-Query-Execution-Design-rev10.docx, 
> Hive-Vectorized-Query-Execution-Design-rev10.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev11.docx, 
> Hive-Vectorized-Query-Execution-Design-rev11.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev2.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev4.docx, 
> Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev5.docx, 
> Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev6.docx, 
> Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev7.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev9.docx, 
> Hive-Vectorized-Query-Execution-Design-rev9.pdf, 
> Hive-Vectorized-Query-Execution-Design.docx
>
>
> The Hive query execution engine currently processes one row at a time. A 
> single row of data goes through all the operators before the next row can be 
> processed. This mode of processing is very inefficient in terms of CPU usage. 
> Research has demonstrated that this yields very low instructions per cycle 
> [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization 
> and data columns go through a layer of object inspectors that identify column 
> type, deserialize data and determine appropriate expression routines in the 
> inner loop. These layers of virtual method calls further slow down the 
> processing. 
> This work will add support for vectorized query execution to Hive, where, 
> instead of individual rows, batches of about a thousand rows at a time are 
> processed. Each column in the batch is represented as a vector of a primitive 
> data type. The inner loop of execution scans these vectors very fast, 
> avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
> substantially reduces CPU time used, and gives excellent instructions per 
> cycle (i.e. improved processor pipeline utilization). See the attached design 
> specification for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)