[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-22 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-22731:
--
Status: Patch Available  (was: In Progress)

> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22731.1.patch, HIVE-22731.2.patch, 
> HIVE-22731.WIP.patch, decode_time_bars.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, RecordReaders such as ORC support filtering only at coarse-grained 
> levels, namely: File, Stripe (64 to 256 MB), and Row group (10k rows). 
> They skip a set of rows only if they can guarantee that none of its rows can 
> pass the filter (usually given as a searchable argument).
> However, a significant amount of time can be spent decoding rows with 
> multiple columns that are not even used in the final result. See the figure, 
> where "original" is what happens today and in "LazyDecode" we skip decoding 
> rows that do not match the key.
> To enable more fine-grained filtering in the particular case of a MapJoin, 
> we could use the key HashTable created from the smaller table to skip 
> deserializing the columns of rows in the larger table that do not match any 
> key, and thus save CPU time. 
> This Jira investigates this direction. 
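
The idea in the description can be illustrated with a minimal sketch. This is not the Hive patch or the ORC reader API: `decode_payload`, `probe_decode`, and the sample data are hypothetical names made up for illustration, and a plain Python set stands in for the MapJoin key HashTable. The sketch assumes the join-key column of the larger table is decoded first, so the remaining columns of a row are decoded only when its key is present on the build side.

```python
from typing import List, Sequence, Set, Tuple


def decode_payload(encoded: bytes) -> Tuple[str, ...]:
    """Hypothetical stand-in for the costly decode of a row's non-key columns."""
    return tuple(encoded.decode("utf-8").split("|"))


def probe_decode(
    keys: Sequence[int],
    encoded_payloads: Sequence[bytes],
    build_side_keys: Set[int],
) -> Tuple[List[Tuple[int, Tuple[str, ...]]], int]:
    """Decode only rows whose join key exists in the build-side key set.

    Returns the decoded rows and the number of rows whose decode was skipped.
    """
    decoded: List[Tuple[int, Tuple[str, ...]]] = []
    skipped = 0
    for key, payload in zip(keys, encoded_payloads):
        # Probe the (assumed) key hash table before touching other columns.
        if key not in build_side_keys:
            skipped += 1
            continue
        decoded.append((key, decode_payload(payload)))
    return decoded, skipped


# Smaller (build-side) table: only its keys are needed for the probe.
small_table = {1: "US", 3: "GR"}
build_side_keys = set(small_table)

# Larger (probe-side) table: key column already decoded, payloads still encoded.
keys = [1, 2, 3, 4, 1]
payloads = [b"a|x", b"b|y", b"c|z", b"d|w", b"e|v"]

rows, skipped = probe_decode(keys, payloads, build_side_keys)
```

In this sketch the rows with keys 2 and 4 are never passed to `decode_payload`, which is where the CPU saving would come from. How large the saving is in practice depends on join selectivity and on how expensive the non-key columns are to decode.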



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-22 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-22731:
--
Status: In Progress  (was: Patch Available)

> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22731.1.patch, HIVE-22731.2.patch, 
> HIVE-22731.WIP.patch, decode_time_bars.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-22 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-22731:
--
Attachment: HIVE-22731.2.patch

> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22731.1.patch, HIVE-22731.2.patch, 
> HIVE-22731.WIP.patch, decode_time_bars.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-22731:
--
Labels: pull-request-available  (was: )

> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22731.1.patch, HIVE-22731.WIP.patch, 
> decode_time_bars.pdf
>
>





[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-20 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-22731:
--
Issue Type: Improvement  (was: Bug)

> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: HIVE-22731.1.patch, HIVE-22731.WIP.patch, 
> decode_time_bars.pdf
>
>





[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-20 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-22731:
--
Attachment: HIVE-22731.1.patch

> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: HIVE-22731.1.patch, HIVE-22731.WIP.patch, 
> decode_time_bars.pdf
>
>





[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-20 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-22731:
--
Status: Patch Available  (was: In Progress)

> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: HIVE-22731.1.patch, HIVE-22731.WIP.patch, 
> decode_time_bars.pdf
>
>





[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-17 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-22731:
--
Attachment: HIVE-22731.WIP.patch

> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: HIVE-22731.WIP.patch, decode_time_bars.pdf
>
>





[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-15 Thread Gopal Vijayaraghavan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal Vijayaraghavan updated HIVE-22731:

Description: 
Currently, RecordReaders such as ORC support filtering at coarser-grained 
levels, namely: File, Stripe (64 to 256mb), and Row group (10k row) level. They 
only filter sets of rows if they can guarantee that none of the rows can pass a 
filter (usually given as searchable argument).

However, a significant amount of time can be spend decoding rows with multiple 
columns that are not even used in the final result. See figure where original 
is what happens today and in LazyDecode we skip decoding rows that do not match 
the key.

To enable a more fine-grained filtering in the particular case of a MapJoin we 
could utilize the key HashTable created from the smaller table to skip 
deserializing row columns at the larger table that do not match any key and 
thus save CPU time. 
This Jira investigates this direction. 

  was:
Currently, RecordReaders such as ORC support filtering at coarser-grained 
levels, namely: File, Stripe (64 to 256mb), and Row group (10k row) level. They 
only filter sets of rows if they can guarantee that none of the rows can pass a 
filter (usually given as searchable argument).

However, a significant amount of time can be spend deconding rows with multiple 
columns that are not even used in the final result. See figure where original 
is what happens today and in LazyDecode we skip decoding rows that do not much 
the key.

To enable a more fine-grained filtering in the particular case of a MapJoin we 
could utilize the key HashTable created from the smaller table to skip 
deserializing row columns at the larger table that do not match any key and 
thus save CPU time. 
This Jira investigates this direction. 


> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: decode_time_bars.pdf
>
>





[jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering

2020-01-15 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-22731:
--
Summary: Probe MapJoin hashtables for row level filtering  (was: Use 
MapJoin hashtables for row level filtering)

> Probe MapJoin hashtables for row level filtering
> 
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: decode_time_bars.pdf
>
>
> Currently, RecordReaders such as ORC support filtering at coarser-grained 
> levels, namely: File, Stripe (64 to 256mb), and Row group (10k row) level. 
> They only filter sets of rows if they can guarantee that none of the rows can 
> pass a filter (usually given as searchable argument).
> However, a significant amount of time can be spend deconding rows with 
> multiple columns that are not even used in the final result. See figure where 
> original is what happens today and in LazyDecode we skip decoding rows that 
> do not much the key.
> To enable a more fine-grained filtering in the particular case of a MapJoin 
> we could utilize the key HashTable created from the smaller table to skip 
> deserializing row columns at the larger table that do not match any key and 
> thus save CPU time. 
> This Jira investigates this direction. 


