[jira] [Resolved] (HIVE-18779) Hive does not enable ppd to underlying storage format by default

2018-02-23 Thread Keith Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Sun resolved HIVE-18779.
--
Resolution: Not A Problem

Sorry, I did not notice this note in the Hive configuration wiki:

Note: Turn on 
*[hive.optimize.index.filter|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.index.filter]*
 as well to use file format specific indexes with PPD.
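For reference, the full combination of session settings discussed in this issue can be applied together; the first two are on by default, and the third is the one that was missing here:

```sql
-- Predicate pushdown in the optimizer (on by default)
SET hive.optimize.ppd=true;
-- Allow pushing predicates down to the storage layer (on by default)
SET hive.optimize.ppd.storage=true;
-- Also required: attaches the filterExpr to the TableScan so that
-- format-specific filters (e.g. Parquet, ORC) can evaluate it
SET hive.optimize.index.filter=true;
```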

> Hive does not enable ppd to underlying storage format by default
> 
>
> Key: HIVE-18779
> URL: https://issues.apache.org/jira/browse/HIVE-18779
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Query Planning
>Affects Versions: 1.2.1, 1.2.2, 2.3.2
> Environment: Hive 1.2.1; the latest version was also checked and still 
> has this issue.
>Reporter: Keith Sun
>Priority: Major
> Attachments: image-2018-02-22-20-29-41-589.png
>
>
> *Issue:* Hive does not enable PPD to the underlying storage format by default, 
> even with hive.optimize.ppd=true and hive.optimize.ppd.storage=true, when the 
> InputFormat supports filter pushdown.
> *How to reproduce:*
>  
> {code:java}
> CREATE TABLE MYDUAL (ID INT) stored as parquet;
> insert overwrite table mydual ...
>
> set hive.optimize.ppd=true;
> set hive.optimize.ppd.storage=true;
> explain select * from mydual where id = 100;
>
> // No filterExpr is generated for the Parquet InputFormat to use:
> STAGE PLANS:
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         TableScan
>           alias: mydual
>           Statistics: Num rows: 362 Data size: 362 Basic stats: COMPLETE Column stats: NONE
>           Filter Operator
>             predicate: (id = 100) (type: boolean)
>             Statistics: Num rows: 181 Data size: 181 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: 100 (type: int)
>
> // With hive.optimize.index.filter=true (false by default), the plan
> // gains the filterExpr that can be pushed down to Parquet:
> STAGE PLANS:
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         TableScan
>           alias: mydual
>           filterExpr: (id = 100) (type: boolean)
>           Statistics: Num rows: 362 Data size: 362 Basic stats: COMPLETE Column stats: NONE
>           Filter Operator
>             predicate: (id = 100) (type: boolean)
>             Statistics: Num rows: 181 Data size: 181 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: 100 (type: int)
>               outputColumnNames: _col0
>               Statistics: Num rows: 181 Data size: 181 Basic stats: COMPLETE Column stats: NONE
>               ListSink
> {code}
> Checking the code of org.apache.hadoop.hive.ql.ppd.OpProcFactory, I found 
> that to get the filterExpr generated in the plan we have to set 
> hive.optimize.index.filter=true as a workaround. But this parameter is not 
> related to the Parquet input format, as we do not have an index at all.
>  
> {code:java}
> private static ExprNodeGenericFuncDesc pushFilterToStorageHandler(
>     TableScanOperator tableScanOp,
>     ExprNodeGenericFuncDesc originalPredicate,
>     OpWalkerInfo owi,
>     HiveConf hiveConf) {
>   TableScanDesc tableScanDesc = tableScanOp.getConf();
>   Table tbl = tableScanDesc.getTableMetadata();
>   if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEOPTINDEXFILTER)) {
>     // attach the original predicate to the table scan operator for index
>     // optimizations that require the pushed predicate before pcr & later
>     // optimizations are applied
>     tableScanDesc.setFilterExpr(originalPredicate);
>   }
>   if (!tbl.isNonNative()) {
>     return originalPredicate;
>   }
>   HiveStorageHandler storageHandler = tbl.getStorageHandler();
>   if (!(storageHandler instanceof HiveStoragePredicateHandler)) {
>     // The storage handler does not provide predicate decomposition
>     // support, so we'll implement the entire filter in Hive.  However,
>     // we still provide the full predicate to the storage handler in
>     // case it wants to do any of its own prefiltering.
>     tableScanDesc.setFilterExpr(originalPredicate);
>     return originalPredicate;
>   }
>   HiveStoragePredicateHandler predicateHandler =
>       (HiveStoragePredicateHandler) storageHandler;
>   JobConf jobConf = new JobConf(owi.getParseContext().getConf());
>   Utilities.setColumnNameList(jobConf, tableScanOp);
>   Utilities.setColumnTypeList(jobConf, tableScanOp);
>   Utilities.copyTableJobPropertiesToConf(
>       Utilities.getTableDesc(tbl),
>       jobConf);
>   // ... (remainder of the method elided)
> {code}
> Per my check, the "getFilterExpr" method of TableScanDesc is called in the 
> places shown below, and if Hive always set the filterExpr it should not cause 
> trouble (chime in if I am wrong).
> !image-2018-02-22-20-29-41-589.png!
> I could propose a pull request for this.
> !image-2018-02-22-20-23-07-732.png!
>  
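The gating in pushFilterToStorageHandler quoted above comes down to a small truth table: for a native table (isNonNative() returns false, e.g. Parquet), the method returns before any of the storage-handler branches, so the filterExpr is attached only when HIVEOPTINDEXFILTER is set. A simplified sketch of that control flow (a hypothetical helper for illustration, not Hive's actual API):

```java
// Hypothetical, simplified model of OpProcFactory.pushFilterToStorageHandler:
// returns true when a filterExpr would be attached to the TableScan.
public class FilterExprGate {
    public static boolean attachesFilterExpr(boolean indexFilterEnabled,
                                             boolean nonNativeTable) {
        boolean attached = false;
        // HIVEOPTINDEXFILTER branch: hive.optimize.index.filter=true
        if (indexFilterEnabled) {
            attached = true;
        }
        // Native formats (Parquet, ORC, ...) return here, so no other
        // branch gets a chance to attach the predicate.
        if (!nonNativeTable) {
            return attached;
        }
        // Non-native tables fall through to the storage-handler code,
        // which attaches the predicate (or a decomposed part of it).
        return true;
    }

    public static void main(String[] args) {
        // Parquet (native) table with default settings: no filterExpr
        System.out.println(attachesFilterExpr(false, false)); // false
        // The workaround reported in this issue
        System.out.println(attachesFilterExpr(true, false));  // true
    }
}
```

This is why the EXPLAIN output only shows a filterExpr after setting hive.optimize.index.filter=true, even though the property's name suggests it concerns indexes rather than Parquet pushdown.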



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18779) Hive does not enable ppd to underlying storage format by default

2018-02-22 Thread Keith Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Sun updated HIVE-18779:
-
Attachment: image-2018-02-22-20-29-41-589.png



[jira] [Updated] (HIVE-18779) Hive does not enable ppd to underlying storage format by default

2018-02-22 Thread Keith Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Sun updated HIVE-18779:
-
Description: 

[jira] [Updated] (HIVE-18779) Hive does not enable ppd to underlying storage format by default

2018-02-22 Thread Keith Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Sun updated HIVE-18779:
-
Description: 