[jira] [Created] (HIVE-27159) Filters are not pushed down for decimal format in Parquet

2023-03-20 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27159:
---

 Summary: Filters are not pushed down for decimal format in Parquet
 Key: HIVE-27159
 URL: https://issues.apache.org/jira/browse/HIVE-27159
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


Decimal filters are not created and pushed down to Parquet readers. This causes
latency and unnecessary row processing during query execution.

An exception is thrown at runtime and more rows are processed than needed.

E.g. Q13:

{noformat}

Parquet: (Map 1)

INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  :      Map 1          31254.00              0             0     549,181,950              133
INFO  :      Map 3              0.00              0             0          73,049              365
INFO  :      Map 4           2027.00              0             0       6,000,000        1,689,919
INFO  :      Map 5              0.00              0             0           7,200            1,440
INFO  :      Map 6            517.00              0             0       1,920,800          493,920
INFO  :      Map 7              0.00              0             0           1,002            1,002
INFO  :  Reducer 2          18716.00              0             0             133                0
INFO  : ----------------------------------------------------------------------------------------------

ORC:


INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  :      Map 1           6556.00              0             0     267,146,063              152
INFO  :      Map 3              0.00              0             0          10,000              365
INFO  :      Map 4           2014.00              0             0       6,000,000        1,689,919
INFO  :      Map 5              0.00              0             0           7,200            1,440
INFO  :      Map 6            504.00              0             0       1,920,800          493,920
INFO  :  Reducer 2           3159.00              0             0             152                0
INFO  : ----------------------------------------------------------------------------------------------

{noformat}




{noformat}
 Map 1
    Map Operator Tree:
        TableScan
          alias: store_sales
          filterExpr: (ss_hdemo_sk is not null and ss_addr_sk is not null and ss_cdemo_sk is not null and ss_store_sk is not null and ((ss_sales_price >= 100) or (ss_sales_price <= 150) or (ss_sales_price >= 50) or (ss_sales_price <= 100) or (ss_sales_price >= 150) or (ss_sales_price <= 200)) and ((ss_net_profit >= 100) or (ss_net_profit <= 200) or (ss_net_profit >= 150) or (ss_net_profit <= 300) or (ss_net_profit >= 50) or (ss_net_profit <= 250))) (type: boolean)
          probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_112_container, bigKeyColName:ss_hdemo_sk, smallTablePos:1, keyRatio:5.042575832290721E-6
          Statistics: Num rows: 2750380056 Data size: 1321831086472 Basic stats: COMPLETE Column stats: COMPLETE
          Filter Operator
            predicate: (ss_hdemo_sk is not null and ss_addr_sk is not null and ss_cdemo_sk is not null and ss_store_sk is not null and ((ss_sales_price >= 100) or (ss_sales_price <= 150) or (ss_sales_price >= 50) or (ss_sales_price <= 100) or (ss_sales_price >= 150) or (ss_sales_price <= 200)) and ((ss_net_profit >= 100) or (ss_net_profit <= 200) or (ss_net_profit >= 150) or (ss_net_profit <= 300) or (ss_net_profit >= 50) or (ss_net_profit <= 250))) (type: boolean)
            Statistics: Num rows: 2500252205 Data size: 1201619783884 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              expressions: ss_cdemo_sk (type: bigint), ss_hdemo_sk (type: bigint), ss_addr_sk (type: bigint), ss_store_sk (type: bigint), ss_quantity (type: int), ss_ext_sales_price (type: decimal(7,2)), ss_ext_wholesale_cost (type: decimal(7,2)), ss_sold_date_sk (type: bigint), ss_net_profit BETWEEN 100 AND 200 (type: boolean), ss_net_profit BETWEEN 150 AND 300 (type: boolean), ss_net_profit BETWEEN 50 AND 250 (type: boolean), ss_sales_price BETWEEN 100 AND 150 (type: boolean), ss_sales_price BETWEEN 50 AND 100 (type: boolean), ss_sales_price BETWEEN 150 AND 200 (type: boolean)
{noformat}
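
A minimal sketch of a check for this, assuming a Parquet table with a decimal(7,2) column mirroring ss_sales_price above (table and column names are illustrative, not from the benchmark schema):

{code:sql}
-- Illustrative only: small Parquet table with a decimal column
CREATE TABLE store_sales_parq (ss_sales_price DECIMAL(7,2)) STORED AS PARQUET;

-- Inspect whether the decimal predicate reaches the Parquet reader
-- (e.g. via the plan / reader logs); per this ticket it is not pushed down.
EXPLAIN
SELECT count(*) FROM store_sales_parq WHERE ss_sales_price >= 100;
{code}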

[jira] [Created] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-20 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27158:
--

 Summary: Store hive columns stats in puffin files for iceberg 
tables
 Key: HIVE-27158
 URL: https://issues.apache.org/jira/browse/HIVE-27158
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27157) AssertionError when inferring return type for unix_timestamp function

2023-03-20 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27157:
--

 Summary: AssertionError when inferring return type for 
unix_timestamp function
 Key: HIVE-27157
 URL: https://issues.apache.org/jira/browse/HIVE-27157
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Any attempt to derive the return data type for the {{unix_timestamp}} function 
results in the following assertion error:
{noformat}
java.lang.AssertionError: typeName.allowsPrecScale(true, false): BIGINT
	at org.apache.calcite.sql.type.BasicSqlType.checkPrecScale(BasicSqlType.java:65)
	at org.apache.calcite.sql.type.BasicSqlType.<init>(BasicSqlType.java:81)
	at org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:67)
	at org.apache.calcite.sql.fun.SqlAbstractTimeFunction.inferReturnType(SqlAbstractTimeFunction.java:78)
	at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:278)
{noformat}
The error is due to a faulty implementation of type inference in the respective operators:
 * [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java]
 * [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java]

Although at this stage it is not possible to reproduce the problem in master 
with an actual SQL query, the buggy implementation must be fixed, since slight 
changes in the code/CBO rules may lead to code paths that rely on 
{{SqlOperator.inferReturnType}}.

Note that in older versions of Hive it is possible to hit the AssertionError in 
various ways. For example in Hive 3.1.3 (and older), the error may come from 
[HiveRelDecorrelator|https://github.com/apache/hive/blob/4df4d75bf1e16fe0af75aad0b4179c34c07fc975/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1933]
 in the presence of sub-queries.
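
A hypothetical shape of such a query (not a confirmed reproducer; table and column names are made up), combining {{unix_timestamp}} with a correlated sub-query:

{code:sql}
-- Illustrative only: unix_timestamp inside a correlated sub-query,
-- the kind of pattern that goes through HiveRelDecorrelator in older releases
SELECT o.id
FROM orders o
WHERE unix_timestamp(o.created_at) >
      (SELECT max(unix_timestamp(p.paid_at)) FROM payments p WHERE p.order_id = o.id);
{code}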



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27156) Wrong results when CAST timestamp literal with timezone to TIMESTAMP

2023-03-20 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27156:
--

 Summary: Wrong results when CAST timestamp literal with timezone 
to TIMESTAMP
 Key: HIVE-27156
 URL: https://issues.apache.org/jira/browse/HIVE-27156
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Casting a timestamp literal with an invalid timezone to the TIMESTAMP datatype 
results in a timestamp with the time part truncated to midnight (00:00:00).

*Case I*
{code:sql}
select cast('2020-06-28 22:17:33.123456 Europe/Amsterd' as timestamp);
{code}

+Actual+
|2020-06-28 00:00:00|

+Expected+
|NULL/ERROR/2020-06-28 22:17:33.123456|

*Case II*
{code:sql}
select cast('2020-06-28 22:17:33.123456 Invalid/Zone' as timestamp);
{code}

+Actual+
|2020-06-28 00:00:00|

+Expected+
|NULL/ERROR/2020-06-28 22:17:33.123456|

The existing documentation does not cover what the output should be in the 
cases above:
* 
https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-TimestampstimestampTimestamps
* https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types

*Case III*
Another subtle but important case is the following, where the timestamp literal 
has a valid timezone but we are attempting a cast to a datatype that does not 
store the timezone.

{code:sql}
select cast('2020-06-28 22:17:33.123456 Europe/Amsterdam' as timestamp);
{code}

+Actual+
|2020-06-28 22:17:33.123456|

The correctness of the last result is debatable, since one might expect NULL or 
an ERROR instead.
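
For contrast, a related check one might run is a cast to the zone-aware type described in the linked TIMESTAMP-types page (a sketch only; no expected output is asserted here):

{code:sql}
select cast('2020-06-28 22:17:33.123456 Europe/Amsterdam' as timestamp with local time zone);
{code}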



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27155) Iceberg: Vectorize virtual columns

2023-03-20 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-27155:
-

 Summary: Iceberg: Vectorize virtual columns
 Key: HIVE-27155
 URL: https://issues.apache.org/jira/browse/HIVE-27155
 Project: Hive
  Issue Type: Task
Reporter: Denys Kuzmenko


Vectorization gets disabled at runtime with the following reason: 
{code}
Select expression for SELECT operator: Virtual column PARTITION__SPEC__ID is not supported
{code}
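
For example, a query that references the virtual column on an Iceberg table hits this fallback (a minimal sketch; the table name is illustrative):

{code:sql}
-- Illustrative only: selecting the PARTITION__SPEC__ID virtual column
-- currently falls back to non-vectorized execution
SELECT PARTITION__SPEC__ID FROM iceberg_tbl LIMIT 10;
{code}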



--
This message was sent by Atlassian Jira
(v8.20.10#820010)