from:"Jesus Camacho Rodriguez \(Jira\)"

[jira] [Created] (HIVE-25358) Remove reviewer pattern

2021-07-20 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-25358:
--

 Summary: Remove reviewer pattern
 Key: HIVE-25358
 URL: https://issues.apache.org/jira/browse/HIVE-25358
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis
 Fix For: 4.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-25105) Support Parquet as MV storage format

2021-05-11 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-25105:
--

 Summary: Support Parquet as MV storage format
 Key: HIVE-25105
 URL: https://issues.apache.org/jira/browse/HIVE-25105
 Project: Hive
  Issue Type: Improvement
  Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently the supported storage formats do not include Parquet:

{code}
...
HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
"ORC",
new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
...
{code}




[jira] [Created] (HIVE-24966) RuntimeException in CBO if HMS stats are modified externally

2021-03-31 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24966:
--

 Summary: RuntimeException in CBO if HMS stats are modified 
externally
 Key: HIVE-24966
 URL: https://issues.apache.org/jira/browse/HIVE-24966
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


While we want to expose this case so the user can take action, currently we 
throw a RuntimeException. Rather than failing the query, it may be better to 
show this information to the user and suggest recomputing stats.




[jira] [Created] (HIVE-24685) Remove HiveSubQRemoveRelBuilder

2021-01-25 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24685:
--

 Summary: Remove HiveSubQRemoveRelBuilder
 Key: HIVE-24685
 URL: https://issues.apache.org/jira/browse/HIVE-24685
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


The class seems to be a close clone of {{RelBuilder}}, created due to some bugs 
in the original implementation. Those issues seem to be fixed now and we 
should be able to get rid of the copy. In the worst case, if we need 
to keep it for the time being, we could try to make it extend {{RelBuilder}} 
and override only the necessary methods.




[jira] [Created] (HIVE-24527) Allow triggering materialized view rewriting for external tables

2020-12-11 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24527:
--

 Summary: Allow triggering materialized view rewriting for external 
tables
 Key: HIVE-24527
 URL: https://issues.apache.org/jira/browse/HIVE-24527
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Although we will not be able to check data staleness, this can be useful for 
debugging purposes.




[jira] [Created] (HIVE-24453) DirectSQL error when parsing create_time value for database

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24453:
--

 Summary: DirectSQL error when parsing create_time value for 
database
 Key: HIVE-24453
 URL: https://issues.apache.org/jira/browse/HIVE-24453
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


HIVE-21077 introduced a {{create_time}} field for the {{DBS}} table in HMS. 
Although the value for that field is always set after that patch, the value 
could be null if the database was created before the feature went in. DirectSQL 
should check for a null value before parsing the integer; otherwise we hit an 
exception and fall back to the ORM path:
{noformat}
2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
[pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
is not an error): null at 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
 at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
 at 
org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
{noformat}
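
The null check described above could look roughly like the following. This is a minimal sketch: {{NullSafeSqlInt}} and {{extractSqlIntOrDefault}} are hypothetical names, not the actual {{MetastoreDirectSqlUtils}} code, and falling back to a caller-supplied default for a missing {{create_time}} is an assumption.

```java
final class NullSafeSqlInt {
    private NullSafeSqlInt() {}

    /**
     * Converts a raw column value from a direct SQL result row into an int.
     * Returns defaultValue for SQL NULL instead of throwing.
     */
    static int extractSqlIntOrDefault(Object field, int defaultValue) {
        if (field == null) {
            // Row was written before the column existed; nothing to parse.
            return defaultValue;
        }
        if (field instanceof Number) {
            return ((Number) field).intValue();
        }
        // Fallback for values that arrive as strings.
        return Integer.parseInt(field.toString().trim());
    }

    public static void main(String[] args) {
        System.out.println(extractSqlIntOrDefault(null, 0));
        System.out.println(extractSqlIntOrDefault(1606554365L, 0));
    }
}
```

With a check like this, a database row without {{create_time}} would no longer force the fallback to the ORM path.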




[jira] [Created] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor

2020-11-13 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24387:
--

 Summary: Metastore access through JDBC handler does not use 
correct database accessor
 Key: HIVE-24387
 URL: https://issues.apache.org/jira/browse/HIVE-24387
 Project: Hive
  Issue Type: Bug
  Components: JDBC storage handler
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


There are some differences in the SQL syntax that the database accessor 
generates for each RDBMS. For the metastore, we always end up with the default 
accessor, which leads to errors, e.g., when a limit query is executed for a 
Postgres-backed metastore.

{code}
Error: java.io.IOException: java.io.IOException: 
org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
while trying to get column names: ERROR: syntax error at or near "{"
Position: 200 (state=,code=0)

SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", 
"GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", "TBL_COL_PRIV", 
"TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS"
{LIMIT 1}
{code}





[jira] [Created] (HIVE-24325) Cardinality preserving join optimization may fail when column is a constant

2020-10-28 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24325:
--

 Summary: Cardinality preserving join optimization may fail when 
column is a constant
 Key: HIVE-24325
 URL: https://issues.apache.org/jira/browse/HIVE-24325
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


More info to come.




[jira] [Created] (HIVE-24232) Incorrect translation of rollup expression from Calcite

2020-10-05 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24232:
--

 Summary: Incorrect translation of rollup expression from Calcite
 Key: HIVE-24232
 URL: https://issues.apache.org/jira/browse/HIVE-24232
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


In Calcite, it is not necessary that the columns in the group set are in the 
same order as the rollup. For instance, this is the Calcite representation of a 
rollup for a given query:
{code}
HiveAggregate(group=[{1, 6, 7}], groups=[[{1, 6, 7}, {1, 7}, {1}, {}]], 
agg#0=[sum($12)], agg#1=[count($12)], agg#2=[sum($4)], agg#3=[count($4)], 
agg#4=[sum($15)], agg#5=[count($15)])
{code}
When we generate the Hive plan from the Calcite operator, we incorrectly 
assume that the order is the same.
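
One way to recover the rollup order from the grouping sets rather than from the group set is sketched below. The class and method names are hypothetical, columns are modeled as plain integers instead of Calcite types, and the input is assumed to be a valid rollup in which each grouping set extends the next smaller one by exactly one column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class RollupOrderSketch {
    private RollupOrderSketch() {}

    /**
     * Derives the rollup column order by walking the grouping sets from the
     * smallest to the largest and recording each column when it first appears.
     */
    static List<Integer> rollupOrder(List<Set<Integer>> groupingSets) {
        List<Set<Integer>> bySize = new ArrayList<>(groupingSets);
        bySize.sort((a, b) -> Integer.compare(a.size(), b.size()));
        List<Integer> order = new ArrayList<>();
        for (Set<Integer> groupingSet : bySize) {
            // TreeSet only makes the iteration deterministic.
            for (Integer column : new TreeSet<>(groupingSet)) {
                if (!order.contains(column)) {
                    order.add(column);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // groups=[[{1, 6, 7}, {1, 7}, {1}, {}]] from the plan above
        List<Set<Integer>> groups =
            List.of(Set.of(1, 6, 7), Set.of(1, 7), Set.of(1), Set.<Integer>of());
        System.out.println(rollupOrder(groups)); // [1, 7, 6]
    }
}
```

For the plan above this yields the order 1, 7, 6, which differs from the order 1, 6, 7 in which the group set lists the columns.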




[jira] [Created] (HIVE-24202) Clean up local HS2 HMS cache code (II)

2020-09-24 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24202:
--

 Summary: Clean up local HS2 HMS cache code (II)
 Key: HIVE-24202
 URL: https://issues.apache.org/jira/browse/HIVE-24202
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up for HIVE-24183 (split into different JIRAs).




[jira] [Created] (HIVE-24183) Clean up local HS2 HMS cache code

2020-09-19 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24183:
--

 Summary: Clean up local HS2 HMS cache code
 Key: HIVE-24183
 URL: https://issues.apache.org/jira/browse/HIVE-24183
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up for HIVE-24176.




[jira] [Created] (HIVE-24176) Create query-level cache for HMS requests and extend existing local HS2 HMS cache

2020-09-17 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24176:
--

 Summary: Create query-level cache for HMS requests and extend 
existing local HS2 HMS cache
 Key: HIVE-24176
 URL: https://issues.apache.org/jira/browse/HIVE-24176
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


This issue creates a query-level cache for HMS requests. The lifecycle of that 
cache is tied to the lifecycle of the query. This means that each unique 
request to certain HMS APIs should be served only once from HMS, while 
repeated follow-up calls are served from the cache. The initial 
implementation includes caching for 19 APIs.

This issue also extends the existing local HS2 HMS cache implementation 
introduced in HIVE-23949 to support other requests (getTableColumnStatistics, 
getPartitionsByNames). In fact, the implementation relies on some of the logic 
introduced in that JIRA since there are some commonalities.
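
The idea of serving each unique request once per query can be illustrated with a small sketch. All names here are hypothetical and the class is not the Hive implementation: the HMS client is modeled as a plain {{Function}}, and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class QueryLevelCacheSketch<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> backend; // stands in for the HMS client call
    private int backendCalls;

    QueryLevelCacheSketch(Function<K, V> backend) {
        this.backend = backend;
    }

    /** Serves the request from the cache, calling the backend only on a miss. */
    V get(K request) {
        if (entries.containsKey(request)) {
            return entries.get(request);
        }
        backendCalls++;
        V response = backend.apply(request);
        entries.put(request, response);
        return response;
    }

    /** Number of requests that actually reached the backend. */
    int backendCalls() {
        return backendCalls;
    }

    /** Drops all entries; intended to be called when the query finishes. */
    void endOfQuery() {
        entries.clear();
    }
}
```

Creating one such cache per query and clearing it when the query ends ties the cache lifecycle to the query lifecycle, as described above.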




[jira] [Created] (HIVE-24157) Strict mode to fail on CAST timestamp <-> numeral

2020-09-13 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24157:
--

 Summary: Strict mode to fail on CAST timestamp <-> numeral
 Key: HIVE-24157
 URL: https://issues.apache.org/jira/browse/HIVE-24157
 Project: Hive
  Issue Type: Improvement
  Components: SQL
Reporter: Jesus Camacho Rodriguez


There is some interest in enforcing that CAST numeral <-> timestamp is 
disallowed to avoid confusion among users, e.g., SQL standard does not allow 
numeral <-> timestamp casting, timestamp type is timezone agnostic, etc.

We should introduce a strict config for timestamp (similar to others before): 
If the config is true, we shall fail while compiling the query with a 
meaningful message.

To provide similar behavior, Hive has multiple functions that provide clearer 
semantics for numeral to timestamp conversion (and vice versa):
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions




[jira] [Created] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses

2020-09-12 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24154:
--

 Summary: Missing simplification opportunity with IN and EQUALS 
clauses
 Key: HIVE-24154
 URL: https://issues.apache.org/jira/browse/HIVE-24154
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


For instance, in perf driver CBO query 74, there are several filters that could 
be simplified further:
{code}
HiveFilter(condition=[AND(=($1, 1999), IN($1, 1998, 1999))])
{code}
This may lead to incorrect estimates and unnecessary execution time.
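
The simplification in question can be sketched as follows, assuming both predicates refer to the same column and all literals are non-null integers. The names are hypothetical and the sketch works on plain values rather than on Calcite {{RexNode}} expressions.

```java
import java.util.List;
import java.util.Optional;

final class EqualsInSimplifierSketch {
    private EqualsInSimplifierSketch() {}

    /**
     * Simplifies AND(=(x, c), IN(x, v1, ..., vn)) for a single column x.
     * Returns Optional.of(c) when the conjunction reduces to =(x, c), and
     * Optional.empty() when it can never be satisfied in a filter.
     */
    static Optional<Integer> simplify(int equalsLiteral, List<Integer> inLiterals) {
        return inLiterals.contains(equalsLiteral)
            ? Optional.of(equalsLiteral)
            : Optional.empty();
    }

    public static void main(String[] args) {
        // AND(=($1, 1999), IN($1, 1998, 1999)) reduces to =($1, 1999)
        System.out.println(simplify(1999, List.of(1998, 1999)));
    }
}
```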




[jira] [Created] (HIVE-24147) Table column names are not extracted correctly in Hive JDBC storage handler

2020-09-10 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24147:
--

 Summary: Table column names are not extracted correctly in Hive 
JDBC storage handler
 Key: HIVE-24147
 URL: https://issues.apache.org/jira/browse/HIVE-24147
 Project: Hive
  Issue Type: Bug
  Components: JDBC storage handler
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


It seems the {{ResultSetMetaData}} extracted from the query to retrieve the table 
column names contains these columns as fully qualified names instead of 
possibly using the {{getTableName}} method. This ends up throwing the storage 
handler off and leading to exceptions, both in the CBO path and the non-CBO path.
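
One possible way to normalize such names is sketched below. The helper is hypothetical and is not a statement of how the issue was fixed; it assumes that column names themselves never contain a dot.

```java
final class ColumnNameSketch {
    private ColumnNameSketch() {}

    /** Strips a leading table (or schema.table) qualifier from a column label. */
    static String unqualified(String columnLabel) {
        int lastDot = columnLabel.lastIndexOf('.');
        return lastDot < 0 ? columnLabel : columnLabel.substring(lastDot + 1);
    }

    public static void main(String[] args) {
        System.out.println(unqualified("jdbc_type_conversion_table1.ikey"));
        System.out.println(unqualified("ikey"));
    }
}
```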




[jira] [Created] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value

2020-09-10 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24144:
--

 Summary: getIdentifierQuoteString in HiveDatabaseMetaData returns 
incorrect value
 Key: HIVE-24144
 URL: https://issues.apache.org/jira/browse/HIVE-24144
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


{code}
  public String getIdentifierQuoteString() throws SQLException {
    return " ";
  }
{code}
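
The JDBC documentation for {{DatabaseMetaData.getIdentifierQuoteString}} states that a single space means identifier quoting is not supported, so a client that follows that contract leaves identifiers unquoted. The sketch below shows the effect on a hypothetical client-side helper; using a backtick as the expected value is an assumption based on HiveQL identifier quoting, not a description of the actual fix.

```java
final class IdentifierQuotingSketch {
    private IdentifierQuotingSketch() {}

    /**
     * Quotes an identifier with the string reported by the driver.
     * A blank quote string means "quoting not supported" in JDBC.
     */
    static String quote(String identifier, String quoteString) {
        if (quoteString.trim().isEmpty()) {
            return identifier; // driver reports quoting as unsupported
        }
        // Escape embedded quote characters by doubling them.
        String escaped = identifier.replace(quoteString, quoteString + quoteString);
        return quoteString + escaped + quoteString;
    }

    public static void main(String[] args) {
        System.out.println(quote("my col", " ")); // value currently returned
        System.out.println(quote("my col", "`")); // assumed Hive quote string
    }
}
```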




[jira] [Created] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan

2020-09-10 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24143:
--

 Summary: Include convention in JDBC converter operator in Calcite 
plan
 Key: HIVE-24143
 URL: https://issues.apache.org/jira/browse/HIVE-24143
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Among other things, this will be useful for debugging the dialect being chosen 
for query generation. For instance:
{code}
 HiveProject(jdbc_type_conversion_table1.ikey=[$0], 
jdbc_type_conversion_table1.bkey=[$1], jdbc_type_conversion_table1.fkey=[$2], 
jdbc_type_conversion_table1.dkey=[$3], jdbc_type_conversion_table1.chkey=[$4], 
jdbc_type_conversion_table1.dekey=[$5], jdbc_type_conversion_table1.dtkey=[$6], 
jdbc_type_conversion_table1.tkey=[$7])
  HiveProject(ikey=[$0], bkey=[$1], fkey=[$2], dkey=[$3], chkey=[$4], 
dekey=[$5], dtkey=[$6], tkey=[$7])
->HiveJdbcConverter(convention=[JDBC.DERBY])
  JdbcHiveTableScan(table=[[default, jdbc_type_conversion_table1]], 
table:alias=[jdbc_type_conversion_table1])
{code}




[jira] [Created] (HIVE-24092) Implement additional JDBC methods required by JDBC storage handler

2020-08-29 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24092:
--

 Summary: Implement additional JDBC methods required by JDBC 
storage handler
 Key: HIVE-24092
 URL: https://issues.apache.org/jira/browse/HIVE-24092
 Project: Hive
  Issue Type: Bug
  Components: JDBC storage handler
Reporter: Jesus Camacho Rodriguez


Calcite may rely on the following JDBC methods to generate SQL queries for the 
Hive JDBC storage handler, which in the case of Hive itself throw a {{Method not 
supported}} exception. We should implement these methods:

{code}
nullsAreSortedAtEnd
nullsAreSortedAtStart
nullsAreSortedLow
nullsAreSortedHigh
storesLowerCaseIdentifiers
storesLowerCaseQuotedIdentifiers
storesMixedCaseIdentifiers
storesMixedCaseQuotedIdentifiers
storesUpperCaseIdentifiers
storesUpperCaseQuotedIdentifiers
supportsMixedCaseIdentifiers
supportsMixedCaseQuotedIdentifiers
{code}




[jira] [Created] (HIVE-24074) Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x

2020-08-25 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24074:
--

 Summary: Incorrect handling of timestamp in Parquet/Avro when 
written in certain time zones in versions before Hive 3.x
 Key: HIVE-24074
 URL: https://issues.apache.org/jira/browse/HIVE-24074
 Project: Hive
  Issue Type: Bug
  Components: Avro, Parquet
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


The timezone conversion for Parquet and Avro uses new {{java.time.*}} classes, 
which can lead to incorrect values returned for certain dates in certain 
timezones if timestamp was computed and converted based on {{java.sql.*}} 
classes. For instance, the offset used for the Singapore timezone at 
1900-01-01T00:00:00.000 is UTC+8, while the correct offset for that date should 
be UTC+6:55:25. Some additional information can be found here: 
https://stackoverflow.com/a/52152315
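
To inspect the offsets involved, the following sketch prints what {{java.time}} resolves for a given zone and local date-time, so the result can be compared with the values quoted above. The class name is hypothetical, and the result for historical dates depends on the time zone data shipped with the JDK.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class HistoricalOffsetSketch {
    private HistoricalOffsetSketch() {}

    /** Returns the UTC offset that java.time applies in the zone at the given local time. */
    static ZoneOffset offsetAt(String zone, LocalDateTime local) {
        return ZoneId.of(zone).getRules().getOffset(local);
    }

    public static void main(String[] args) {
        // Historical date from the description above.
        System.out.println(offsetAt("Asia/Singapore", LocalDateTime.of(1900, 1, 1, 0, 0)));
        // Recent date for comparison.
        System.out.println(offsetAt("Asia/Singapore", LocalDateTime.of(2020, 1, 1, 0, 0)));
    }
}
```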




[jira] [Created] (HIVE-24073) Execution exception in sort-merge semijoin

2020-08-25 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24073:
--

 Summary: Execution exception in sort-merge semijoin
 Key: HIVE-24073
 URL: https://issues.apache.org/jira/browse/HIVE-24073
 Project: Hive
  Issue Type: Bug
  Components: Operators
Reporter: Jesus Camacho Rodriguez
Assignee: mahesh kumar behera


Working on HIVE-24001, we trigger an additional SJ conversion that leads to 
this exception at execution time:

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
nextKeyWritables[1]
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
nextKeyWritables[1]
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to 
overwrite nextKeyWritables[1]
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at 
org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
... 23 more
{code}

To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in the 
last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been merged.




[jira] [Created] (HIVE-24041) Extend semijoin conversion rules

2020-08-14 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24041:
--

 Summary: Extend semijoin conversion rules
 Key: HIVE-24041
 URL: https://issues.apache.org/jira/browse/HIVE-24041
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


This patch fixes a couple of limitations that can be seen in {{cbo_query95.q}}, 
in particular:
- It adds a rule to trigger semijoin conversion when there is an aggregate 
on top of the join that prunes all columns from the left side, and the aggregate 
operator is on the left input of the join.
- It extends existing semijoin conversion rules to prune the unused columns 
from its left input, which leads to additional conversion opportunities.




[jira] [Created] (HIVE-24012) Support for rewriting with materialized views containing grouping sets

2020-08-06 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-24012:
--

 Summary: Support for rewriting with materialized views containing 
grouping sets
 Key: HIVE-24012
 URL: https://issues.apache.org/jira/browse/HIVE-24012
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Rewriting is not triggered for materialized views containing grouping sets. 
This issue implements an extension from Hive side to trigger additional 
rewritings for materialized views containing grouping sets.




[jira] [Created] (HIVE-23973) Use SQL constraints to improve join reordering algorithm (III)

2020-07-31 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23973:
--

 Summary: Use SQL constraints to improve join reordering algorithm 
(III)
 Key: HIVE-23973
 URL: https://issues.apache.org/jira/browse/HIVE-23973
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez







[jira] [Created] (HIVE-23869) Move alter statements in parser to new file

2020-07-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23869:
--

 Summary: Move alter statements in parser to new file
 Key: HIVE-23869
 URL: https://issues.apache.org/jira/browse/HIVE-23869
 Project: Hive
  Issue Type: Improvement
  Components: Parser
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


We are hitting the HiveParser 'code too large' problem. HIVE-23857 introduced an 
ad hoc script to solve this problem. Instead, we can split HiveParser.g into 
smaller files. For instance, we can group all alter statements into their own 
.g file.

This patch also fixes an ambiguity warning that was thrown in relation to LIKE 
ALL/ANY clauses.




[jira] [Created] (HIVE-23558) Remove compute_stats UDAF

2020-05-27 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23558:
--

 Summary: Remove compute_stats UDAF
 Key: HIVE-23558
 URL: https://issues.apache.org/jira/browse/HIVE-23558
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


HIVE-23530 replaces its usage completely. This issue is to remove it from Hive.




[jira] [Created] (HIVE-23530) Use SQL functions instead of compute_stats UDAF to compute column statistics

2020-05-21 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23530:
--

 Summary: Use SQL functions instead of compute_stats UDAF to 
compute column statistics
 Key: HIVE-23530
 URL: https://issues.apache.org/jira/browse/HIVE-23530
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently we compute column statistics by relying on the {{compute_stats}} 
UDAF. For instance, for a given table {{tbl}}, the query to compute statistics 
for columns is translated internally into:
{code}
SELECT compute_stats(c1),
   compute_stats(c2),
   ...
FROM tbl;
{code}
{{compute_stats}} produces data for the stats available for each column type, 
e.g., struct<"max":long,"min":long,"countnulls":long,...>.

This issue is to produce a query that relies purely on SQL functions instead:
{code}
SELECT max(c1), min(c1), count(case when c1 is null then 1 else null end),
   ...
FROM tbl;
{code}

This will allow us to deprecate the {{compute_stats}} UDAF since it mostly 
duplicates functionality found in those other functions. Additionally, many of 
those functions already provide a vectorized implementation so the approach 
could potentially improve the performance of column stats collection.
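
How such a query could be assembled from a list of columns is sketched below. The class is hypothetical and covers only the three aggregates shown in the example above; the real set of statistics depends on the column type.

```java
import java.util.List;
import java.util.StringJoiner;

final class ColumnStatsQuerySketch {
    private ColumnStatsQuerySketch() {}

    /** Builds a stats query that uses plain SQL aggregates for each column. */
    static String buildStatsQuery(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        StringJoiner select = new StringJoiner(", ", "SELECT ", " FROM " + table);
        for (String column : columns) {
            select.add("max(" + column + ")");
            select.add("min(" + column + ")");
            select.add("count(case when " + column + " is null then 1 else null end)");
        }
        return select.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildStatsQuery("tbl", List.of("c1", "c2")));
    }
}
```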




[jira] [Created] (HIVE-23389) FilterMergeRule can lead to AssertionError

2020-05-06 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23389:
--

 Summary: FilterMergeRule can lead to AssertionError
 Key: HIVE-23389
 URL: https://issues.apache.org/jira/browse/HIVE-23389
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


I have not been able to reproduce this in latest master, but it could potentially 
happen since Filter creation has a check on whether the expression is flat 
([here|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Filter.java#L74])
 and Filter merge does not flatten an expression when it is created.

{noformat}
java.lang.AssertionError: AND(=($3, 100), OR(OR(null, IS NOT 
NULL(CAST(100):INTEGER)), =(CAST(100):INTEGER, CAST(200):INTEGER)))
at org.apache.calcite.rel.core.Filter.<init>(Filter.java:74)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter.<init>(HiveFilter.java:39)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories$HiveFilterFactoryImpl.createFilter(HiveRelFactories.java:126)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelBuilder.filter(HiveRelBuilder.java:99)
at org.apache.calcite.tools.RelBuilder.filter(RelBuilder.java:1055)
at 
org.apache.calcite.rel.rules.FilterMergeRule.onMatch(FilterMergeRule.java:81)
{noformat}
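
What "flat" means here can be illustrated with a toy expression tree in which nested AND/OR calls of the same kind are merged into their parent. This is a sketch with hypothetical types, independent of Calcite's own classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

final class FlattenSketch {
    private FlattenSketch() {}

    /** Toy expression node: an operator with operands, or a leaf label. */
    static final class Expr {
        final String op;
        final List<Expr> operands;

        Expr(String op, Expr... operands) {
            this.op = op;
            this.operands = Arrays.asList(operands);
        }

        @Override
        public String toString() {
            if (operands.isEmpty()) {
                return op;
            }
            StringJoiner joined = new StringJoiner(", ", op + "(", ")");
            for (Expr operand : operands) {
                joined.add(operand.toString());
            }
            return joined.toString();
        }
    }

    /** Merges nested AND/OR nodes into a parent with the same operator. */
    static Expr flatten(Expr expr) {
        if (expr.operands.isEmpty()) {
            return expr;
        }
        boolean associative = expr.op.equals("AND") || expr.op.equals("OR");
        List<Expr> flat = new ArrayList<>();
        for (Expr operand : expr.operands) {
            Expr child = flatten(operand);
            if (associative && child.op.equals(expr.op) && !child.operands.isEmpty()) {
                flat.addAll(child.operands);
            } else {
                flat.add(child);
            }
        }
        return new Expr(expr.op, flat.toArray(new Expr[0]));
    }
}
```

Applied to the shape in the trace above, {{AND(x, OR(OR(a, b), c))}} becomes {{AND(x, OR(a, b, c))}}.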




[jira] [Created] (HIVE-23365) Put RS deduplication optimization under cost based decision

2020-05-04 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23365:
--

 Summary: Put RS deduplication optimization under cost based 
decision
 Key: HIVE-23365
 URL: https://issues.apache.org/jira/browse/HIVE-23365
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez


Currently, RS deduplication is always executed whenever it is semantically 
correct. However, it could be beneficial to leave both RS operators in the 
plan, e.g., if the NDV of the second RS is very low. Thus, we would like this 
decision to be cost-based. We could use a simple heuristic that would work fine 
for most of the cases without introducing regressions for existing cases, e.g., 
if NDV for partition column is less than estimated parallelism in the second 
RS, do not execute deduplication.
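
The heuristic mentioned above reduces to a single comparison. The sketch below uses hypothetical names and assumes that both the NDV estimate and the parallelism estimate are already available.

```java
final class RsDedupHeuristicSketch {
    private RsDedupHeuristicSketch() {}

    /**
     * Returns false (keep both RS operators) when the partition column has
     * fewer distinct values than the estimated parallelism of the second RS.
     */
    static boolean shouldDeduplicate(long partitionColumnNdv, int estimatedParallelism) {
        return partitionColumnNdv >= estimatedParallelism;
    }

    public static void main(String[] args) {
        System.out.println(shouldDeduplicate(2, 10));
        System.out.println(shouldDeduplicate(1000, 10));
    }
}
```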




[jira] [Created] (HIVE-23302) Create HiveJdbcDatabaseAccessor for JDBC storage handler

2020-04-26 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23302:
--

 Summary: Create HiveJdbcDatabaseAccessor for JDBC storage handler
 Key: HIVE-23302
 URL: https://issues.apache.org/jira/browse/HIVE-23302
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Reporter: Jesus Camacho Rodriguez


The {{JdbcDatabaseAccessor}} associated with the storage handler makes some SQL 
calls to the RDBMS through the JDBC connection. There is a 
{{GenericJdbcDatabaseAccessor}} with a generic implementation that the storage 
handler uses if there is no specific implementation for a certain RDBMS.
Currently, Hive uses the {{GenericJdbcDatabaseAccessor}}. As far as I know, the 
only generic query that will not work is splitting the query based on offset and 
limit, since the syntax for that query is different from the one accepted by 
Hive. We should create a {{HiveJdbcDatabaseAccessor}} to override that query 
and possibly fix any other existing incompatibilities.
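
The shape of such an override is sketched below. Both classes and the method name are hypothetical stand-ins rather than the real accessor API; the braces in the generic form are an assumption modeled on the JDBC escape form, and the Hive form assumes HiveQL's {{LIMIT offset,rows}} syntax.

```java
class GenericAccessorSketch {
    /** Generic form (assumed): JDBC escape syntax for limit and offset. */
    String addLimitAndOffset(String sql, int limit, int offset) {
        return sql + " {LIMIT " + limit + " OFFSET " + offset + "}";
    }
}

class HiveAccessorSketch extends GenericAccessorSketch {
    /** Hive form (assumed): LIMIT offset,rows. */
    @Override
    String addLimitAndOffset(String sql, int limit, int offset) {
        return sql + " LIMIT " + offset + "," + limit;
    }
}
```

Overriding only this one method keeps the rest of the generic behavior, which matches the observation that the split query is the only known incompatibility.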




[jira] [Created] (HIVE-23298) Disable RS deduplication step in Optimizer if it is run in TezCompiler

2020-04-24 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23298:
--

 Summary: Disable RS deduplication step in Optimizer if it is run 
in TezCompiler
 Key: HIVE-23298
 URL: https://issues.apache.org/jira/browse/HIVE-23298
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


HIVE-20703 introduced an additional RS deduplication step in TezCompiler. We 
could possibly try to disable the one that runs in {{Optimizer}} if we are 
using Tez so we do not run the optimization twice.

This issue is to explore that possibility.




[jira] [Created] (HIVE-23291) Add Hive to DatabaseType in JDBC storage handler

2020-04-23 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23291:
--

 Summary: Add Hive to DatabaseType in JDBC storage handler
 Key: HIVE-23291
 URL: https://issues.apache.org/jira/browse/HIVE-23291
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Inception.




[jira] [Created] (HIVE-23275) Represent UNBOUNDED in window functions in CBO correctly

2020-04-22 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23275:
--

 Summary: Represent UNBOUNDED in window functions in CBO correctly
 Key: HIVE-23275
 URL: https://issues.apache.org/jira/browse/HIVE-23275
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently we use a bounded representation with bound set to Integer.MAX_VALUE, 
which works correctly since that is the Hive implementation. However, Calcite 
has a specific boundary class {{RexWindowBoundUnbounded}} that we should be 
using instead.




[jira] [Created] (HIVE-23229) CAST to string on column instead of simplification over literal column

2020-04-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23229:
--

 Summary: CAST to string on column instead of simplification over 
literal column
 Key: HIVE-23229
 URL: https://issues.apache.org/jira/browse/HIVE-23229
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez


After HIVE-23100 went in, one of the queries ends up with a CAST over a 
column instead of applying the CAST to the literal and comparing in CHAR, which 
can be seen in ql/src/test/results/clientpositive/in_typecheck_char.q.out.
{code}
 filterExpr: (((s = 'a') and (t = 'a ')) or (null and (t = 
'bb'))) is null (type: boolean) 
{code}
was replaced by:
{code}
 filterExpr: (((CAST( s AS STRING) = 'a') and (CAST( t AS STRING) = 
'a')) or (null and (CAST( t AS STRING) = 'bb'))) is null (type: boolean)
{code}
This is probably a result of the changes introduced in HIVE-23100 with respect 
to IN handling.




[jira] [Created] (HIVE-23228) Missed optimization opportunity with equals and not equals

2020-04-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23228:
--

 Summary: Missed optimization opportunity with equals and not equals
 Key: HIVE-23228
 URL: https://issues.apache.org/jira/browse/HIVE-23228
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


After HIVE-23100 went in, there is a missed opportunity in the simplification 
of an AND predicate containing equals and not-equals clauses, which can be seen 
in ql/src/test/results/clientpositive/pcs.q.out.
{code}
 filterExpr: ((key = 3) or (ds = '2000-04-08') or key is not 
null) and (key = 2)) or ((ds <> '2000-04-08') and (key = 3))) and ((key + 5) > 
0))) (type: boolean)
{code}
was replaced by:
{code}
 filterExpr: ((key = 3) or (ds = '2000-04-08') or key is not 
null) and (key = 2)) or ((ds <> '2000-04-08') and (key <> 2) and (key = 3))) 
and ((key + 5) > 0))) (type: boolean)
{code}
Note the additional {{key <> 2}} in the second predicate.
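
The redundancy can be illustrated with a small standalone sketch. This is not Hive or Calcite code: {{ConjunctionSimplifier}}, {{Cmp}} and {{simplify}} are hypothetical names, and the sketch only handles one column compared against integer constants inside a single AND.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSimplifier {
  /** One conjunct on the column "key": either key = value or key <> value. */
  static final class Cmp {
    final boolean isEquality;
    final int value;

    Cmp(boolean isEquality, int value) {
      this.isEquality = isEquality;
      this.value = value;
    }

    @Override
    public String toString() {
      return "key " + (isEquality ? "=" : "<>") + " " + value;
    }
  }

  /** Returns the simplified conjuncts, or null when the AND can never be true. */
  static List<Cmp> simplify(List<Cmp> conjuncts) {
    Integer pinned = null;
    for (Cmp c : conjuncts) {
      if (c.isEquality) {
        if (pinned != null && pinned != c.value) {
          return null; // key = 2 AND key = 3 is a contradiction
        }
        pinned = c.value;
      }
    }
    if (pinned == null) {
      return new ArrayList<>(conjuncts); // no equality, nothing to drop
    }
    for (Cmp c : conjuncts) {
      if (!c.isEquality && c.value == pinned) {
        return null; // key = 3 AND key <> 3 is a contradiction
      }
    }
    // Every remaining "key <> other" is implied by "key = pinned".
    List<Cmp> result = new ArrayList<>();
    result.add(new Cmp(true, pinned));
    return result;
  }
}
```

Under these assumptions, the conjunct list for {{(key <> 2) and (key = 3)}} reduces to the single conjunct {{key = 3}}.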




[jira] [Created] (HIVE-23227) Refactor RexConverter and move some of its functionality into HiveFunctionHelper

2020-04-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23227:
--

 Summary: Refactor RexConverter and move some of its functionality 
into HiveFunctionHelper
 Key: HIVE-23227
 URL: https://issues.apache.org/jira/browse/HIVE-23227
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


After HIVE-23100, {{HiveFunctionHelper}} makes a few calls to methods that are 
in {{RexConverter}}. Those methods do not need to be there anymore but were not 
moved as part of that patch to avoid further changes in it. This issue is to 
tackle that refactoring.




[jira] [Created] (HIVE-23226) Implement Calcite rule to transform CASE into COALESCE when possible

2020-04-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23226:
--

 Summary: Implement Calcite rule to transform CASE into COALESCE 
when possible
 Key: HIVE-23226
 URL: https://issues.apache.org/jira/browse/HIVE-23226
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez


Currently, this transformation is done in {{TypeCheckProcFactory}} when we 
create a Hive expression after Calcite optimization.




[jira] [Created] (HIVE-23225) Simplify ExprFactory, ExprNodeDescExprFactory and RexNodeExprFactory

2020-04-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23225:
--

 Summary: Simplify ExprFactory, ExprNodeDescExprFactory and 
RexNodeExprFactory
 Key: HIVE-23225
 URL: https://issues.apache.org/jira/browse/HIVE-23225
 Project: Hive
  Issue Type: Improvement
Reporter: Jesus Camacho Rodriguez


The new {{ExprFactory}} was created based on existing calls from 
{{TypeCheckProcFactory}}. Now that we have the {{ExprNodeDesc}} and {{RexNode}} 
implementations, it seems we could do some work consolidating those methods, 
simplifying the super/subclasses, etc. For instance, the handling of literal 
values seems quite convoluted (handled by many different methods) and could 
possibly be abstracted in a different way.




[jira] [Created] (HIVE-23224) Literals in CBO plan could show less information

2020-04-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23224:
--

 Summary: Literals in CBO plan could show less information
 Key: HIVE-23224
 URL: https://issues.apache.org/jira/browse/HIVE-23224
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently, literals in the CBO plan are very verbose. For all char and varchar 
literals the plan shows the encoding, though it is always the same. For varchar 
literals, it prints type and length, which seems unnecessary.

For instance:
{code}
   HiveFilter(condition=[AND(IN($10, 
_UTF-16LE'wallpaper':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'parenting':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'musical':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'womens':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'birdal':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'pants':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), IN($12, 
_UTF-16LE'Home':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'Books':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'Electronics':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'Shoes':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'Jewelry':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'Men':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), OR(AND(IN($12, 
_UTF-16LE'Home':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'Books':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'Electronics':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), IN($10, 
_UTF-16LE'wallpaper':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'parenting':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'musical':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), AND(IN($12, 
_UTF-16LE'Shoes':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'Jewelry':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'Men':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), IN($10, 
_UTF-16LE'womens':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'birdal':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
_UTF-16LE'pants':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) 
{code}
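
To make the intended result concrete, here is a sketch that post-processes a plan string with a regular expression. It is only an illustration of the less verbose output: {{LiteralCompactor}} and {{compact}} are hypothetical names, and an actual fix would more likely change how the literal is printed rather than rewrite the text afterwards.

```java
import java.util.regex.Pattern;

public class LiteralCompactor {
  // Matches _UTF-16LE'text', optionally followed by :VARCHAR(n) and the
  // CHARACTER SET clause, and captures only the quoted text.
  private static final Pattern VERBOSE_LITERAL = Pattern.compile(
      "_UTF-16LE('(?:[^']|'')*')(?::VARCHAR\\(\\d+\\))?(?: CHARACTER SET \"UTF-16LE\")?");

  /** Keeps only the quoted value of every char/varchar literal in the plan text. */
  static String compact(String plan) {
    return VERBOSE_LITERAL.matcher(plan).replaceAll("$1");
  }
}
```

With this sketch, an {{IN($12, ...)}} list over the literals {{'Home'}}, {{'Books'}} and {{'Electronics'}} would print as {{IN($12, 'Home', 'Books', 'Electronics')}}.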




[jira] [Created] (HIVE-23223) Unnecessary CAST to decimal around CASE statement

2020-04-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23223:
--

 Summary: Unnecessary CAST to decimal around CASE statement
 Key: HIVE-23223
 URL: https://issues.apache.org/jira/browse/HIVE-23223
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


After HIVE-23100 went in, there was a missed opportunity on the simplification 
of a CAST statement on top of a CASE clause, which can be seen in 
ql/src/test/results/clientpositive/vector_case_when_2.q.out .
{code}
   expressions: q548284 (type: int), CASE WHEN ((q548284 = 4)) THEN 
(0.8) WHEN ((q548284 = 5)) THEN (1) ELSE (8) END (type: decimal(2,1))
{code}
was replaced by:
{code}
   expressions: q548284 (type: int), CAST( CASE WHEN ((q548284 = 
4)) THEN (0.8) WHEN ((q548284 = 5)) THEN (1) ELSE (8) END AS decimal(11,1)) 
(type: decimal(11,1))
{code}
The type of the CASE expression could be inferred and enforced without the CAST.
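
One possible reading of the two plans, offered as an assumption rather than a confirmed diagnosis: if the common decimal type of the branches is computed as (max integer digits + max scale, max scale), then typing the literals {{1}} and {{8}} as decimal(1,0) gives decimal(2,1), while typing them as int (decimal(10,0)) gives decimal(11,1). The sketch below only shows that arithmetic; {{CaseDecimalType}} and its methods are hypothetical names, and the cap on maximum precision is ignored.

```java
public class CaseDecimalType {
  /**
   * Least restrictive decimal type for a set of branch types.
   * Each branch is given as {precision, scale}.
   */
  static int[] commonType(int[][] branchTypes) {
    int maxScale = 0;
    int maxIntegerDigits = 0;
    for (int[] type : branchTypes) {
      maxScale = Math.max(maxScale, type[1]);
      maxIntegerDigits = Math.max(maxIntegerDigits, type[0] - type[1]);
    }
    return new int[] {maxIntegerDigits + maxScale, maxScale};
  }

  /** Formats the common type the way the plans print it. */
  static String describe(int[][] branchTypes) {
    int[] type = commonType(branchTypes);
    return "decimal(" + type[0] + "," + type[1] + ")";
  }
}
```

For the branches {{0.8}}, {{1}} and {{8}}, passing {1,1}, {1,0}, {1,0} reproduces the type in the first plan, and passing {1,1}, {10,0}, {10,0} reproduces the type in the second.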




[jira] [Created] (HIVE-23222) Missed opportunity in IN merge

2020-04-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23222:
--

 Summary: Missed opportunity in IN merge
 Key: HIVE-23222
 URL: https://issues.apache.org/jira/browse/HIVE-23222
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


After HIVE-23100 went in, there was a missed opportunity merging IN clauses, 
which can be seen in 
ql/src/test/results/clientpositive/llap/vector_between_in.q.out .
{code}
   filterExpr: (cdecimal1) IN (2365.8945945946, 881.0135135135, 
-3367.6517567568) (type: boolean) 
{code}
was replaced by:
{code}
   filterExpr: ((cdecimal1) IN (2365.8945945946, 
-3367.6517567568) or (cdecimal1) IN (881.0135135135)) (type: boolean)
{code}
The problem seems to be that with decimal type, we are considering values with 
different precision/scale as a different type, thus we do not merge them.
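
The split in the plan above is what grouping the literals by exact precision and scale would produce, which the following sketch shows with {{java.math.BigDecimal}}. It is an analogy, not the optimizer code: {{DecimalInMerge}} and its two grouping methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DecimalInMerge {
  /** Groups literals by exact decimal(precision, scale), as the current behavior seems to. */
  static Map<String, List<BigDecimal>> groupByExactType(List<BigDecimal> literals) {
    Map<String, List<BigDecimal>> groups = new LinkedHashMap<>();
    for (BigDecimal literal : literals) {
      String type = "decimal(" + literal.precision() + "," + literal.scale() + ")";
      groups.computeIfAbsent(type, k -> new ArrayList<>()).add(literal);
    }
    return groups;
  }

  /** Groups literals by type family only, so all decimals land in one IN list. */
  static Map<String, List<BigDecimal>> groupByTypeFamily(List<BigDecimal> literals) {
    Map<String, List<BigDecimal>> groups = new LinkedHashMap<>();
    for (BigDecimal literal : literals) {
      groups.computeIfAbsent("decimal", k -> new ArrayList<>()).add(literal);
    }
    return groups;
  }
}
```

For the three literals in the query, the first method yields two groups (881.0135135135 has one digit of precision less than the other two) and the second yields a single group.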




[jira] [Created] (HIVE-23162) Remove swapping logic to merge joins in AST converter

2020-04-08 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23162:
--

 Summary: Remove swapping logic to merge joins in AST converter
 Key: HIVE-23162
 URL: https://issues.apache.org/jira/browse/HIVE-23162
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


In ASTConverter, there is some logic to invert join inputs so the logic to 
merge joins in SemanticAnalyzer kicks in.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L407

There is a bug because inputs are swapped but the schema is not. However, it 
turns out that logic is not needed now that merging is off by default.




[jira] [Created] (HIVE-23100) Create RexNode factory and use it in CalcitePlanner

2020-03-28 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23100:
--

 Summary: Create RexNode factory and use it in CalcitePlanner
 Key: HIVE-23100
 URL: https://issues.apache.org/jira/browse/HIVE-23100
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up of HIVE-22746.

This will allow us to generate the RexNode directly from the AST nodes.




[jira] [Created] (HIVE-23011) Shared work optimizer should check residual predicates when comparing joins

2020-03-10 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23011:
--

 Summary: Shared work optimizer should check residual predicates 
when comparing joins
 Key: HIVE-23011
 URL: https://issues.apache.org/jira/browse/HIVE-23011
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez







[jira] [Created] (HIVE-23009) SEL operator created by DynamicPartitionPruningOptimization does not populate colExprMap

2020-03-10 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-23009:
--

 Summary: SEL operator created by 
DynamicPartitionPruningOptimization does not populate colExprMap
 Key: HIVE-23009
 URL: https://issues.apache.org/jira/browse/HIVE-23009
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


This can lead to incorrect column stats propagation.




[jira] [Created] (HIVE-22996) BasicStats parsing should check proactively for null or empty string

2020-03-06 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22996:
--

 Summary: BasicStats parsing should check proactively for null or 
empty string
 Key: HIVE-22996
 URL: https://issues.apache.org/jira/browse/HIVE-22996
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


BasicStats parsing should check for a null or empty string up front, rather 
than throwing an Exception for control flow, which creates unnecessary overhead.
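
A minimal sketch of the proactive check, assuming the stat arrives as a string and a default value is acceptable when it is missing. {{StatParsing}} and {{parseStat}} are hypothetical names, not the actual BasicStats code.

```java
public class StatParsing {
  /**
   * Parses a numeric stat value. Null and empty inputs are handled with a
   * cheap check, so no exception is created for the common "missing" case.
   */
  static long parseStat(String raw, long defaultValue) {
    if (raw == null || raw.isEmpty()) {
      return defaultValue;
    }
    try {
      return Long.parseLong(raw);
    } catch (NumberFormatException e) {
      // Still guard against malformed values, which should be rare.
      return defaultValue;
    }
  }
}
```

The try/catch remains only as a fallback for malformed values; the expected case of an absent stat no longer pays for building an exception and its stack trace.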




[jira] [Created] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite

2020-03-04 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22978:
--

 Summary: Fix decimal precision and scale inference for aggregate 
rewriting in Calcite
 Key: HIVE-22978
 URL: https://issues.apache.org/jira/browse/HIVE-22978
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Calcite rules can rewrite aggregate functions, e.g., {{avg}} into 
{{sum/count}}. When the type of {{avg}} is decimal, inference of the 
intermediate precision and scale for the division is not done correctly. The 
reason is that support for some types is missing in method 
{{getDefaultPrecision}} in {{HiveTypeSystemImpl}}. Additionally, 
{{deriveSumType}} should be overridden in {{HiveTypeSystemImpl}} to abide by 
the Hive semantics for sum aggregate type inference.
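
As a sketch of what the sum rule could look like: to my understanding, Hive's sum over decimal(p, s) returns decimal(min(38, p + 10), s). That rule is stated here as an assumption that should be checked against {{GenericUDAFSum}}; {{SumTypeSketch}} and its method are hypothetical and do not use the Calcite type system API.

```java
public class SumTypeSketch {
  // Assumed maximum decimal precision in Hive.
  static final int MAX_PRECISION = 38;

  /**
   * Assumed Hive rule for sum over a decimal input: ten extra digits of
   * precision (capped at the maximum), same scale. Returns {precision, scale}.
   */
  static int[] deriveSumType(int inputPrecision, int inputScale) {
    int precision = Math.min(MAX_PRECISION, inputPrecision + 10);
    return new int[] {precision, inputScale};
  }
}
```

Under that assumption, a sum over decimal(10,2) would be typed decimal(20,2), and a sum over decimal(35,5) would be capped at decimal(38,5).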





[jira] [Created] (HIVE-22962) Reuse HiveRelFieldTrimmer instance across queries

2020-03-02 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22962:
--

 Summary: Reuse HiveRelFieldTrimmer instance across queries
 Key: HIVE-22962
 URL: https://issues.apache.org/jira/browse/HIVE-22962
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently we create multiple {{HiveRelFieldTrimmer}} instances per query. 
{{HiveRelFieldTrimmer}} uses a method dispatcher that has a built-in caching 
mechanism: given a certain object, it stores the method that was called for the 
object class. However, by instantiating the trimmer multiple times per query 
and across queries, we create a new dispatcher with each instantiation, thus 
effectively removing the caching mechanism that is built within the dispatcher.

This issue is to reuse the same {{HiveRelFieldTrimmer}} instance within a 
single query and across queries.
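
The effect can be sketched with a toy dispatcher that caches one lookup per object class. {{CachingDispatcher}} is a hypothetical stand-in, not the Calcite dispatcher: it counts how often the expensive lookup runs so the difference between a shared instance and fresh instances is visible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CachingDispatcher {
  private final Map<Class<?>, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger expensiveLookups = new AtomicInteger();

  /** Returns the handler name for the node's class, resolving it at most once per class. */
  String handlerFor(Object node) {
    return cache.computeIfAbsent(node.getClass(), clazz -> {
      // Stands in for the reflective method lookup that the cache avoids.
      expensiveLookups.incrementAndGet();
      return "trim" + clazz.getSimpleName();
    });
  }

  /** Number of times the uncached lookup path was taken by this instance. */
  int expensiveLookups() {
    return expensiveLookups.get();
  }
}
```

A single instance that handles two objects of the same class resolves the handler once. Two fresh instances resolve it once each, which is the overhead described above.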




[jira] [Created] (HIVE-22953) Update Apache Arrow and flatbuffer versions

2020-03-01 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22953:
--

 Summary: Update Apache Arrow and flatbuffer versions
 Key: HIVE-22953
 URL: https://issues.apache.org/jira/browse/HIVE-22953
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


HIVE-22827 updated flatbuffer version to 1.6.0.1. Current Arrow version 
consumed by Hive uses 1.2.0 (com.vlkan:flatbuffers version).
This issue is to update Arrow to at least 0.15.1 and flatbuffers to 1.11.0 
(from official flatbuffers release, same version used by Arrow).




[jira] [Created] (HIVE-22923) Extract cumulative cost metadata from HiveRelMdDistinctRowCount metadata provider

2020-02-21 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22923:
--

 Summary: Extract cumulative cost metadata from 
HiveRelMdDistinctRowCount metadata provider 
 Key: HIVE-22923
 URL: https://issues.apache.org/jira/browse/HIVE-22923
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


It should not be contained there.




[jira] [Created] (HIVE-22921) materialized_view_partitioned_3.q relies on hive.optimize.sort.dynamic.partition property

2020-02-21 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22921:
--

 Summary: materialized_view_partitioned_3.q relies on 
hive.optimize.sort.dynamic.partition property
 Key: HIVE-22921
 URL: https://issues.apache.org/jira/browse/HIVE-22921
 Project: Hive
  Issue Type: Test
Reporter: Jesus Camacho Rodriguez
Assignee: Vineet Garg


{{hive.optimize.sort.dynamic.partition}} was deprecated in favor of 
{{hive.optimize.sort.dynamic.partition.threshold}} in HIVE-20703. 
{{materialized_view_partitioned_3.q}} specifically tests 
SortedDynPartitionOptimizer for MVs. We need to update the q test.




[jira] [Created] (HIVE-22854) hive-service should not depend on hive-exec

2020-02-07 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22854:
--

 Summary: hive-service should not depend on hive-exec
 Key: HIVE-22854
 URL: https://issues.apache.org/jira/browse/HIVE-22854
 Project: Hive
  Issue Type: Improvement
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


It does not need to depend on hive-exec since it does not use it.






[jira] [Created] (HIVE-22842) Timestamp/date vectors in Arrow serializer should use correct calendar for value representation

2020-02-06 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22842:
--

 Summary: Timestamp/date vectors in Arrow serializer should use 
correct calendar for value representation
 Key: HIVE-22842
 URL: https://issues.apache.org/jira/browse/HIVE-22842
 Project: Hive
  Issue Type: Improvement
Reporter: Jesus Camacho Rodriguez
Assignee: Shubham Chaurasia







[jira] [Created] (HIVE-22827) Update Flatbuffer version

2020-02-04 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22827:
--

 Summary: Update Flatbuffer version
 Key: HIVE-22827
 URL: https://issues.apache.org/jira/browse/HIVE-22827
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Hive currently uses Flatbuffer 1.2.0. Other Apache projects use a more 
up-to-date version, e.g. 1.6.0.1. Upgrade to that version.




[jira] [Created] (HIVE-22795) Create new parser and udf module from ql

2020-01-30 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22795:
--

 Summary: Create new parser and udf module from ql
 Key: HIVE-22795
 URL: https://issues.apache.org/jira/browse/HIVE-22795
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


ql is a huge module. I propose to start splitting it by creating new modules 
`parser` and `udf` to encapsulate some classes related to SQL parsing and UDF 
declaration, respectively.




[jira] [Created] (HIVE-22785) Update/delete/merge statements not optimized through CBO

2020-01-27 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22785:
--

 Summary: Update/delete/merge statements not optimized through CBO
 Key: HIVE-22785
 URL: https://issues.apache.org/jira/browse/HIVE-22785
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez


Currently, CBO is bypassed for update/delete/merge statements.

To support optimizing these statements through CBO, we need to complete three 
main tasks: 1) support for sort in CBO, 2) support for SORT in AST converter, 
and 3) {{RewriteSemanticAnalyzer}} should extend {{CalcitePlanner}} instead of 
{{SemanticAnalyzer}}.




[jira] [Created] (HIVE-22746) Make TypeCheckProcFactory generic

2020-01-17 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22746:
--

 Summary: Make TypeCheckProcFactory generic
 Key: HIVE-22746
 URL: https://issues.apache.org/jira/browse/HIVE-22746
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


{{TypeCheckProcFactory}} is responsible for processing AST nodes and generating 
ExprNode objects from them. When we generate the expressions for Calcite 
planning, we go through a {{AST node -> ExprNode -> RexNode}} transformation. 
We would like to avoid the overhead of going through the ExprNode, and thus 
generate the RexNode directly from the AST.

To do that, the first step is to make {{TypeCheckProcFactory}} generic, so it 
can receive an expression factory and create expressions in different realms. 
For the time being, the only factory implementation is the ExprNode factory. 
Thus, this patch focuses mainly on refactoring {{TypeCheckProcFactory}} without 
breaking anything that is already working.

In a follow-up patch, we will create a {{RexNode}} factory and use it when we 
parse the query in CalcitePlanner.
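
The idea of a generic expression factory can be sketched as follows. None of these types are the actual Hive classes: {{GenericTypeCheck}}, its {{ExprFactory}} interface and the two string-producing factories are hypothetical and only show how the same tree-walking code can produce expressions in different realms.

```java
public class GenericTypeCheck {
  /** Creates expressions of type T; the processing code never depends on T. */
  interface ExprFactory<T> {
    T literal(int value);

    T call(String operator, T left, T right);
  }

  /** One realm: infix rendering, e.g. (1 + (2 * 3)). */
  static final class InfixFactory implements ExprFactory<String> {
    public String literal(int value) {
      return Integer.toString(value);
    }

    public String call(String operator, String left, String right) {
      return "(" + left + " " + operator + " " + right + ")";
    }
  }

  /** Another realm: prefix rendering, e.g. +(1, *(2, 3)). */
  static final class PrefixFactory implements ExprFactory<String> {
    public String literal(int value) {
      return Integer.toString(value);
    }

    public String call(String operator, String left, String right) {
      return operator + "(" + left + ", " + right + ")";
    }
  }

  /** Stands in for the generic processor: builds 1 + 2 * 3 through any factory. */
  static <T> T buildSample(ExprFactory<T> factory) {
    return factory.call("+", factory.literal(1),
        factory.call("*", factory.literal(2), factory.literal(3)));
  }
}
```

The processing method is written once against the factory interface, so adding a second realm only requires a new factory implementation.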




[jira] [Created] (HIVE-22728) Limit the scope of uniqueness of constraint name to database

2020-01-14 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22728:
--

 Summary: Limit the scope of uniqueness of constraint name to 
database
 Key: HIVE-22728
 URL: https://issues.apache.org/jira/browse/HIVE-22728
 Project: Hive
  Issue Type: Wish
Reporter: Jesus Camacho Rodriguez


Currently, constraint names are globally unique across all databases (the 
assumption is that this may have been done by design). Nevertheless, though the 
behavior seems to be implementation specific, it would be interesting to limit 
the scope to uniqueness per database.
Currently we do not store database information with the constraints. To change 
the scope to one db, we would need to store the DB_ID in the KEY_CONSTRAINTS 
table in metastore when we create a constraint and add the DB_ID to the PRIMARY 
KEY of that table. Some minor changes to the error messages would be needed 
too, since otherwise it would be difficult to identify the correct violation in 
queries that span multiple databases. Additionally, the SQL scripts will need 
to be updated to populate the DB_ID when we upgrade to a new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-22589) Add storage support for ProlepticCalendar

2019-12-06 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22589:
--

 Summary: Add storage support for ProlepticCalendar
 Key: HIVE-22589
 URL: https://issues.apache.org/jira/browse/HIVE-22589
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: Owen O'Malley
Assignee: László Bodor
 Fix For: 4.0.0, 3.1.3, storage-2.7.1


Hive recently moved its processing to the proleptic calendar, which has created 
some issues for users who have dates before 1580 AD.

I'd propose extending the column vectors for times & dates to encode which 
calendar they are using.

* create DateColumnVector that extends LongColumnVector
* add a method to change calendars to both DateColumnVector and 
TimestampColumnVector.

{code}
  /**
   * Change the calendar to or from proleptic. If the new and old values of the
   * flag are the same, nothing is done.
   *
   * @param useProleptic set the flag for the proleptic calendar
   * @param updateData change the data to match the new value of the flag
   */
  void changeCalendar(boolean useProleptic, boolean updateData);

  /**
   * Detect whether this data is using the proleptic calendar.
   */
  boolean usingProlepticCalendar();
{code}
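
To show why the calendar flag matters, the sketch below computes the day number of the same year/month/day under both calendars using only the JDK. {{CalendarGap}} and {{epochDay}} are hypothetical names and are unrelated to the column vector classes proposed above.

```java
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarGap {
  private static final long MILLIS_PER_DAY = 86_400_000L;

  /**
   * Days since 1970-01-01 for the given date (month is 1-based), read either
   * in the hybrid Julian/Gregorian calendar or in the proleptic Gregorian one.
   */
  static long epochDay(int year, int month, int day, boolean proleptic) {
    GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    if (proleptic) {
      // Moving the cutover to the earliest possible instant makes the
      // calendar purely Gregorian.
      calendar.setGregorianChange(new Date(Long.MIN_VALUE));
    }
    calendar.clear(); // drop the current time so only the date fields count
    calendar.set(year, month - 1, day);
    return Math.floorDiv(calendar.getTimeInMillis(), MILLIS_PER_DAY);
  }
}
```

For a modern date both readings agree, while for a date such as 1500-03-01 the two day numbers differ, so data written under one calendar and read under the other shifts unless the vector records which calendar it uses.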




[jira] [Created] (HIVE-22549) RS deduplication should not merge final aggregation without keys

2019-11-26 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22549:
--

 Summary: RS deduplication should not merge final aggregation 
without keys
 Key: HIVE-22549
 URL: https://issues.apache.org/jira/browse/HIVE-22549
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


This may lead to performance degradation. For instance, this can happen for the 
following query:

{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

EXPLAIN
CREATE TABLE x STORED AS ORC TBLPROPERTIES('transactional'='true') AS
SELECT * FROM SRC x CLUSTER BY x.key;
{code}




[jira] [Created] (HIVE-22538) RS deduplication does not always enforce hive.optimize.reducededuplication.min.reducer

2019-11-25 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22538:
--

 Summary: RS deduplication does not always enforce 
hive.optimize.reducededuplication.min.reducer
 Key: HIVE-22538
 URL: https://issues.apache.org/jira/browse/HIVE-22538
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


For transactional tables, that property might be overridden to 1, which can 
lead to merging final aggregation into a single stage (hence leading to 
performance degradation). For instance, when autogather column stats is 
enabled, this can happen for the following query:

{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

EXPLAIN
CREATE TABLE x STORED AS ORC TBLPROPERTIES('transactional'='true') AS
SELECT * FROM SRC x CLUSTER BY x.key;
{code}




[jira] [Created] (HIVE-22532) PTFPPD may push limit incorrectly through Rank/DenseRank function

2019-11-22 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22532:
--

 Summary: PTFPPD may push limit incorrectly through Rank/DenseRank 
function
 Key: HIVE-22532
 URL: https://issues.apache.org/jira/browse/HIVE-22532
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez







[jira] [Created] (HIVE-22486) Send only accessed columns for masking policies request

2019-11-12 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22486:
--

 Summary: Send only accessed columns for masking policies request
 Key: HIVE-22486
 URL: https://issues.apache.org/jira/browse/HIVE-22486
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently, we send all columns in the masking request, even if they are not 
accessed by the given query. We could send only those columns for which the 
masking policy will be necessary.




[jira] [Created] (HIVE-22480) IndexOutOfBounds exception while reading ORC files written with empty positions list in first row index entry

2019-11-11 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22480:
--

 Summary: IndexOutOfBounds exception while reading ORC files 
written with empty positions list in first row index entry
 Key: HIVE-22480
 URL: https://issues.apache.org/jira/browse/HIVE-22480
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.3.6, 1.2.2
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: 1.3.0, 1.2.3, 2.4.0, 2.3.7


Although this should not happen, we may end up with an empty positions list in 
the first row index entry due to some bug (see ORC-569). Since positions in the 
first row index are always zero, it would be good if the reader could still 
read these files instead of failing.
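
A minimal sketch of the tolerant lookup, under the assumption stated above that every position in the first row index entry is zero. {{RowIndexPositions}} and {{position}} are hypothetical names, not the ORC reader code.

```java
import java.util.List;

public class RowIndexPositions {
  /**
   * Returns the position at the given index for a row group. An empty
   * positions list is tolerated only for the first row group, where every
   * position is known to be zero.
   */
  static long position(List<Long> positions, int index, int rowGroup) {
    if (rowGroup == 0 && positions.isEmpty()) {
      return 0L;
    }
    // Any other missing position is still an error and fails as before.
    return positions.get(index);
  }
}
```

Row groups other than the first keep the current behavior, so a genuinely corrupt index is still reported.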

The error stack looks like this:
{code}
ERROR : FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
vertexId=vertex_1566395485735_11359_2_00, diagnostics=[Task failed, 
taskId=task_1566395485735_11359_2_00_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_1566395485735_11359_2_00_00_0:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: 
java.lang.IndexOutOfBoundsException: Index: 0
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:377)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: 
java.lang.IndexOutOfBoundsException: Index: 0
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:83)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:694)
at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:653)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:525)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:171)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188)
... 14 more
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:380)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
... 25 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:4456)
at org.apache.orc.OrcProto$RowIndexEntry.getPositions(OrcProto.java:6867)
at 
org.apache.orc.impl.RecordReaderUtils.addRgFilteredStreamToRanges(RecordReaderUtils.java:257)
at 
org.apache.orc.impl.RecordReaderImpl.planReadPartialDataStreams(RecordReaderImpl.java:942)
at 
org.apache.orc.impl.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:979)
at

[jira] [Created] (HIVE-22430) Avoid creation of additional RS for limit if it is equal to zero

2019-10-29 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22430:
--

 Summary: Avoid creation of additional RS for limit if it is equal 
to zero
 Key: HIVE-22430
 URL: https://issues.apache.org/jira/browse/HIVE-22430
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez







[jira] [Created] (HIVE-22396) CMV creating a Full ACID partitioned table fails because of no writeId

2019-10-23 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22396:
--

 Summary: CMV creating a Full ACID partitioned table fails because 
of no writeId
 Key: HIVE-22396
 URL: https://issues.apache.org/jira/browse/HIVE-22396
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


create table t1(a int, b int);
insert into t1 values (1, 2), (3, 4);
create table t6_part partitioned by (a) stored as orc tblproperties 
("transactional"="true") as select * from t1;
ERROR : FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in the 
config by open txn task for migration
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in 
the config by open txn task for migration (state=08S01,code=1)




[jira] [Created] (HIVE-22341) abortTxns statements should be executed in a single transaction

2019-10-14 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22341:
--

 Summary: abortTxns statements should be executed in a single 
transaction
 Key: HIVE-22341
 URL: https://issues.apache.org/jira/browse/HIVE-22341
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Jesus Camacho Rodriguez


Logic in `abortTxns` should be executed in a single transaction, rather than 
multiple ones. Otherwise, if you restart HMS between txn abort and the lock 
deletion, we end up with orphaned locks.
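
A minimal sketch of the idea, using Python's sqlite3 as a stand-in for the metastore RDBMS. The table names (txns, locks) and the helper abort_txn are hypothetical and do not reflect the actual HMS schema or code; the point is only that both statements commit or roll back together:

```python
import sqlite3

def abort_txn(conn, txn_id, fail_before_lock_cleanup=False):
    # "with conn" commits if the body succeeds and rolls back if an
    # exception escapes, so both statements share one transaction.
    with conn:
        conn.execute("UPDATE txns SET state = 'aborted' WHERE txn_id = ?", (txn_id,))
        if fail_before_lock_cleanup:
            # Simulates a restart between the abort and the lock deletion.
            raise RuntimeError("simulated restart")
        conn.execute("DELETE FROM locks WHERE txn_id = ?", (txn_id,))

def read_state(conn, txn_id):
    state = conn.execute("SELECT state FROM txns WHERE txn_id = ?", (txn_id,)).fetchone()[0]
    locks = conn.execute("SELECT COUNT(*) FROM locks WHERE txn_id = ?", (txn_id,)).fetchone()[0]
    return state, locks

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (txn_id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("CREATE TABLE locks (lock_id INTEGER PRIMARY KEY, txn_id INTEGER)")
conn.execute("INSERT INTO txns VALUES (1, 'open')")
conn.execute("INSERT INTO locks VALUES (10, 1)")
conn.commit()

try:
    abort_txn(conn, 1, fail_before_lock_cleanup=True)
except RuntimeError:
    pass
after_failure = read_state(conn, 1)   # abort was rolled back with the failed cleanup

abort_txn(conn, 1)
after_success = read_state(conn, 1)   # abort and lock deletion committed together
print(after_failure, after_success)
```

With a single transaction, a failure in the middle leaves the txn open and its lock in place, rather than an aborted txn with an orphaned lock.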




[jira] [Created] (HIVE-22339) Change default time for MVs refresh in registry

2019-10-14 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22339:
--

 Summary: Change default time for MVs refresh in registry
 Key: HIVE-22339
 URL: https://issues.apache.org/jira/browse/HIVE-22339
 Project: Hive
  Issue Type: Improvement
  Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


The default was set to 60 seconds in HIVE-21344, which seems too aggressive; 
the suggestion is to change the default to 1500 seconds.




[jira] [Created] (HIVE-22314) Disable count distinct rewrite in Hive optimizer if it is already rewritten by Calcite

2019-10-09 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22314:
--

 Summary: Disable count distinct rewrite in Hive optimizer if it is 
already rewritten by Calcite
 Key: HIVE-22314
 URL: https://issues.apache.org/jira/browse/HIVE-22314
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Logical Optimizer
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez







[jira] [Created] (HIVE-22311) Propagate min/max column values from statistics to the optimizer for timestamp type

2019-10-09 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22311:
--

 Summary: Propagate min/max column values from statistics to the 
optimizer for timestamp type
 Key: HIVE-22311
 URL: https://issues.apache.org/jira/browse/HIVE-22311
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently, stats annotation does not consider the timestamp type, e.g., for 
estimates with range predicates.




[jira] [Created] (HIVE-22310) Factor out common code from *ColumnStatsAggregator

2019-10-09 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22310:
--

 Summary: Factor out common code from *ColumnStatsAggregator
 Key: HIVE-22310
 URL: https://issues.apache.org/jira/browse/HIVE-22310
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jesus Camacho Rodriguez


There are different column stats aggregator implementations for each type, 
e.g., {{DateColumnStatsAggregator}}, {{LongColumnStatsAggregator}}, 
{{DoubleColumnStatsAggregator}}, etc. Much of the logic in those classes seems 
to be common or could be generalized and reused; we should move it into 
{{ColumnStatsAggregator}} parent class or a utility class.




[jira] [Created] (HIVE-22309) Use finer granularity for different types in column stats

2019-10-09 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22309:
--

 Summary: Use finer granularity for different types in column stats
 Key: HIVE-22309
 URL: https://issues.apache.org/jira/browse/HIVE-22309
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jesus Camacho Rodriguez


For instance, for {{timestamp}} type we are throwing away precision since we 
store min/max in seconds since epoch (no millis nor nanos). This would include 
some changes in metastore tables that store column statistics.
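
To illustrate the precision loss, here is a small Python sketch. The helpers to_epoch_seconds and to_epoch_millis are hypothetical names used only for this example and are not Hive code:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_seconds(ts):
    # Integer division of timedeltas: the sub-second part is discarded.
    return (ts - EPOCH) // timedelta(seconds=1)

def to_epoch_millis(ts):
    # Same idea at millisecond granularity.
    return (ts - EPOCH) // timedelta(milliseconds=1)

a = datetime(2019, 10, 9, 12, 0, 0, 1000, tzinfo=timezone.utc)    # 12:00:00.001
b = datetime(2019, 10, 9, 12, 0, 0, 999000, tzinfo=timezone.utc)  # 12:00:00.999

# At second granularity the two values are indistinguishable as min/max.
print(to_epoch_seconds(a) == to_epoch_seconds(b))
# At millisecond granularity they remain distinct.
print(to_epoch_millis(a) == to_epoch_millis(b))
```

Any min/max stored in whole seconds cannot separate values that differ only in their fractional part, which is what a finer-grained representation would address.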




[jira] [Created] (HIVE-22248) Min value for column in stats is not set correctly for some data types

2019-09-26 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22248:
--

 Summary: Min value for column in stats is not set correctly for 
some data types
 Key: HIVE-22248
 URL: https://issues.apache.org/jira/browse/HIVE-22248
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Miklos Gergely


I am not sure whether the problem is in printing the value or in the value 
stored in the metastore itself, but for some types (e.g., tinyint, smallint, 
int, bigint, double, or float), the min value does not seem to be set correctly 
(it is set to 0).

https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/alter_table_update_status.q.out#L342




[jira] [Created] (HIVE-22241) Implement UDF to convert a date/timestamp from Gregorian-Julian hybrid calendar to proleptic Gregorian calendar

2019-09-24 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22241:
--

 Summary: Implement UDF to convert a date/timestamp from 
Gregorian-Julian hybrid calendar to proleptic Gregorian calendar
 Key: HIVE-22241
 URL: https://issues.apache.org/jira/browse/HIVE-22241
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


UDF that converts a date/timestamp from the *Gregorian-Julian hybrid* calendar 
to the *proleptic Gregorian calendar* (ISO 8601 standard). The hybrid calendar 
supports both the Julian and Gregorian calendar systems with a single 
discontinuity, which by default corresponds to the Gregorian date when the 
Gregorian calendar was instituted. The proleptic Gregorian calendar is produced 
by extending the Gregorian calendar backward to dates preceding its official 
introduction in 1582.
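
A rough sketch of the conversion in Python, whose datetime.date already uses the proleptic Gregorian calendar. The function names and the fixed 1582-10-15 cutover are assumptions made for this illustration; this is not the UDF implementation, and it does not validate dates that fall inside the cutover gap or outside the range supported by datetime.date:

```python
from datetime import date

GREGORIAN_CUTOVER = (1582, 10, 15)  # assumed default cutover of the hybrid calendar

def julian_to_jdn(year, month, day):
    """Julian Day Number of a date expressed in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def hybrid_to_proleptic(year, month, day):
    """Map a hybrid-calendar date to a proleptic Gregorian datetime.date."""
    if (year, month, day) >= GREGORIAN_CUTOVER:
        return date(year, month, day)  # already a Gregorian date
    # date.toordinal() is 1 for proleptic Gregorian 0001-01-01, whose
    # Julian Day Number is 1721426, hence the 1721425 offset.
    return date.fromordinal(julian_to_jdn(year, month, day) - 1721425)

print(hybrid_to_proleptic(1582, 10, 4))   # last Julian day before the cutover
print(hybrid_to_proleptic(2019, 9, 24))   # after the cutover: unchanged
```

Dates on or after the cutover pass through unchanged, while earlier dates are read as Julian and shifted; for example, 1582-10-04 in the hybrid calendar is the day before 1582-10-15, i.e., 1582-10-14 in the proleptic Gregorian calendar.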




[jira] [Created] (HIVE-22239) Scale data size using column value ranges

2019-09-24 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22239:
--

 Summary: Scale data size using column value ranges
 Key: HIVE-22239
 URL: https://issues.apache.org/jira/browse/HIVE-22239
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently, min/max values for columns are only used to determine whether a 
certain range filter falls out of range and thus filters all rows or none at 
all. If it does not, we just use a heuristic that the condition will filter 1/3 
of the input rows. Instead of using that heuristic, we can use another one that 
assumes that data will be uniformly distributed across that range, and 
calculate the selectivity for the condition accordingly.

This patch also includes the propagation of min/max column values from 
statistics to the optimizer for timestamp type.
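
The uniform-distribution heuristic described above can be sketched as follows in Python. The function range_selectivity is a hypothetical helper written for this example, not the code in the patch, and it treats the column as continuous:

```python
def range_selectivity(col_min, col_max, lo, hi):
    """Fraction of [col_min, col_max] covered by the predicate range [lo, hi],
    assuming values are uniformly distributed between col_min and col_max."""
    if col_max <= col_min:
        # Degenerate column range (single value): it either matches or not.
        return 1.0 if lo <= col_min <= hi else 0.0
    overlap = min(hi, col_max) - max(lo, col_min)
    if overlap <= 0:
        return 0.0  # predicate range falls outside the column range
    return overlap / (col_max - col_min)

# Column values known to lie in [0, 100].
print(range_selectivity(0, 100, float("-inf"), 25))   # x < 25
print(range_selectivity(0, 100, 200, float("inf")))   # x > 200
print(range_selectivity(0, 100, -50, 500))            # range covers the column
```

For a predicate such as x < 25 on a column with min 0 and max 100, this yields 0.25 instead of the fixed 1/3 heuristic.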




[jira] [Created] (HIVE-22232) NPE when hive.order.columnalignment is set to false

2019-09-23 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22232:
--

 Summary: NPE when hive.order.columnalignment is set to false
 Key: HIVE-22232
 URL: https://issues.apache.org/jira/browse/HIVE-22232
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


When {{hive.order.columnalignment}} is disabled and the plan contains an 
Aggregate operator, we hit an NPE.

{code}
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:163)
at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:111)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1555)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:483)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12630)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:357)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1385)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1332)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1327)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:124)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:217)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
...
{code}




[jira] [Created] (HIVE-22219) Bringing a node manager down blocks restart of LLAP service

2019-09-20 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22219:
--

 Summary: Bringing a node manager down blocks restart of LLAP 
service
 Key: HIVE-22219
 URL: https://issues.apache.org/jira/browse/HIVE-22219
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


For a YARN service, when the number of running instances != the number of 
desired instances, the service state may be STARTED or FLEX (instead of STABLE). 
On the Hive LLAP side, there is a config to control the threshold of the service 
health check. The Hive LLAP code misses checking these states, which can result 
in the service not coming up even if the threshold is met.
https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusServiceDriver.java#L382
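
The intended check could look roughly like the Python sketch below. The state names STABLE, STARTED and FLEX come from the description above; everything else (the function is_service_up, its parameters, and the way the threshold is applied) is an assumption for illustration and not the actual LlapStatusServiceDriver logic:

```python
# States in which the service may already have enough running instances.
ACCEPTED_STATES = {"STABLE", "STARTED", "FLEX"}

def is_service_up(service_state, live_instances, desired_instances, threshold):
    """Hypothetical health check: accept STARTED/FLEX as well as STABLE,
    then compare the fraction of live instances against the threshold."""
    if service_state not in ACCEPTED_STATES:
        return False
    if desired_instances <= 0:
        return False
    return live_instances / desired_instances >= threshold

# One node manager down: 3 of 4 instances running, service reported as FLEX.
print(is_service_up("FLEX", 3, 4, 0.7))
# Too few instances, even though the state is STABLE.
print(is_service_up("STABLE", 1, 4, 0.7))
```

Under these assumptions, a service in FLEX with 3 of 4 instances running passes a 0.7 threshold, whereas a check that only accepts STABLE would keep waiting.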





[jira] [Created] (HIVE-22209) Creating a materialized view with no tables should be handled more gracefully

2019-09-16 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22209:
--

 Summary: Creating a materialized view with no tables should be 
handled more gracefully
 Key: HIVE-22209
 URL: https://issues.apache.org/jira/browse/HIVE-22209
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: John Sherman


Currently, materialized views without a table reference are not supported. 
However, instead of printing a clear message about it, when a materialized view 
is created without a table reference, we fail with an unclear message.

{code}
> create materialized view mv_test1 as select 5;
(...)
ERROR : FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request 
failed :
INSERT INTO MV_TABLES_USED (MV_CREATION_METADATA_ID,TBL_ID) VALUES (?,?) )
INFO : Completed executing 
command(queryId=hive_20190916203511_b609cccf-f5e3-45dd-abfd-6e869d94e39a); Time 
taken: 10.469 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request 
failed : INSERT INTO MV_TABLES_USED (MV_CREATION_METADATA_ID,TBL_ID) VALUES 
(?,?) ) (state=08S01,code=1)
{code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Created] (HIVE-22204) Beeline option to show/not show execution report

2019-09-13 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22204:
--

 Summary: Beeline option to show/not show execution report
 Key: HIVE-22204
 URL: https://issues.apache.org/jira/browse/HIVE-22204
 Project: Hive
  Issue Type: Improvement
  Components: Beeline
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently, {{--silent=true}} will also remove the short report about execution 
(which includes the number of rows returned by a query and the execution time). 
It would be useful to be able to show that report even when {{--silent=true}}, 
e.g., using an option {{--report=true}}. The default (existing) behavior should 
not change.




[jira] [Created] (HIVE-22200) Hash collision may cause column resolution to fail

2019-09-13 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22200:
--

 Summary: Hash collision may cause column resolution to fail
 Key: HIVE-22200
 URL: https://issues.apache.org/jira/browse/HIVE-22200
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


{{ExprNodeDescUtils.getExprNodeColumnDesc}} extracts the {{ExprNodeColumnDesc}} 
(column descriptors) from an expression. In fact, it creates a map from hash to 
the object itself. If the same hash value is generated for two different 
objects, this will result in a clash in the map and some expressions not being 
part of its values.
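
The failure mode can be reproduced in a few lines of Python. The class Col and its forced hash values are hypothetical and exist only to trigger a collision; this is not the Hive code:

```python
class Col:
    """Stand-in for a column descriptor with a controllable hash value."""
    def __init__(self, name, forced_hash):
        self.name = name
        self.forced_hash = forced_hash

    def __hash__(self):
        return self.forced_hash

    def __eq__(self, other):
        return isinstance(other, Col) and self.name == other.name

cols = [Col("a", 7), Col("b", 7), Col("c", 9)]  # "a" and "b" collide

# Keyed by hash value: the second colliding object overwrites the first.
by_hash = {}
for c in cols:
    by_hash[hash(c)] = c

# Keyed by the object itself: equality disambiguates colliding hashes.
by_object = {c: c for c in cols}

print(sorted(c.name for c in by_hash.values()))
print(sorted(c.name for c in by_object.values()))
```

Keying the map by the hash value silently drops one of the colliding descriptors, while keying by the object (so that equality is consulted on collision) keeps all of them.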




[jira] [Created] (HIVE-22170) from_unixtime and unix_timestamp should use user session time zone

2019-09-04 Thread Jesus Camacho Rodriguez (Jira)

Jesus Camacho Rodriguez created HIVE-22170:
--

 Summary: from_unixtime and unix_timestamp should use user session 
time zone
 Key: HIVE-22170
 URL: https://issues.apache.org/jira/browse/HIVE-22170
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 4.0.0, 3.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


According to the documentation, that is the expected behavior (previously, the 
system time zone was used because the session time zone was not available). This 
was incorrectly changed by HIVE-12192 / HIVE-20007. This JIRA should fix this 
issue.
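
To make the expected behavior concrete, here is a Python sketch in which the session time zone is passed explicitly. The two functions only mimic the semantics under discussion; they are not Hive's UDF implementations, and the fixed UTC-8 offset is just an example session zone:

```python
from datetime import datetime, timedelta, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"

def from_unixtime(epoch_seconds, session_tz):
    """Render an epoch value as wall-clock time in the session time zone."""
    return datetime.fromtimestamp(epoch_seconds, tz=session_tz).strftime(FORMAT)

def unix_timestamp(text, session_tz):
    """Interpret a wall-clock string in the session time zone and return epoch seconds."""
    local = datetime.strptime(text, FORMAT).replace(tzinfo=session_tz)
    return int(local.timestamp())

utc = timezone.utc
utc_minus_8 = timezone(timedelta(hours=-8))  # example session time zone

print(from_unixtime(0, utc))
print(from_unixtime(0, utc_minus_8))
print(unix_timestamp("1970-01-01 00:00:00", utc_minus_8))
```

The same epoch value renders differently depending on the session time zone, and the two functions round-trip only when both use the same zone.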




[jira] [Created] (HIVE-22075) Fix HIVE-14200 properly

2019-08-01 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-22075:
--

 Summary: Fix HIVE-14200 properly
 Key: HIVE-22075
 URL: https://issues.apache.org/jira/browse/HIVE-22075
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Gopal V






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Created] (HIVE-22072) Altering table to make a column change does not update constraints references

2019-08-01 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-22072:
--

 Summary: Altering table to make a column change does not update 
constraints references
 Key: HIVE-22072
 URL: https://issues.apache.org/jira/browse/HIVE-22072
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


The constraint will incorrectly still point to the old column descriptor.




[jira] [Created] (HIVE-22066) Upgrade Apache parent POM to version 21

2019-07-31 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-22066:
--

 Summary: Upgrade Apache parent POM to version 21
 Key: HIVE-22066
 URL: https://issues.apache.org/jira/browse/HIVE-22066
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez







[jira] [Created] (HIVE-22057) Early bailout in SharedWorkOptimizer if all tables are referenced only once

2019-07-26 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-22057:
--

 Summary: Early bailout in SharedWorkOptimizer if all tables are 
referenced only once
 Key: HIVE-22057
 URL: https://issues.apache.org/jira/browse/HIVE-22057
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez


In that case, there is no room for optimization, so we should bail out 
immediately and not do any extra work.
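
The bailout condition amounts to a simple count, sketched here in Python. The function can_share_work and the list-of-table-names input are hypothetical simplifications, not the SharedWorkOptimizer code:

```python
from collections import Counter

def can_share_work(scanned_tables):
    """Return True only if at least one table is scanned more than once;
    otherwise the optimizer has nothing to merge and can bail out early."""
    counts = Counter(scanned_tables)
    return any(n > 1 for n in counts.values())

print(can_share_work(["t1", "t2", "t3"]))  # every table referenced once
print(can_share_work(["t1", "t2", "t1"]))  # t1 is scanned twice
```

Running this check before any plan traversal keeps the cost of the optimization close to zero for queries that cannot benefit from it.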




[jira] [Created] (HIVE-22046) Differentiate among column stats computed by different engines

2019-07-24 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-22046:
--

 Summary: Differentiate among column stats computed by different 
engines
 Key: HIVE-22046
 URL: https://issues.apache.org/jira/browse/HIVE-22046
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


The goal is to prevent column stats computed by different engines, e.g., Hive 
and Impala, from stepping on each other. In the longer term, we may introduce a 
common representation for the column statistics stored by different engines.

For this issue, we will add a new column 'engine' to TAB_COL_STATS HMS table 
(unpartitioned tables) and to PART_COL_STATS HMS table (partitioned tables). 
This will prevent conflicts in column-level stats.




[jira] [Created] (HIVE-22042) Set hive.exec.dynamic.partition.mode=nonstrict by default

2019-07-24 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-22042:
--

 Summary: Set hive.exec.dynamic.partition.mode=nonstrict by default
 Key: HIVE-22042
 URL: https://issues.apache.org/jira/browse/HIVE-22042
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez







[jira] [Created] (HIVE-22007) Do not push not supported types to specific JDBC sources from Calcite

2019-07-17 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-22007:
--

 Summary: Do not push not supported types to specific JDBC sources 
from Calcite
 Key: HIVE-22007
 URL: https://issues.apache.org/jira/browse/HIVE-22007
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


We should not push a project expression if it uses a type that a specific 
dialect does not support, e.g., boolean in Oracle.




[jira] [Created] (HIVE-22003) Shared work optimizer may leave semijoin branches in plan that are not used

2019-07-16 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-22003:
--

 Summary: Shared work optimizer may leave semijoin branches in plan 
that are not used
 Key: HIVE-22003
 URL: https://issues.apache.org/jira/browse/HIVE-22003
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


This may happen only when the TS operators are the only operators that are 
shared. Repro attached in q file.




[jira] [Created] (HIVE-21976) Offset should be null instead of zero in Calcite HiveSortLimit

2019-07-09 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21976:
--

 Summary: Offset should be null instead of zero in Calcite 
HiveSortLimit
 Key: HIVE-21976
 URL: https://issues.apache.org/jira/browse/HIVE-21976
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Calcite expects a value equal to or greater than 1. Otherwise, it may generate 
SQL from a plan incorrectly ({{offset 0}}).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-21953) Enable CLUSTERED ON/DISTRIBUTED ON+SORTED ON in incremental rebuild of materialized views

2019-07-03 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21953:
--

 Summary: Enable CLUSTERED ON/DISTRIBUTED ON+SORTED ON in 
incremental rebuild of materialized views
 Key: HIVE-21953
 URL: https://issues.apache.org/jira/browse/HIVE-21953
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up of HIVE-18842. For inserts and the insert branch in merge, we can 
introduce a RS to enforce these properties, as we do when we create the 
materialized view or execute a full rebuild. This will make the delta files 
created for the insert obey the same organization. If the increments are large 
enough, this may improve query execution performance.




[jira] [Created] (HIVE-21946) Consider data distribution of a materialized view in transparent rewriting

2019-07-02 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21946:
--

 Summary: Consider data distribution of a materialized view in 
transparent rewriting
 Key: HIVE-21946
 URL: https://issues.apache.org/jira/browse/HIVE-21946
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently, we do consider partitioning of the original table, but we do not 
take into account data organization (DISTRIBUTE/SORT/CLUSTER) in the optimizer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-21945) Enable sorted dynamic partitioning optimization for materialized views with custom data organization

2019-07-02 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21945:
--

 Summary: Enable sorted dynamic partitioning optimization for 
materialized views with custom data organization
 Key: HIVE-21945
 URL: https://issues.apache.org/jira/browse/HIVE-21945
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Affects Versions: 4.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


After implementing HIVE-18842, we need to extend the optimizer to work with 
partitioned materialized views that are created with custom data organization, 
i.e., using CLUSTERED, DISTRIBUTED, or SORTED. Currently, optimization bails 
out when the materialized view is partitioned and either CLUSTERED, 
DISTRIBUTED, or SORTED.
In particular, we will need to combine the RS operator introduced by the 
translation of these clauses with the new RS needed to distribute and sort the 
data based on the dynamic partition values. 




[jira] [Created] (HIVE-21928) Fix for statistics annotation in nested AND expressions

2019-06-27 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21928:
--

 Summary: Fix for statistics annotation in nested AND expressions
 Key: HIVE-21928
 URL: https://issues.apache.org/jira/browse/HIVE-21928
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Discovered while working on HIVE-21867. Predicates with nested AND expressions 
may result in different stats, even if the predicates are essentially equivalent 
from a stats estimation standpoint.
For instance, stats for {{AND(x=5, true, true)}} are different from {{x=5}}.




[jira] [Created] (HIVE-21872) Bucketed tables that load data from data/files/auto_sortmerge_join should be tagged as 'bucketing_version'='1'

2019-06-13 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21872:
--

 Summary: Bucketed tables that load data from 
data/files/auto_sortmerge_join should be tagged as 'bucketing_version'='1'
 Key: HIVE-21872
 URL: https://issues.apache.org/jira/browse/HIVE-21872
 Project: Hive
  Issue Type: Bug
  Components: Test
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


It is incorrect to use version 2, since the data files were created with the 
old hash function.




[jira] [Created] (HIVE-21871) Multi-statement transactions in direct SQL

2019-06-13 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21871:
--

 Summary: Multi-statement transactions in direct SQL
 Key: HIVE-21871
 URL: https://issues.apache.org/jira/browse/HIVE-21871
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Standalone Metastore
Reporter: Jesus Camacho Rodriguez


To access the metastore, we may bypass the JDO layer and query the metastore 
RDBMS directly (we refer to this as the direct SQL path).

There are some methods in Hive metastore that may issue multiple queries 
against RDBMS to build the return objects (e.g. {{get_partitions_by_names}}). 
Currently going through direct SQL may issue each query to the RDBMS in a 
different transaction (while afaik going through JDO will create a single 
transaction to retrieve and compose such objects). This may lead to failures 
while running some operations concurrently, e.g., in the example above, if a 
partition is being dropped and partitions are being retrieved using direct SQL 
path.

A solution would be to execute all statements needed to retrieve the results 
for such a function within a single transaction when we use direct SQL path.




[jira] [Created] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-12 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21867:
--

 Summary: Sort semijoin conditions to accelerate query processing
 Key: HIVE-21867
 URL: https://issues.apache.org/jira/browse/HIVE-21867
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Following an approach similar to 
http://db.cs.berkeley.edu/jmh/miscpapers/sigmod93.pdf .

To reorder predicates in AND conditions, we could rank each of the elements in 
the clauses in increasing order based on the following formula:
{code}
rank = (selectivity - 1) / cost per tuple
{code}
Similarly, for OR conditions:
{code}
rank = (-selectivity) / cost per tuple
{code}
Selectivity can be computed with FilterSelectivityEstimator. For cost per 
tuple, we will need to come up with some heuristic based on how expensive the 
evaluation of the functions contained in that predicate is. Custom UDFs could 
be annotated.
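
The two rank formulas above can be applied as in the following Python sketch. The Predicate class and the selectivity and cost numbers are made up for illustration; in Hive the selectivity would come from FilterSelectivityEstimator and the cost from a heuristic that is still to be defined:

```python
from dataclasses import dataclass

@dataclass
class Predicate:
    name: str
    selectivity: float      # fraction of rows that satisfy the predicate
    cost_per_tuple: float   # assumed relative evaluation cost

def and_rank(p):
    # rank = (selectivity - 1) / cost per tuple
    return (p.selectivity - 1.0) / p.cost_per_tuple

def or_rank(p):
    # rank = (-selectivity) / cost per tuple
    return -p.selectivity / p.cost_per_tuple

def order_conjuncts(preds):
    return sorted(preds, key=and_rank)   # increasing rank

def order_disjuncts(preds):
    return sorted(preds, key=or_rank)    # increasing rank

preds = [
    Predicate("expensive_udf(x)", 0.10, 50.0),
    Predicate("y = 5", 0.05, 1.0),
    Predicate("z LIKE '%a%'", 0.90, 5.0),
]

and_order = [p.name for p in order_conjuncts(preds)]
or_order = [p.name for p in order_disjuncts(preds)]
print(and_order)
print(or_order)
```

For AND, cheap predicates that reject most rows move to the front; for OR, cheap predicates that accept most rows move to the front, since either kind lets evaluation short-circuit early.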




[jira] [Created] (HIVE-21857) Sort conditions in a filter predicate to accelerate query processing

2019-06-10 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21857:
--

 Summary: Sort conditions in a filter predicate to accelerate query 
processing
 Key: HIVE-21857
 URL: https://issues.apache.org/jira/browse/HIVE-21857
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


To reorder predicates in AND conditions, we could rank each of the elements in 
the clauses in increasing order based on the following formula:
{code}
rank = (selectivity - 1) / cost per tuple
{code}
Similarly, for OR conditions:
{code}
rank = (-selectivity) / cost per tuple
{code}
Selectivity can be computed with FilterSelectivityEstimator. For cost per 
tuple, we will need to come up with some heuristic based on how expensive the 
evaluation of the functions contained in that predicate is. Custom UDFs could 
be annotated.
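
As a worked example of why the ordering matters, the Python sketch below computes the expected per-tuple cost of a short-circuiting AND chain. The cost model (each predicate is only evaluated on rows that passed the previous ones, and predicates are independent) and the numbers are assumptions for illustration:

```python
def expected_and_cost(preds):
    """preds: list of (selectivity, cost_per_tuple), evaluated left to right.
    A predicate is only evaluated for the fraction of rows that passed
    all earlier predicates (independence between predicates is assumed)."""
    total, pass_fraction = 0.0, 1.0
    for selectivity, cost in preds:
        total += pass_fraction * cost
        pass_fraction *= selectivity
    return total

def and_rank(pred):
    selectivity, cost = pred
    return (selectivity - 1.0) / cost

cheap_selective = (0.05, 1.0)   # e.g. an equality on a column
expensive = (0.10, 50.0)        # e.g. a costly UDF call

original = [expensive, cheap_selective]
ranked = sorted(original, key=and_rank)

print(expected_and_cost(original))
print(expected_and_cost(ranked))
```

With these numbers, evaluating the expensive predicate first costs roughly 50 units per tuple, while the rank-based order costs about 3.5, because the expensive predicate then runs on only 5% of the rows.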




[jira] [Created] (HIVE-21834) Avoid unnecessary calls to simplify filter conditions

2019-06-04 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21834:
--

 Summary: Avoid unnecessary calls to simplify filter conditions
 Key: HIVE-21834
 URL: https://issues.apache.org/jira/browse/HIVE-21834
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Every time we create a filter, we try to simplify its condition. However, we 
already have a rule that simplifies the expressions, and it is within the same 
loop as most of the rules that end up creating new filters. Hence, it seems we 
should be able to remove some of the calls to simplify those conditions.




[jira] [Created] (HIVE-21827) Multiple calls in Semantic

2019-06-03 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21827:
--

 Summary: Multiple calls in Semantic
 Key: HIVE-21827
 URL: https://issues.apache.org/jira/browse/HIVE-21827
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez







[jira] [Created] (HIVE-21794) Add materialized view parameters to sqlStdAuthSafeVarNameRegexes

2019-05-24 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-21794:
--

 Summary: Add materialized view parameters to 
sqlStdAuthSafeVarNameRegexes
 Key: HIVE-21794
 URL: https://issues.apache.org/jira/browse/HIVE-21794
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-21794.patch





