[jira] [Created] (HIVE-23195) set hive.cluster.delegation.token.gc-interval to 15 minutes instead of an hour
Richard Zhang created HIVE-23195:

Summary: set hive.cluster.delegation.token.gc-interval to 15 minutes instead of an hour
Key: HIVE-23195
URL: https://issues.apache.org/jira/browse/HIVE-23195
Project: Hive
Issue Type: Bug
Reporter: Richard Zhang
Assignee: Richard Zhang

The config "hive.cluster.delegation.token.gc-interval" is set to a long duration, 1 hour. This has caused issues in a heavily loaded cluster, where the tokens may not be cleaned up fast enough and the cleaner thread may fail to clean them. This can lead to excessive space usage, LLAP startup failures, or slow system startup. If "hive.cluster.delegation.token.gc-interval" is reduced from 1 hour to a shorter period such as 15 minutes, the ZooKeeper tokens will be cleaned up in a more timely manner, which mitigates these issues.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23194) Use Queue Instead of List for CollectOperator
David Mollitor created HIVE-23194:

Summary: Use Queue Instead of List for CollectOperator
Key: HIVE-23194
URL: https://issues.apache.org/jira/browse/HIVE-23194
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor

https://github.com/apache/hive/blob/d6948a28ab3e34e5116591a60a96bdf031185e47/ql/src/java/org/apache/hadoop/hive/ql/exec/CollectOperator.java#L85-L88

{code:java|title=CollectOperator.java}
rowList = new ArrayList();
...
} else {
  result.o = rowList.remove(0);
  result.oi = standardRowInspector;
}
{code}

Removing from the head of an {{ArrayList}} is an expensive operation because every call has to shift all of the remaining elements down in the array. It is better to use a {{Queue}}.
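As a rough illustration of the change proposed in HIVE-23194 (not the actual patch), the sketch below uses a hypothetical `RowBuffer` class as a stand-in for the row storage in `CollectOperator`. `ArrayDeque.poll()` removes the head in constant time, whereas `ArrayList.remove(0)` shifts every remaining element. One caveat under this assumption: `ArrayDeque` rejects null elements, so it only fits if rows are never null.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the row storage in CollectOperator; not Hive code.
final class RowBuffer {
    // ArrayDeque is a resizable circular buffer: add() is amortized O(1), poll() is O(1).
    private final Queue<Object> rows = new ArrayDeque<>();

    void add(Object row) {
        rows.add(row); // throws NullPointerException if row is null
    }

    // Returns the oldest row, or null when the buffer is empty.
    Object next() {
        return rows.poll();
    }

    int size() {
        return rows.size();
    }
}
```

Rows come back in insertion order, so the first-in, first-out behaviour of `rowList.remove(0)` is preserved.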
[jira] [Created] (HIVE-23193) Review of Subset of Debug Logging
David Mollitor created HIVE-23193:

Summary: Review of Subset of Debug Logging
Key: HIVE-23193
URL: https://issues.apache.org/jira/browse/HIVE-23193
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor

bq. Better yet, use parameterized messages
bq. Will outperform the first form by a factor of at least 30, in case of a disabled logging statement.

http://www.slf4j.org/faq.html

* Use parameterized logging where appropriate
* Add logging guards {{if (Log.isDebugEnabled())}} around loops and complex debug messages

Simplify the code, remove lines of code, and potentially increase performance.
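The two bullet points in HIVE-23193 can be sketched as follows. To keep the example self-contained it uses `java.util.logging` rather than SLF4J, so `Level.FINE` stands in for debug, `{0}` for SLF4J's `{}` placeholder, and `isLoggable` for `isDebugEnabled`. The class name, the `Row` type, and the `renderCount` counter are hypothetical and exist only to make the cost of building a message visible.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustration only: java.util.logging is used here in place of SLF4J.
final class DebugLoggingSketch {
    private static final Logger LOG = Logger.getLogger(DebugLoggingSketch.class.getName());

    static {
        LOG.setLevel(Level.INFO); // FINE (debug) is disabled
    }

    // Counts how often the message argument is rendered to a string.
    static int renderCount = 0;

    static final class Row {
        @Override
        public String toString() {
            renderCount++;
            return "row-contents";
        }
    }

    // Eager form: the string is concatenated even though FINE is disabled.
    static void logEager(Row row) {
        LOG.fine("Processing " + row);
    }

    // Parameterized form: the argument is never rendered while FINE is disabled.
    static void logParameterized(Row row) {
        LOG.log(Level.FINE, "Processing {0}", row);
    }

    // Guarded form: the whole loop is skipped while FINE is disabled.
    static void logGuarded(Row[] rows) {
        if (LOG.isLoggable(Level.FINE)) {
            for (Row row : rows) {
                LOG.fine("Processing " + row);
            }
        }
    }
}
```

With the level set as above, only `logEager` pays for `toString()`; the parameterized and guarded forms leave `renderCount` unchanged.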
[jira] [Created] (HIVE-23192) "default" database locationUri should be external warehouse root.
Naveen Gangam created HIVE-23192:

Summary: "default" database locationUri should be external warehouse root.
Key: HIVE-23192
URL: https://issues.apache.org/jira/browse/HIVE-23192
Project: Hive
Issue Type: Sub-task
Components: Hive
Affects Versions: 4.0.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam

When creating the default database, the database locationUri should be set to the external warehouse root.
[jira] [Created] (HIVE-23191) Prevent redundant output descriptor config serialization
Mustafa Iman created HIVE-23191:

Summary: Prevent redundant output descriptor config serialization
Key: HIVE-23191
URL: https://issues.apache.org/jira/browse/HIVE-23191
Project: Hive
Issue Type: Improvement
Reporter: Mustafa Iman
Assignee: Mustafa Iman

{code:java}
DagUtils#createVertex(JobConf, BaseWork, Path, TezWork, Map){code}
creates an output descriptor if the vertex is a leaf vertex. It uses the same config object that is used in the processor descriptor. It should not create the payload from scratch when the processor descriptor already has an identical payload.
[jira] [Created] (HIVE-23190) LLAP: modify IndexCache to pass filesystem object to TezSpillRecord
László Bodor created HIVE-23190:

Summary: LLAP: modify IndexCache to pass filesystem object to TezSpillRecord
Key: HIVE-23190
URL: https://issues.apache.org/jira/browse/HIVE-23190
Project: Hive
Issue Type: Bug
Reporter: László Bodor
[jira] [Created] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE
David Mollitor created HIVE-23189:

Summary: Change Explain ANALYZE to Explain PROFILE
Key: HIVE-23189
URL: https://issues.apache.org/jira/browse/HIVE-23189
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor

{code:none}
EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
{code}

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause

In Hive, there is an {{EXPLAIN ANALYZE}} query. This can get a bit confusing because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} statement, so you end up with something like:

{code:sql}
EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
{code}

I would like to propose that the name be changed to {{EXPLAIN PROFILE}}. This borrows from Apache Impala, which has a {{PROFILE}} command that produces the stats that actually occurred during the query run (much like this Hive feature).
[jira] [Created] (HIVE-23188) Allow STATS Token in Analyze Table
David Mollitor created HIVE-23188:

Summary: Allow STATS Token in Analyze Table
Key: HIVE-23188
URL: https://issues.apache.org/jira/browse/HIVE-23188
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor
[jira] [Created] (HIVE-23187) Make TABLE Token Optional in ANALYZE Statement
David Mollitor created HIVE-23187:

Summary: Make TABLE Token Optional in ANALYZE Statement
Key: HIVE-23187
URL: https://issues.apache.org/jira/browse/HIVE-23187
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor
Remove REGEX Column Specification
Hello Gang,

I've been tracking a lot of issues recently regarding qualified table names, table names using back ticks, and other similar circumstances. I've looked into trying to address some of these and noted that these issues go way back and reach all the way down to the core of Hive.

To start with, I wanted to use the ANTLR grammar to address some of these issues and to standardize behavior across all queries. For example, there is currently a patch that disallows table names from having a 'dot' in the name. I'm not 100% sure it applies to all queries, so I wanted to codify this restriction in the parser grammar. So it got me looking at the grammar.

In parallel, I also tried to build a supplemental parser in Java for parsing table names (HIVE-23150) and I was hitting some weird and confusing edge cases bubbling up from the parser. I eventually traced them back to the fact that there are a lot of weird rules around table names in the grammar, including something called "REGEX Column Specification."

This feature is problematic because it blindly labels most table names as being a regex. It really should only apply to column names, but the grammar defines a table name as also possibly being a regex. There is a lot of ambiguity because a table named "a" could be a literal value or a legal regex. When a table name is treated as a regex, a different code path is taken than when it is treated as a literal value. Where I first saw this issue was in a qtest where a table named `s/c` was producing a different result than a table named `s+c`.

This regex feature is not something I've seen in MySQL or Postgres. In MySQL, a table name surrounded by back ticks can contain just about any UTF-8 character, so it's not really feasible to tell, without some kind of SQL hint, whether the table name is a regex or a literal value.

This feature adds a lot of ambiguity and complexity, it is not supported by other major RDBMSs, and it provides only a very minor benefit. I also hope to move Hive in the direction of fully supporting UTF-8.

I have put a patch up to remove it:

https://issues.apache.org/jira/browse/HIVE-23183

References:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
https://dev.mysql.com/doc/refman/8.0/en/identifiers.html

Thanks,
David
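The ambiguity described in the message above can be illustrated with plain `java.util.regex`. This is only a sketch of why a name such as `s+c` behaves differently from `s/c` once it is interpreted as a regex. It is not how Hive's parser makes the decision, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustration only; not Hive code.
final class TableNameAmbiguity {
    // Interprets the name as a regex and checks whether it matches its own literal text.
    static boolean regexMatchesOwnLiteral(String name) {
        return Pattern.matches(name, name);
    }

    // Interprets the name as a literal by quoting it before matching.
    static boolean literalMatches(String name, String candidate) {
        return Pattern.matches(Pattern.quote(name), candidate);
    }
}
```

Here `regexMatchesOwnLiteral("a")` and `regexMatchesOwnLiteral("s/c")` are both true, so the regex and literal readings agree. `regexMatchesOwnLiteral("s+c")` is false, because `+` is a quantifier and the pattern matches `sc` or `ssc` instead of the text `s+c`. Quoting the name, as in `literalMatches("s+c", "s+c")`, restores the literal reading.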
[jira] [Created] (HIVE-23186) Strict Check SemanticException Should Properly Quote Table Name
David Mollitor created HIVE-23186:

Summary: Strict Check SemanticException Should Properly Quote Table Name
Key: HIVE-23186
URL: https://issues.apache.org/jira/browse/HIVE-23186
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor

https://github.com/apache/hive/blob/029cab297a9ae40d249f63040721f93857398648/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L191-L192

{code:java}
throw new SemanticException(error + " No partition predicate for Alias \""
    + alias + "\" Table \"" + tab.getTableName() + "\"");
{code}

Use back ticks and use the database name as well.
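One possible shape for the message requested in HIVE-23186 is sketched below. The `QuotedNames` class and its methods are hypothetical and are not part of Hive. The sketch assumes that an embedded back tick is escaped by doubling it, and it leaves the alias in double quotes as in the existing message, since the issue only mentions the table name.

```java
// Hypothetical helper; not part of Hive.
final class QuotedNames {
    // Wraps an identifier in back ticks, doubling any embedded back tick (assumed escape rule).
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Builds a database-qualified, back-tick-quoted table name.
    static String qualified(String db, String table) {
        return quote(db) + "." + quote(table);
    }

    // Returns the message text only; the real code would throw SemanticException with it.
    static String noPartitionPredicateMessage(String error, String alias, String db, String table) {
        return error + " No partition predicate for Alias \"" + alias
            + "\" Table " + qualified(db, table);
    }
}
```

Quoting the database and table separately keeps a name that contains a dot, such as a table called `my.table` in `default`, unambiguous in the error text.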