[jira] [Created] (HIVE-23195) set hive.cluster.delegation.token.gc-interval to 15 minutes instead of an hour

2020-04-13 Thread Richard Zhang (Jira)
Richard Zhang created HIVE-23195:


 Summary: set hive.cluster.delegation.token.gc-interval to 15 
minutes instead of an hour
 Key: HIVE-23195
 URL: https://issues.apache.org/jira/browse/HIVE-23195
 Project: Hive
  Issue Type: Bug
Reporter: Richard Zhang
Assignee: Richard Zhang


The config "hive.cluster.delegation.token.gc-interval" is set to a long duration, 
1 hour. This causes problems on a heavily loaded cluster, where the tokens 
may not be cleaned up fast enough and the cleaner thread may fail to clean 
them. The result can be excessive space usage, LLAP startup failures, or slow 
system startup.

If "hive.cluster.delegation.token.gc-interval" is reduced from 1 hour to a 
relatively shorter period such as 15 minutes, the ZooKeeper tokens will be 
cleaned up in a more timely manner, which mitigates these issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23194) Use Queue Instead of List for CollectOperator

2020-04-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-23194:
-

 Summary: Use Queue Instead of List for CollectOperator
 Key: HIVE-23194
 URL: https://issues.apache.org/jira/browse/HIVE-23194
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


https://github.com/apache/hive/blob/d6948a28ab3e34e5116591a60a96bdf031185e47/ql/src/java/org/apache/hadoop/hive/ql/exec/CollectOperator.java#L85-L88

{code:java|title=CollectOperator.java}
   rowList = new ArrayList();
...
} else {
  result.o = rowList.remove(0);
  result.oi = standardRowInspector;
}
{code}

Removing from the head of an {{ArrayList}} is an expensive operation because 
each call must shift all of the remaining elements down in the array.  It is 
better to use a {{Queue}}.
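
A minimal sketch of the difference, assuming nothing about CollectOperator beyond the snippet quoted above. The class and method names (QueueDrainSketch, drainList, drainQueue) are hypothetical, and ArrayDeque is only one possible Queue implementation, not necessarily the one a patch would choose.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration; this is not code from CollectOperator.
class QueueDrainSketch {

  // Drains from the head of a List. With an ArrayList, each remove(0)
  // shifts the remaining elements down by one slot, so draining n rows
  // costs O(n^2) element moves in total.
  static List<Object> drainList(List<Object> rows) {
    List<Object> out = new ArrayList<>();
    while (!rows.isEmpty()) {
      out.add(rows.remove(0));
    }
    return out;
  }

  // Drains from the head of a Queue. ArrayDeque.poll() advances a head
  // index instead of shifting elements, and returns null once empty.
  // Note that ArrayDeque does not accept null elements.
  static List<Object> drainQueue(Queue<Object> rows) {
    List<Object> out = new ArrayList<>();
    Object row;
    while ((row = rows.poll()) != null) {
      out.add(row);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Object> list = new ArrayList<>(Arrays.asList("r1", "r2", "r3"));
    Queue<Object> queue = new ArrayDeque<>(Arrays.asList("r1", "r2", "r3"));
    System.out.println(drainList(list));
    System.out.println(drainQueue(queue));
  }
}
```

Both methods return the rows in insertion order; only the cost of each head removal differs.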





[jira] [Created] (HIVE-23193) Review of Subset of Debug Logging

2020-04-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-23193:
-

 Summary: Review of Subset of Debug Logging
 Key: HIVE-23193
 URL: https://issues.apache.org/jira/browse/HIVE-23193
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


bq. Better yet, use parameterized messages
bq.  Will outperform the first form by a factor of at least 30, in case of a 
disabled logging statement.

http://www.slf4j.org/faq.html

* Use parameterized logging where appropriate
* Add logging guards {{if (Log.isDebugEnabled())}} around loops and complex 
debug messages

This simplifies the code, removes lines of code, and potentially increases 
performance.
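
A rough sketch of why the parameterized form is cheaper when debug logging is disabled. MiniLogger below is a hypothetical stand-in written for this example, NOT the SLF4J API; it only mimics the idea that a "{}" placeholder defers rendering of the argument until the level check has passed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a logging facade; this is NOT SLF4J.
class LoggingSketch {

  // Counts how many times the "expensive" argument is rendered to a String.
  static final AtomicInteger RENDERED = new AtomicInteger();

  static final class Expensive {
    @Override
    public String toString() {
      RENDERED.incrementAndGet();
      return "expensive-value";
    }
  }

  static final class MiniLogger {
    private final boolean debugEnabled;

    MiniLogger(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    boolean isDebugEnabled() {
      return debugEnabled;
    }

    // Eager form: the caller has already built the whole message.
    void debug(String message) {
      if (debugEnabled) {
        System.out.println(message);
      }
    }

    // Parameterized form: the argument is rendered only when enabled.
    void debug(String format, Object arg) {
      if (debugEnabled) {
        System.out.println(format.replace("{}", String.valueOf(arg)));
      }
    }
  }

  // String concatenation renders the argument before the level check.
  static int eagerCost(MiniLogger log) {
    RENDERED.set(0);
    log.debug("Row: " + new Expensive());
    return RENDERED.get();
  }

  // The placeholder form skips rendering when debug is disabled.
  static int parameterizedCost(MiniLogger log) {
    RENDERED.set(0);
    log.debug("Row: {}", new Expensive());
    return RENDERED.get();
  }

  // A guard keeps the whole loop from running when debug is disabled.
  static int guardedLoopCost(MiniLogger log, int rows) {
    RENDERED.set(0);
    if (log.isDebugEnabled()) {
      for (int i = 0; i < rows; i++) {
        log.debug("Row: {}", new Expensive());
      }
    }
    return RENDERED.get();
  }
}
```

With debug disabled, eagerCost still renders the argument once, while parameterizedCost and guardedLoopCost render nothing.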





[jira] [Created] (HIVE-23192) "default" database locationUri should be external warehouse root.

2020-04-13 Thread Naveen Gangam (Jira)
Naveen Gangam created HIVE-23192:


 Summary: "default" database locationUri should be external 
warehouse root.
 Key: HIVE-23192
 URL: https://issues.apache.org/jira/browse/HIVE-23192
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 4.0.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam


When creating the default database, the database locationUri should be set to 
the external warehouse root.





[jira] [Created] (HIVE-23191) Prevent redundant output descriptor config serialization

2020-04-13 Thread Mustafa Iman (Jira)
Mustafa Iman created HIVE-23191:
---

 Summary: Prevent redundant output descriptor config serialization
 Key: HIVE-23191
 URL: https://issues.apache.org/jira/browse/HIVE-23191
 Project: Hive
  Issue Type: Improvement
Reporter: Mustafa Iman
Assignee: Mustafa Iman


{code:java}
DagUtils#createVertex(JobConf, BaseWork, Path,
 TezWork, Map){code}
creates an output descriptor if the vertex is a leaf vertex. It uses the same 
config object that is used in the processor descriptor. It should not create 
the payload from scratch when the processor descriptor has the identical 
payload.





[jira] [Created] (HIVE-23190) LLAP: modify IndexCache to pass filesystem object to TezSpillRecord

2020-04-13 Thread Jira
László Bodor created HIVE-23190:
---

 Summary: LLAP: modify IndexCache to pass filesystem object to 
TezSpillRecord
 Key: HIVE-23190
 URL: https://issues.apache.org/jira/browse/HIVE-23190
 Project: Hive
  Issue Type: Bug
Reporter: László Bodor








[jira] [Created] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE

2020-04-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-23189:
-

 Summary: Change Explain ANALYZE to Explain PROFILE
 Key: HIVE-23189
 URL: https://issues.apache.org/jira/browse/HIVE-23189
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


{code:none}
EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
{code}

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause

In Hive, there is an {{EXPLAIN ANALYZE}} query.  This can get a bit confusing 
because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} 
statement, so you end up with something like:

{code:sql}
EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
{code}

I would like to propose that the name be changed to {{EXPLAIN PROFILE}}.  This 
borrows from Apache Impala because it has a {{PROFILE}} command which produces 
the stats that actually occurred during the query run (much like this Hive 
feature).






[jira] [Created] (HIVE-23188) Allow STATS Token in Analyze Table

2020-04-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-23188:
-

 Summary: Allow STATS Token in Analyze Table
 Key: HIVE-23188
 URL: https://issues.apache.org/jira/browse/HIVE-23188
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-23187) Make TABLE Token Optional in ANALYZE Statement

2020-04-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-23187:
-

 Summary: Make TABLE Token Optional in ANALYZE Statement
 Key: HIVE-23187
 URL: https://issues.apache.org/jira/browse/HIVE-23187
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor








Remove REGEX Column Specification

2020-04-13 Thread David Mollitor
Hello Gang,

I've been tracking a lot of issues recently regarding qualified table
names, table names using back ticks, and other similar circumstances.

I've looked into trying to address some of these and noted that these issues
go way back and go all the way down to the core of Hive.

To start with, I wanted to use the ANTLR grammar to address some of these
issues and to standardize behavior across all queries.  For example, there
is currently a patch that disallows table names from having a 'dot' in the
name.  I'm not 100% sure it applies to all queries, so  I wanted to codify
this restriction in the parser grammar.  So it got me looking at the
grammar.

In parallel, I also tried to build a supplemental parser in Java for
parsing table names (HIVE-23150) and I was hitting some weird, and
confusing, edge cases bubbling up from the parser.  I eventually traced it
back to the fact that there are a lot of weird rules around table names in
the grammar including something called "REGEX Column Specification."

This feature is problematic as it blindly labels most table names as being
a regex.  It really should only apply to column names, but the grammar
defines a table name as also possibly being a regex. There is a lot of
ambiguity because a table named "a" could be a literal value or a legal
regex.  When a table name is defined as a regex, a different code path is
taken than when a table name is considered to be a literal value. I first
saw this issue in a qtest where a table named `s/c` was producing a
different result than a table named `s+c`.
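
To make the ambiguity concrete, here is a small sketch using java.util.regex.
It is an illustration only and is not Hive's identifier-resolution code; the
class and method names are made up for this example.

```java
import java.util.regex.Pattern;

// Illustration only; this is not how Hive resolves identifiers.
class RegexNameSketch {

  // Literal reading: the identifier names exactly one table.
  static boolean matchesAsLiteral(String identifier, String tableName) {
    return identifier.equals(tableName);
  }

  // Regex reading: the identifier is a pattern that must match the
  // entire table name.
  static boolean matchesAsRegex(String identifier, String tableName) {
    return Pattern.matches(identifier, tableName);
  }

  public static void main(String[] args) {
    // Compare each identifier against a table with the very same name.
    for (String id : new String[] {"a", "s/c", "s+c"}) {
      System.out.printf("%-4s literal=%b regex=%b%n",
          id, matchesAsLiteral(id, id), matchesAsRegex(id, id));
    }
  }
}
```

Read as a regex, s/c still matches only the name s/c, but s+c matches names
such as sc and ssc and no longer matches the literal name s+c, so the two
readings disagree for one identifier and agree for the other.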

This regex feature is not something I've seen in MySQL or Postgres.  In
MySQL, a table name surrounded with back ticks can contain just about any
UTF-8 character, so it's not really feasible to tell, without some kind of
SQL hint, whether such a table name is a regex or a literal value.

This feature adds a lot of ambiguity and complexity, it is not supported by
other major RDBMS, and it adds only very minor benefit.  I also hope to
move Hive in a direction of fully supporting UTF-8.

I have put a patch up to remove it:
https://issues.apache.org/jira/browse/HIVE-23183


References:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification


https://dev.mysql.com/doc/refman/8.0/en/identifiers.html


Thanks,
David


[jira] [Created] (HIVE-23186) Strict Check SemanticException Should Properly Quote Table Name

2020-04-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-23186:
-

 Summary: Strict Check SemanticException Should Properly Quote 
Table Name
 Key: HIVE-23186
 URL: https://issues.apache.org/jira/browse/HIVE-23186
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


https://github.com/apache/hive/blob/029cab297a9ae40d249f63040721f93857398648/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L191-L192

{code:java}
throw new SemanticException(error + " No partition predicate for Alias \""
    + alias + "\" Table \"" + tab.getTableName() + "\"");
{code}

Use back ticks and use the database name as well.
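
One possible shape for the quoted message, as a sketch only. The helper names (quote, qualified, message) are hypothetical and not from PartitionPruner, the exact wording of the final message is an assumption, and doubling an embedded back tick is an assumed escaping rule for this example.

```java
// Hypothetical helpers; names and message wording are assumptions.
class QuotedNameSketch {

  // Wraps one identifier in back ticks, doubling any embedded back tick
  // (assumed escaping rule for this sketch).
  static String quote(String identifier) {
    return "`" + identifier.replace("`", "``") + "`";
  }

  // Builds a database-qualified, back-tick-quoted table name.
  static String qualified(String database, String table) {
    return quote(database) + "." + quote(table);
  }

  // Builds the error text with the alias and the qualified table name quoted.
  static String message(String error, String alias, String database, String table) {
    return error + " No partition predicate for Alias " + quote(alias)
        + " Table " + qualified(database, table);
  }
}
```

Quoting the database and table separately keeps a name that itself contains a dot distinguishable from a database-qualified name.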


