[jira] [Updated] (HIVE-23124) Review of SQLOperation Class

2020-04-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23124:
--
Attachment: (was: HIVE-23124.2.patch)

> Review of SQLOperation Class
> 
>
> Key: HIVE-23124
> URL: https://issues.apache.org/jira/browse/HIVE-23124
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23124.1.patch
>
>
> * Use ConcurrentHashMap instead of synchronized methods to improve 
> multi-threaded access
>  * Use JDK 8 facilities where applicable
>  * General cleanup
>  * Better log messages and Exception messages
>  * Use {{switch}} statement instead of if/else blocks
>  * Checkstyle fixes
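
As an illustration of the first and fifth bullets, here is a minimal sketch. It is not taken from SQLOperation or from the attached patches; the class, field, and enum names are hypothetical. It shows a ConcurrentHashMap replacing a synchronized accessor, and a switch statement replacing chained if/else blocks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry used only for illustration; not part of Hive. */
class SessionRegistry {
  enum State { PENDING, RUNNING, FINISHED, ERROR }

  // ConcurrentHashMap handles concurrent access itself, so the methods
  // below do not need the synchronized keyword.
  private final Map<String, String> userBySession = new ConcurrentHashMap<>();

  /** Atomically records the user for a session unless one is already present. */
  String register(String sessionId, String user) {
    return userBySession.computeIfAbsent(sessionId, id -> user);
  }

  /** A switch over the enum instead of a chain of if/else blocks. */
  static String describe(State state) {
    switch (state) {
      case RUNNING:
        return "running";
      case FINISHED:
        return "finished";
      case ERROR:
        return "failed";
      default:
        return "unknown";
    }
  }
}
```

computeIfAbsent performs the check and the insert as one atomic step, which is the part a plain HashMap would need external locking for.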



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23124) Review of SQLOperation Class

2020-04-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23124:
--
Attachment: HIVE-23124.2.patch

> Review of SQLOperation Class
> 
>
> Key: HIVE-23124
> URL: https://issues.apache.org/jira/browse/HIVE-23124
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23124.1.patch, HIVE-23124.2.patch
>
>
> * Use ConcurrentHashMap instead of synchronized methods to improve 
> multi-threaded access
>  * Use JDK 8 facilities where applicable
>  * General cleanup
>  * Better log messages and Exception messages
>  * Use {{switch}} statement instead of if/else blocks
>  * Checkstyle fixes





[jira] [Updated] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Labels: backwards-incompatible  (was: )

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: backwards-incompatible
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is valid in that column name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]
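
The ambiguity described above can be sketched as follows. This is a standalone illustration, not Hive's resolver; the class and method names are hypothetical. Given a table that really has a column named a.*, the same back-ticked identifier selects one column when read literally and several when read as a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper showing two readings of the same identifier. */
class RegexColumnAmbiguity {
  /** Treats the identifier as the exact name of a column. */
  static List<String> resolveAsLiteral(List<String> columns, String identifier) {
    List<String> selected = new ArrayList<>();
    for (String column : columns) {
      if (column.equals(identifier)) {
        selected.add(column);
      }
    }
    return selected;
  }

  /** Treats the identifier as a Java regex that must match the whole column name. */
  static List<String> resolveAsRegex(List<String> columns, String identifier) {
    Pattern pattern = Pattern.compile(identifier);
    List<String> selected = new ArrayList<>();
    for (String column : columns) {
      if (pattern.matcher(column).matches()) {
        selected.add(column);
      }
    }
    return selected;
  }
}
```

With columns a.*, ab, and abc, the identifier a.* resolves to one column under the literal reading and to all three under the regex reading, and nothing in the identifier itself says which was intended.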





[jira] [Commented] (HIVE-23258) Remove BoneCP Connection Pool

2020-04-21 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088679#comment-17088679
 ] 

David Mollitor commented on HIVE-23258:
---

[~szita] Yes.  I have added the label.  Like I said, the default in Hive 3 is 
already HikariCP, so we are talking about a very small pool of people that 
bothered to explicitly configure this.

> Remove BoneCP Connection Pool
> -
>
> Key: HIVE-23258
> URL: https://issues.apache.org/jira/browse/HIVE-23258
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: backwards-incompatible
> Fix For: 4.0.0
>
> Attachments: HIVE-23258.1.patch
>
>
> {quote}
> BoneCP is a Java JDBC connection pool implementation that is tuned for high 
> performance by minimizing lock contention to give greater throughput for your 
> application ... but SHOULD NOW BE CONSIDERED DEPRECATED in favour of HikariCP.
> {quote}
> https://github.com/wwadge/bonecp
> The default in Hive 3.x is already HikariCP, so just remove BoneCP in 4.x
> https://github.com/apache/hive/blob/branch-3.1/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L392





[jira] [Updated] (HIVE-23258) Remove BoneCP Connection Pool

2020-04-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23258:
--
Labels: backwards-incompatible  (was: )

> Remove BoneCP Connection Pool
> -
>
> Key: HIVE-23258
> URL: https://issues.apache.org/jira/browse/HIVE-23258
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: backwards-incompatible
> Fix For: 4.0.0
>
> Attachments: HIVE-23258.1.patch
>
>
> {quote}
> BoneCP is a Java JDBC connection pool implementation that is tuned for high 
> performance by minimizing lock contention to give greater throughput for your 
> application ... but SHOULD NOW BE CONSIDERED DEPRECATED in favour of HikariCP.
> {quote}
> https://github.com/wwadge/bonecp
> The default in Hive 3.x is already HikariCP, so just remove BoneCP in 4.x
> https://github.com/apache/hive/blob/branch-3.1/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L392





[jira] [Commented] (HIVE-23243) Accept SQL type like pattern for Show Databases

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088036#comment-17088036
 ] 

David Mollitor commented on HIVE-23243:
---

+1 on [^HIVE-23243.01.patch] pending tests

 

Spoke with [~mgergely] directly.  We will go with this option for now, but 
there should be another Jira created to improve this.

 

This patch loses the benefit of the current setup: the wildcard is no longer 
passed all the way down to the RDBMS (as it is now).  Instead, it requests the 
entire list of databases and then filters at the HS2 level.  It would be nice if 
this filter could also be pushed all the way down to the RDBMS.  However, the 
current HMS API only allows the simple use case of having a '*' in the lookup, 
probably to support this feature.  There is currently no easy (or accurate) way 
to map the SQL wildcard syntax to this simplified wildcard setup.  HMS needs a 
more robust, fully featured capability to generate a list of databases based 
on SQL wildcard semantics.
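
To make the mapping problem concrete, here is a minimal sketch. It assumes the simplified syntax supports nothing but '*' and ignores LIKE escape sequences; the class and method names are hypothetical and not part of HMS. It translates a LIKE pattern only when no information would be lost.

```java
import java.util.Optional;

/** Hypothetical translator from SQL LIKE to a '*'-only wildcard syntax. */
class SimpleWildcard {
  /**
   * Returns the '*'-only form of the LIKE pattern, or empty when the pattern
   * cannot be expressed: '_' (exactly one character) has no '*'-only
   * equivalent, and a literal '*' would silently turn into a wildcard.
   */
  static Optional<String> fromLike(String like) {
    if (like.indexOf('_') >= 0 || like.indexOf('*') >= 0) {
      return Optional.empty();
    }
    return Optional.of(like.replace('%', '*'));
  }
}
```

Since underscores are common in database names, a pattern such as sales_% falls into the untranslatable case, which is one way to see why a '*'-only lookup cannot carry full SQL wildcard semantics.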

> Accept SQL type like pattern for Show Databases
> ---
>
> Key: HIVE-23243
> URL: https://issues.apache.org/jira/browse/HIVE-23243
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
> Attachments: HIVE-23243.01.patch
>
>
> Show Databases pattern accepts java like pattern with * and ., use SQL like 
> instead with % and _.





[jira] [Commented] (HIVE-23243) Accept SQL type like pattern for Show Databases

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087945#comment-17087945
 ] 

David Mollitor commented on HIVE-23243:
---

{code:java}
System.out.println("" + likePatternToRegExp("defaul%"));
// Output: \Qd\E\Qe\E\Qf\E\Qa\E\Qu\E\Ql\E.*?
{code}

> Accept SQL type like pattern for Show Databases
> ---
>
> Key: HIVE-23243
> URL: https://issues.apache.org/jira/browse/HIVE-23243
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
> Attachments: HIVE-23243.01.patch
>
>
> Show Databases pattern accepts java like pattern with * and ., use SQL like 
> instead with % and _.





[jira] [Commented] (HIVE-23243) Accept SQL type like pattern for Show Databases

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087944#comment-17087944
 ] 

David Mollitor commented on HIVE-23243:
---

Ya, I took a look at:

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L64

It generates a pretty complicated {{Pattern}}, and I'm not sure why, but the 
Regex it generates is not compatible with DataNucleus.

{quote}
Only the following regular expression patterns are required to be supported and 
are portable: global “(?i)” for case-insensitive matches; and “.” and “.*” for 
wild card matches. The pattern passed to matches must be a literal or parameter.
{quote}

http://www.datanucleus.org/products/accessplatform/jdo/query.html

I would rather see you change this {{UDFLike}} method to generate something 
that is DN-friendly or create a new method and continue to use the 
{{getDatabasesByPattern}} instead of making a new code path.
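
A DN-friendly conversion could look roughly like the sketch below. It is not the UDFLike code and not a proposed patch; the class name is hypothetical. It assumes that only "." and ".*" from the quoted portable subset may be emitted, so it rejects any character that would otherwise need regex quoting, and it ignores LIKE escape sequences.

```java
/** Hypothetical converter that stays inside the portable regex subset. */
class PortableLikeRegex {
  /** Maps '%' to ".*" and '_' to "."; letters and digits pass through unchanged. */
  static String fromLike(String like) {
    StringBuilder regex = new StringBuilder();
    for (char c : like.toCharArray()) {
      if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else if (Character.isLetterOrDigit(c)) {
        regex.append(c);
      } else {
        // Anything else would need \Q...\E or backslash quoting, which is
        // outside the portable subset, so fail instead of guessing.
        throw new IllegalArgumentException("Unsupported character in pattern: " + c);
      }
    }
    return regex.toString();
  }
}
```

For the input defaul% this yields defaul.* rather than a per-character quoted form.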

> Accept SQL type like pattern for Show Databases
> ---
>
> Key: HIVE-23243
> URL: https://issues.apache.org/jira/browse/HIVE-23243
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
> Attachments: HIVE-23243.01.patch
>
>
> Show Databases pattern accepts java like pattern with * and ., use SQL like 
> instead with % and _.





[jira] [Resolved] (HIVE-21242) Calcite Planner Logging Indicates UTF-16 Encoding

2020-04-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-21242.
---
Resolution: Not A Problem

> Calcite Planner Logging Indicates UTF-16 Encoding
> -
>
> Key: HIVE-21242
> URL: https://issues.apache.org/jira/browse/HIVE-21242
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Priority: Major
>
> I noticed some debug logging from calcite and it is using UTF-16.   I would 
> expect UTF-8.
> {code}
> 2019-02-10T19:08:06,393 DEBUG [7db4d3c5-0f88-49db-88fa-ad6428c23784 main] 
> parse.CalcitePlanner: Plan after decorrelation:
> HiveSortLimit(offset=[0], fetch=[2])
>   HiveProject(_o__c0=[array(3, 2, 1)], _o__c1=[map(1, 2001-01-01, 2, null)], 
> _o__c2=[named_struct(_UTF-16LE'c1', 123456, _UTF-16LE'c2', _UTF-16LE'hello', 
> _UTF-16LE'c3', array(_UTF-16LE'aa', _UTF-16LE'bb', _UTF-16LE'cc'), 
> _UTF-16LE'c4', map(_UTF-16LE'abc', 123, _UTF-16LE'xyz', 456), _UTF-16LE'c5', 
> named_struct(_UTF-16LE'c5_1', _UTF-16LE'bye', _UTF-16LE'c5_2', 88))])
> HiveTableScan(table=[[default, src]], table:alias=[src])
> {code}
> I'm not sure if this is a calcite internal thing which can be configured or 
> if this is only an artifact of the way the logging works.





[jira] [Commented] (HIVE-23243) Accept SQL type like pattern for Show Databases

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087852#comment-17087852
 ] 

David Mollitor commented on HIVE-23243:
---

[~mgergely] OK, I looked into this.

The ORM library that HMS uses is DataNucleus and for this kind of use, it:

bq.  follows the rules of java.lang.String.matches

So it actually accepts a Java Regex string.  I think your code should call 
{{.toString()}} on the {{Pattern}} being created and pass that to 
{{getDatabasesByPattern}}
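
A small sketch of the suggestion, using only java.util.regex and java.lang.String; the class and method names are hypothetical. Pattern.toString() returns the regular expression the Pattern was compiled from, and String.matches applies a regex to the whole string, which is the rule quoted above.

```java
import java.util.regex.Pattern;

/** Hypothetical helper showing the Pattern-to-String round trip. */
class PatternAsString {
  /** Returns the regex source text that the Pattern was compiled from. */
  static String regexOf(Pattern pattern) {
    return pattern.toString();
  }

  /** Whole-string match, following java.lang.String.matches. */
  static boolean matchesName(String regex, String name) {
    return name.matches(regex);
  }
}
```

Because the match covers the whole string, the regex defaul.* accepts default but not my_default.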

> Accept SQL type like pattern for Show Databases
> ---
>
> Key: HIVE-23243
> URL: https://issues.apache.org/jira/browse/HIVE-23243
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
> Attachments: HIVE-23243.01.patch
>
>
> Show Databases pattern accepts java like pattern with * and ., use SQL like 
> instead with % and _.





[jira] [Commented] (HIVE-23243) Accept SQL type like pattern for Show Databases

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087786#comment-17087786
 ] 

David Mollitor commented on HIVE-23243:
---

[~mgergely] Ya, I don't know what the right move is...  I think from an API 
perspective, it's best to pass around a {{Pattern}} object.  But if HMS wants 
to push the filter all the way down to the RDBMS, it would have to convert it 
back to a SQL-style expression.  It's probably best to start with the HMS doing 
a {{SELECT *}} and filtering with the {{Pattern}} object.  It would allow for full 
Regex and it would be universally supported across all RDBMS.  If that's too slow, 
there can be further consideration to push it all the way down to the RDBMS.
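
The "fetch everything, then filter" option can be sketched as below. The list of names stands in for whatever HMS would return; the class and method names are hypothetical and no HMS API is used.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical filter applied after all database names have been fetched. */
class DatabaseFilter {
  /** Keeps only the names that the Pattern matches in full. */
  static List<String> filter(List<String> allDatabases, Pattern pattern) {
    return allDatabases.stream()
        .filter(name -> pattern.matcher(name).matches())
        .collect(Collectors.toList());
  }
}
```

The filtering happens entirely in Java, so it behaves the same regardless of the backing RDBMS; the cost is that every name is transferred before any is discarded.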

> Accept SQL type like pattern for Show Databases
> ---
>
> Key: HIVE-23243
> URL: https://issues.apache.org/jira/browse/HIVE-23243
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
> Attachments: HIVE-23243.01.patch
>
>
> Show Databases pattern accepts java like pattern with * and ., use SQL like 
> instead with % and _.





[jira] [Updated] (HIVE-23258) Remove BoneCP Connection Pool

2020-04-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23258:
--
Attachment: HIVE-23258.1.patch

> Remove BoneCP Connection Pool
> -
>
> Key: HIVE-23258
> URL: https://issues.apache.org/jira/browse/HIVE-23258
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23258.1.patch
>
>
> {quote}
> BoneCP is a Java JDBC connection pool implementation that is tuned for high 
> performance by minimizing lock contention to give greater throughput for your 
> application ... but SHOULD NOW BE CONSIDERED DEPRECATED in favour of HikariCP.
> {quote}
> https://github.com/wwadge/bonecp
> The default in Hive 3.x is already HikariCP, so just remove BoneCP in 4.x
> https://github.com/apache/hive/blob/branch-3.1/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L392





[jira] [Updated] (HIVE-23258) Remove BoneCP Connection Pool

2020-04-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23258:
--
Status: Patch Available  (was: Open)

> Remove BoneCP Connection Pool
> -
>
> Key: HIVE-23258
> URL: https://issues.apache.org/jira/browse/HIVE-23258
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23258.1.patch
>
>
> {quote}
> BoneCP is a Java JDBC connection pool implementation that is tuned for high 
> performance by minimizing lock contention to give greater throughput for your 
> application ... but SHOULD NOW BE CONSIDERED DEPRECATED in favour of HikariCP.
> {quote}
> https://github.com/wwadge/bonecp
> The default in Hive 3.x is already HikariCP, so just remove BoneCP in 4.x
> https://github.com/apache/hive/blob/branch-3.1/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L392





[jira] [Assigned] (HIVE-23258) Remove BoneCP Connection Pool

2020-04-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23258:
-


> Remove BoneCP Connection Pool
> -
>
> Key: HIVE-23258
> URL: https://issues.apache.org/jira/browse/HIVE-23258
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> {quote}
> BoneCP is a Java JDBC connection pool implementation that is tuned for high 
> performance by minimizing lock contention to give greater throughput for your 
> application ... but SHOULD NOW BE CONSIDERED DEPRECATED in favour of HikariCP.
> {quote}
> https://github.com/wwadge/bonecp
> The default in Hive 3.x is already HikariCP, so just remove BoneCP in 4.x
> https://github.com/apache/hive/blob/branch-3.1/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L392





[jira] [Resolved] (HIVE-6336) hive 12 : local mode and datanucleus incompatability with org.apache.hadoop.hive.contrib.serde2.RegexSerDe

2020-04-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-6336.
--
Resolution: Won't Fix

Please use {{org.apache.hadoop.hive.serde2.RegexSerDe}} and if that is still 
causing errors, open a new JIRA.  Thanks.

> hive 12 : local mode and datanucleus incompatability with 
> org.apache.hadoop.hive.contrib.serde2.RegexSerDe
> --
>
> Key: HIVE-6336
> URL: https://issues.apache.org/jira/browse/HIVE-6336
> Project: Hive
>  Issue Type: Wish
>  Components: HiveServer2
>Affects Versions: 0.12.0
> Environment:  Hadoop 2.2  local derby Metastore embedded
>Reporter: Nigel Savage
>Priority: Minor
>  Labels: HADOOP
>
> There is an incompatibility between Hive 12 and datanucleus that seems to 
> involve org.apache.hadoop.hive.contrib.serde2.RegexSerDe
> The main question: 
> *IF hive 0.12.0  and datanucleus are compatible, then which version of 
> datanucleus should I be using with Hive 12 and Hadoop 2.2?*
> The error I'm getting blocks me from properly running hive queries 
> invoked from the "test" phase of a maven project.
> *To reproduce*
> I have hadoop and hive  running as a pseudo cluster local mode and derby as 
> the metastore
> I have the following environment variables
> {noformat}
> HADOOP_HOME=/home/ubu/hadoop
> JAVA_HOME=/usr/lib/jvm/java-7-oracle
> {noformat}
> I have the RegexSerDe declared in the hive-site.xml
> {noformat}
> <property>
>   <name>hive.aux.jars.path</name>
>   <value>file:///home/ubu/hadoop/lib/hive-contrib-0.12.0.jar</value>
>   <description>This JAR file available to all users for all jobs</description>
> </property>
> {noformat}
> If I run with
> {noformat} 
> 3.0.2 
> {noformat}
> I get the following 1 exception only
> 'java.lang.ClassNotFoundException...org.datanucleus.store.types.backed.Ma'  
> HOWEVER, If I run with  
> {noformat}
> 3.2.0-release 
> {noformat}
> I get the following 1 exception only
> java.lang.ClassNotFoundException:
> org/apache/hadoop/hive/contrib/serde2/RegexSerDe 
> EXPLANATION 
> The RegexSerDe class is picked up at run time but the datanucleus Map class 
> is not available. I have checked the datanucleus-core 3.0.2 jar and the class 
> is missing.  Upgrading to the first datanucleus above 3.0.2 that includes the 
> Map class throws the ClassNotFoundException for RegexSerDe. 
> With the earlier *3.0.2* datanucleus, the code fails on the missing Map class 
> but the RegexSerDe class is found. When I upgrade to the 
> 3.2.0-release, the Map class is found but for some unknown reason the code/Hive 
> no longer finds the RegexSerDe class.
> I started with the same datanucleus dependencies found in this hive pom:
> http://maven-repository.com/artifact/org.apache.hive/hive-metastore/0.12.0/pom
> Below are the dependencies from my latest attempt to get a functioning pom:
> {noformat}
> 
> org.apache.hbase
> hbase-server
> 0.96.0-hadoop2
> 
> 
> org.apache.hbase
> hbase-client
> 0.96.0-hadoop2
> 
> 
> 
> org.apache.commons
> commons-lang3
> 3.1
> 
> 
> com.google.guava
> guava
> ${guava.version}
> 
> 
> org.apache.derby
> derby
> ${derby.version}
> 
> 
> org.datanucleus
> datanucleus-core
> ${datanucleus.version}
> 
> 
> org.datanucleus
> datanucleus-rdbms
> ${datanucleus-rdbms.version}
> 
> 
> javax.jdo
> jdo-api
> 3.0.1
> 
> 
> org.datanucleus
> datanucleus-api-jdo
> ${datanucleus.jdo.version}
> 
> 
> javax.jdo
> jdo2-api
> 
> 
> junit
> junit
> 
> 
> log4j
> log4j
> 
> 
> 
> 
> 
> org.apache.hadoop
> hadoop-client
> ${hadoop.version}
> 
> 
> 
> org.apache.hive
> hive-common
> ${hive.version}
> provided
> 
> 
> org.apache.hive
> hive-serde
> ${hive.version}
> provided
> 
> 
> org.apache.hive
> hive-exec
> ${hive.version}
> provided
> 
> 
> org.apache.hive
>

[jira] [Commented] (HIVE-4897) Hive should handle AlreadyExists on retries when creating tables/partitions

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087746#comment-17087746
 ] 

David Mollitor commented on HIVE-4897:
--

I think this is often achieved with an 'updated' timestamp value in the 
database schema.  When a request is generated, the client puts in a timestamp.  
If the operation fails and is re-submitted, and the 'updated' timestamp matches 
the request's, then a success is returned; otherwise, if the timestamp in HMS is 
older, then the retry happens again.

> Hive should handle AlreadyExists on retries when creating tables/partitions
> ---
>
> Key: HIVE-4897
> URL: https://issues.apache.org/jira/browse/HIVE-4897
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Aihua Xu
>Priority: Major
> Attachments: hive-snippet.log
>
>
> Creating new tables/partitions may fail with an AlreadyExistsException if 
> there is an error part way through the creation and the HMS tries again 
> without properly cleaning up or checking if this is a retry.
> While partitioning a new table via a script on distributed hive (MetaStore on 
> the same machine) there was a long timeout and then:
> {code}
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> AlreadyExistsException(message:Partition already exists:Partition( ...
> {code}
> I am assuming this is due to retry. Perhaps already-exists on retry could be 
> handled better.
> A similar error occurred while creating a table through Impala, which issued 
> a single createTable call that failed with an AlreadyExistsException. See the 
> logs related to table tmp_proc_8_d2b7b0f133be455ca95615818b8a5879_7 in the 
> attached hive-snippet.log





[jira] [Comment Edited] (HIVE-4897) Hive should handle AlreadyExists on retries when creating tables/partitions

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087746#comment-17087746
 ] 

David Mollitor edited comment on HIVE-4897 at 4/20/20, 1:47 PM:


I think this is often achieved with an 'updated' timestamp value in the 
database schema.  When a request is generated, the client puts in a timestamp.  
If the operation fails and is re-submitted, and the 'updated' timestamp matches 
the request's, then a success is returned; otherwise, if the timestamp in HMS is 
older, then the retry happens again.  And obviously the 'updated' value is 
updated to match the request's timestamp on success.
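
The timestamp idea can be sketched with an in-memory map standing in for the 'updated' column. This is only an illustration under stated assumptions: the class, enum, and method names are hypothetical, nothing here is HMS code, and the STALE outcome for a request older than the stored value is an assumption the comment does not cover.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store that makes re-submitted requests idempotent. */
class IdempotentUpdates {
  enum Result { APPLIED, ALREADY_APPLIED, STALE }

  // Stands in for the 'updated' timestamp column, keyed by object name.
  private final Map<String, Long> updatedByObject = new HashMap<>();

  /** Applies the request unless the stored timestamp shows it already succeeded. */
  synchronized Result apply(String objectName, long requestTimestamp) {
    Long updated = updatedByObject.get(objectName);
    if (updated != null && updated.longValue() == requestTimestamp) {
      // The earlier attempt with this timestamp succeeded: report success.
      return Result.ALREADY_APPLIED;
    }
    if (updated != null && updated.longValue() > requestTimestamp) {
      // Assumption: a newer request already won, so this one is out of date.
      return Result.STALE;
    }
    // No record, or the stored timestamp is older: perform the operation and
    // record the request's timestamp on success.
    updatedByObject.put(objectName, requestTimestamp);
    return Result.APPLIED;
  }
}
```

A retry that reuses the original timestamp gets ALREADY_APPLIED instead of an AlreadyExistsException-style failure, which is the behaviour the comment is after.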


was (Author: belugabehr):
I think this is often achieved with an 'updated' timestamp value in the 
database schema.  When a request is generated, the client puts in a timestamp.  
If the operation fails and is re-submitted, if the 'updated' timestamp matches 
the requests then a success is returned, otherwise, if the timestamp in HMS is 
older, than the retry happens again.

> Hive should handle AlreadyExists on retries when creating tables/partitions
> ---
>
> Key: HIVE-4897
> URL: https://issues.apache.org/jira/browse/HIVE-4897
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Aihua Xu
>Priority: Major
> Attachments: hive-snippet.log
>
>
> Creating new tables/partitions may fail with an AlreadyExistsException if 
> there is an error part way through the creation and the HMS tries again 
> without properly cleaning up or checking if this is a retry.
> While partitioning a new table via a script on distributed hive (MetaStore on 
> the same machine) there was a long timeout and then:
> {code}
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> AlreadyExistsException(message:Partition already exists:Partition( ...
> {code}
> I am assuming this is due to retry. Perhaps already-exists on retry could be 
> handled better.
> A similar error occurred while creating a table through Impala, which issued 
> a single createTable call that failed with an AlreadyExistsException. See the 
> logs related to table tmp_proc_8_d2b7b0f133be455ca95615818b8a5879_7 in the 
> attached hive-snippet.log





[jira] [Commented] (HIVE-16362) Flaky test: TestMiniLlapLocalCliDriver.testCliDriver[vector_count_distinct]

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087743#comment-17087743
 ] 

David Mollitor commented on HIVE-16362:
---

[~mgergely] Is this still an issue?  I know you've been looking at HoS tests.

> Flaky test: TestMiniLlapLocalCliDriver.testCliDriver[vector_count_distinct]
> ---
>
> Key: HIVE-16362
> URL: https://issues.apache.org/jira/browse/HIVE-16362
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
>
> Jenkins Link: 
> https://builds.apache.org/job/PreCommit-HIVE-Build/4520/testReport/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_vector_count_distinct_/
> I see this in the {{hive.log}} file:
> {code}
> 2017-04-03T13:23:19,766 DEBUG [c11b6d76-6e01-4371-88fd-b89c9e420bb4 main] 
> metadata.Hive: Cancelling 30 dynamic loading tasks
> 2017-04-03T13:23:19,768 ERROR [c11b6d76-6e01-4371-88fd-b89c9e420bb4 main] 
> exec.Task: Failed with exception Exception when loading 30 in table web_sales 
> with 
> loadPath=file:/home/hiveptest/35.184.244.143-hiveptest-0/apache-github-source-source/itests/qtest/target/localfs/warehouse/web_sales/.hive-staging_hive_2017-04-03_13-23-18_143_3140419982719291367-1/-ext-1
> org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 30 
> in table web_sales with 
> loadPath=file:/home/hiveptest/35.184.244.143-hiveptest-0/apache-github-source-source/itests/qtest/target/localfs/warehouse/web_sales/.hive-staging_hive_2017-04-03_13-23-18_143_3140419982719291367-1/-ext-1
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1963)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:432)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2184)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1840)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1527)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1226)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1338)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1312)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>   at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:59)
>   at sun.reflect.GeneratedMethodAccessor188.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at org.junit.runners.Suite.runChild(Suite.java:127)
>   at org.junit.runners.Suite.runChild(Suite.java:26)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at 

[jira] [Commented] (HIVE-23243) Accept SQL type like pattern for Show Databases

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087718#comment-17087718
 ] 

David Mollitor commented on HIVE-23243:
---

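{code:java|title=ShowDatabasesOperation.java}
// With this change: fetch every database name, then filter on the client side
List<String> databases = context.getDb().getAllDatabases();
Pattern pattern = Pattern.compile(UDFLike.likePatternToRegExp(desc.getPattern()));

// Currently: pass the pattern to HMS so that it does the filtering
databases = context.getDb().getDatabasesByPattern(desc.getPattern());
{code}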

Currently, the client passes the pattern to the HMS and I am assuming that HMS 
does the filtering on its side so only the relevant results are returned.  With 
this change, the entire list of databases is returned and then filtered on the 
client side.

I'd like to know what the plan is for the existing HMS API.  Does it need to be 
deprecated and removed?  Does the API need to be changed to accept a generic 
{{Pattern}} instead of a {{String}}?

> Accept SQL type like pattern for Show Databases
> ---
>
> Key: HIVE-23243
> URL: https://issues.apache.org/jira/browse/HIVE-23243
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
> Attachments: HIVE-23243.01.patch
>
>
> Show Databases pattern accepts java like pattern with * and ., use SQL like 
> instead with % and _.





[jira] [Updated] (HIVE-23113) Clean Up HiveCallableStatement

2020-04-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23113:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master.  Thanks for the review [~pvary]!

> Clean Up HiveCallableStatement
> --
>
> Key: HIVE-23113
> URL: https://issues.apache.org/jira/browse/HIVE-23113
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Trivial
> Fix For: 4.0.0
>
> Attachments: HIVE-23113.1.patch, HIVE-23113.1.patch
>
>
> * Add a useful class comment
>  * Remove all non-javadoc comments
>  * Remove 'TODO' tags
>  * Add {{@Override}} tags
>  * Checkstyle formatting





[jira] [Commented] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087702#comment-17087702
 ] 

David Mollitor commented on HIVE-23176:
---

[~ashutoshc] Any thoughts on this one?

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is a valid table name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Commented] (HIVE-23177) Upgrade to ANTLR4

2020-04-16 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084941#comment-17084941
 ] 

David Mollitor commented on HIVE-23177:
---

HIVE-15577 documents some of the limitations of ANTLR3 that Hive is starting to 
come up against.

> Upgrade to ANTLR4
> -
>
> Key: HIVE-23177
> URL: https://issues.apache.org/jira/browse/HIVE-23177
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>
> Upgrade Hive to ANTLR4; ANTLR3 lost support many moons ago.
> This is going to be a big lift.  Many of the Hive rules use the "rule 
> rewrite" feature, which no longer exists in ANTLR4 and must be completely 
> re-implemented:
> https://stackoverflow.com/questions/14565794/antlr-4-tree-inject-rewrite-operator





[jira] [Updated] (HIVE-23099) Improve Logger for Operation Child Classes

2020-04-16 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23099:
--
Attachment: HIVE-23099.2.patch

> Improve Logger for Operation Child Classes
> --
>
> Key: HIVE-23099
> URL: https://issues.apache.org/jira/browse/HIVE-23099
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23099.1.patch, HIVE-23099.2.patch
>
>
> The {{Operation}} class declares its logger this way:
> {code:java|title=Operation.java}
> public abstract class Operation {
>   public static final Logger LOG = LoggerFactory.getLogger(Operation.class.getName());
>   ...
> }
> {code}
> Notice that this is an {{abstract}} class, but the {{Logger}} is tied to the 
> {{Operation.class.getName()}}.  This means that logging cannot be controlled 
> for each subclass of {{Operation}} independently since they all use the same 
> static {{Logger}} instance.
> Make the LOG a {{protected}} instance variable that inherits the name of the 
> child class.
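
A minimal sketch of the proposed change, under stated assumptions: {{java.util.logging}} stands in for the SLF4J {{Logger}} so the example has no external dependencies, and the class names are hypothetical rather than the real Hive classes.

```java
import java.util.logging.Logger;

// Sketch only: java.util.logging replaces the SLF4J Logger used by Hive.
abstract class OperationSketch {
  // Instance field: the logger name comes from the runtime (child) class,
  // so each subclass can be configured independently.
  protected final Logger log = Logger.getLogger(getClass().getName());

  String loggerName() {
    return log.getName();
  }
}

// Hypothetical subclasses, each of which ends up with its own logger name.
class SqlOperationSketch extends OperationSketch {}

class GetTablesOperationSketch extends OperationSketch {}
```

The trade-off is one logger lookup per instance instead of one per class; logging frameworks generally cache loggers by name, so this is usually cheap.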





[jira] [Updated] (HIVE-23171) Create Tool To Visualize Hive Parser Tree

2020-04-16 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23171:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master! Thank you for the review [~mgergely]!

> Create Tool To Visualize Hive Parser Tree
> -
>
> Key: HIVE-23171
> URL: https://issues.apache.org/jira/browse/HIVE-23171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-23171.1.patch, HIVE-23171.1.patch, 
> HIVE-23171.1.patch, HIVE-23171.1.patch, select_1.png
>
>
> For some of the work I would like to do on HIVE-23149, it would be nice to 
> visualize the output of the statement parser.
> I have created a tool that spits out the parser tree in DOT file format. This 
> allows it to be visualized using a plethora of tools.
> To use it, compile the {{hive-parser}} test JAR and run it.  The application 
> takes a single command line argument of a String.  The String is the SQL 
> statement to parse:
> {code:none}
> HqlParser "SELECT 1"
> {code}
> I have attached an example of the output that I generated for a {{SELECT 1}} 
> statement:
>  
>  
> !select_1.png!





[jira] [Commented] (HIVE-23117) Review of HiveStatement Class

2020-04-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084102#comment-17084102
 ] 

David Mollitor commented on HIVE-23117:
---

[~ngangam] Are you the JDBC point person now? :)

> Review of HiveStatement Class
> -
>
> Key: HIVE-23117
> URL: https://issues.apache.org/jira/browse/HIVE-23117
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23117.1.patch
>
>
> * Remove unused instance variable(s)
>  * Remove non-JavaDoc comments
>  * Make inPlaceUpdateStream Optional (and remove NO-OP class) (inconsistent 
> behavior with 'null' values)
>  * {{getQueryTimeout()}} returns incorrect value
>  * Unify and improve Exception messages
>  * Checkstyle fixes





[jira] [Commented] (HIVE-15577) Simplify current parser

2020-04-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084090#comment-17084090
 ] 

David Mollitor commented on HIVE-15577:
---

Another option is to upgrade to ANTLR4, but this is a big lift: [HIVE-23177]

> Simplify current parser
> ---
>
> Key: HIVE-15577
> URL: https://issues.apache.org/jira/browse/HIVE-15577
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Major
>
> We have frequently encountered the "code too large" problem. We need to reduce the 
> code size.





[jira] [Updated] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-15 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Attachment: HIVE-23176.4.patch

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is a valid table name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Updated] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-15 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Attachment: (was: HIVE-23176.4.patch)

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is a valid table name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Updated] (HIVE-23098) Allow Operation assertState to Accept a Collection

2020-04-14 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23098:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master!  Thanks again for the review [~ngangam]!

> Allow Operation assertState to Accept a Collection
> --
>
> Key: HIVE-23098
> URL: https://issues.apache.org/jira/browse/HIVE-23098
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-23098.1.patch, HIVE-23098.2.patch, 
> HIVE-23098.2.patch, HIVE-23098.2.patch, HIVE-23098.3.patch
>
>
> {code:java|title=Operation.java}
> protected final void assertState(List<OperationState> states) throws HiveSQLException {
>   if (!states.contains(state)) {
>     throw new HiveSQLException("Expected states: " + states.toString() + ", but found "
>         + this.state);
>   }
>   this.lastAccessTime = System.currentTimeMillis();
> }
>
> // Example caller:
> public void someMethod() {
>   assertState(new ArrayList<OperationState>(Arrays.asList(OperationState.FINISHED)));
> }
> {code}
> By allowing {{assertState}} to accept a {{Collection}}, one can save an 
> allocation and simplify the code:
> {code:java}
> assertState(Collections.singleton(OperationState.FINISHED));
> {code}
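
As a rough, self-contained sketch of the idea: the {{State}} enum below is a hypothetical subset of {{OperationState}}, and {{IllegalStateException}} stands in for {{HiveSQLException}}; neither is taken from the patch.

```java
import java.util.Collection;

class AssertStateSketch {
  // Hypothetical subset of Hive's OperationState values.
  enum State { INITIALIZED, RUNNING, FINISHED }

  private final State state = State.RUNNING;

  // Accepting Collection lets callers pass a Set, a List, or a singleton
  // without copying it into a new ArrayList first.
  void assertState(Collection<State> expected) {
    if (!expected.contains(state)) {
      // IllegalStateException is a stand-in for HiveSQLException.
      throw new IllegalStateException(
          "Expected states: " + expected + ", but found " + state);
    }
  }
}
```

With this signature, callers can pass {{EnumSet.of(...)}} or {{Collections.singleton(...)}} directly.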





[jira] [Updated] (HIVE-23183) Make TABLE Token Optional in TRUNCATE Statement

2020-04-14 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23183:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master! Thanks [~mgergely] for your review (and for your help more 
broadly).

> Make TABLE Token Optional in TRUNCATE Statement
> ---
>
> Key: HIVE-23183
> URL: https://issues.apache.org/jira/browse/HIVE-23183
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-23183.1.patch, HIVE-23183.1.patch, 
> HIVE-23183.1.patch
>
>
> {code:none}
> TRUNCATE [TABLE] tbl_name
> {code}
> https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html





[jira] [Comment Edited] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-14 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083236#comment-17083236
 ] 

David Mollitor edited comment on HIVE-23176 at 4/14/20, 1:42 PM:
-

[~kgyrtkirk] Thanks for the feedback.

This feature is not standard.

 I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
 * If Hive is going to support UTF-8 in the same way other major vendors do, 
then there are almost no restrictions on what characters can appear in an object 
identifier, so it is not possible to simply "detect" a Regex; it is ambiguous 
whether a user meant a Regex or a complex table name.
 * This feature accidentally added a bunch of weird edge cases where object 
identifier parsing takes different code paths

This feature could be interesting, but it is not a SQL standard, so it is a 
bit of a Hive-only shortcut that can cause interoperability problems, and it 
is not currently implemented in a great way. It should not be reflected in the 
actual grammar of the SQL parser. To implement such a feature, it would make 
sense that it be:
 * Not part of the grammar
 * Configurable (enabled/disabled) for interpreting the literal object 
identifiers supplied in the SQL statement in the Java parser code
 * Applied only to back-ticked object identifiers that are ASCII-only
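
To make the ambiguity concrete, here is a small sketch (not Hive code; the class, method, and column names are hypothetical) in which the same identifier text selects different columns depending on whether it is read as a literal name or as a Regex.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexColumnAmbiguity {

  // Columns selected when the identifier is interpreted as a regular expression.
  static List<String> asRegex(String identifier, List<String> columns) {
    Pattern pattern = Pattern.compile(identifier);
    return columns.stream()
        .filter(c -> pattern.matcher(c).matches())
        .collect(Collectors.toList());
  }

  // Columns selected when the identifier is interpreted as a literal name.
  static List<String> asLiteral(String identifier, List<String> columns) {
    return columns.stream()
        .filter(identifier::equals)
        .collect(Collectors.toList());
  }
}
```

For a table with columns {{a.b}}, {{aXb}} and {{ab}}, the identifier {{a.b}} selects one column as a literal but two as a Regex, and nothing in the text itself says which reading was intended.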


was (Author: belugabehr):
[~kgyrtkirk] Thanks for the feedback.

This feature is not standard.

 I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
* If Hive is going to support UTF-8 in the same way other major vendors do, 
then there are almost no restrictions on what characters can appear in an object 
identifier, so it is not possible to simply "detect" a Regex; it is ambiguous 
whether a user meant a Regex or a complex table name.
* This feature accidentally added a bunch of weird edge cases where object 
identifier parsing takes different code paths

This feature could be interesting, but it is not a SQL standard, so it is a 
bit of a Hive-only shortcut that can cause interoperability problems, and it 
is not currently implemented in a great way.  It should not be reflected in the 
actual grammar of the SQL parser.  To implement such a feature, it would 
make sense that it be:

* Not part of the grammar
* Configurable (enabled/disabled)
* Applied only to back-ticked object identifiers that are ASCII-only

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is a valid table name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-14 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083243#comment-17083243
 ] 

David Mollitor commented on HIVE-21354:
---

[~pvary]

{code:none}
 _  _
| )/ )
 \\ |//,' __
 (")(_)-"()))=-
(\\
 _   _
  HEELP ( | / )
  \\ \|/,' __
\_o_/ (")(_)-"()))=-
   ) <\\
  /\__
_ \ 
{code} 

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.
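
The heuristic above can be sketched roughly as follows. The class and method names are hypothetical, and the sketch only covers the threshold decision, not the actual ZooKeeper lock creation.

```java
class LockHeuristicSketch {

  // Returns true when the query touches at least half of the table's
  // partitions, in which case a single table-level lock would replace
  // the per-partition locks.
  static boolean useTableLock(int partitionsToLock, int totalPartitions) {
    if (totalPartitions <= 0) {
      return false; // unpartitioned or unknown: nothing to decide here
    }
    // Compare 2 * accessed >= total in long arithmetic to avoid both
    // integer overflow and rounding from integer division.
    return 2L * partitionsToLock >= totalPartitions;
  }
}
```

For example, a query locking 5 of 10 partitions would take the table-level lock, while one locking 2 of 5 would not.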





[jira] [Updated] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-14 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Attachment: HIVE-23176.4.patch

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is a valid table name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Commented] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-14 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083239#comment-17083239
 ] 

David Mollitor commented on HIVE-23176:
---

bq.  is there any other feature which could be used instead?

In MySQL, one can query {{information_schema}} to get the column names.

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is a valid table name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Commented] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-14 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083236#comment-17083236
 ] 

David Mollitor commented on HIVE-23176:
---

[~kgyrtkirk] Thanks for the feedback.

This feature is not standard.

 I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
* If Hive is going to support UTF-8 in the same way other major vendors do, 
then there are almost no restrictions on what characters can appear in an object 
identifier, so it is not possible to simply "detect" a Regex; it is ambiguous 
whether a user meant a Regex or a complex table name.
* This feature accidentally added a bunch of weird edge cases where object 
identifier parsing takes different code paths

This feature could be interesting, but it is not a SQL standard, so it is a 
bit of a Hive-only shortcut that can cause interoperability problems, and it 
is not currently implemented in a great way.  It should not be reflected in the 
actual grammar of the SQL parser.  To implement such a feature, it would 
make sense that it be:

* Not part of the grammar
* Configurable (enabled/disabled)
* Applied only to back-ticked object identifiers that are ASCII-only

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is a valid table name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Resolved] (HIVE-23187) Make TABLE Token Optional in ANALYZE Statement

2020-04-14 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-23187.
---
Resolution: Won't Fix

MySQL requires the {{TABLE}} keyword, so Hive should too.

 

[https://dev.mysql.com/doc/refman/8.0/en/analyze-table.html]

> Make TABLE Token Optional in ANALYZE Statement
> --
>
> Key: HIVE-23187
> URL: https://issues.apache.org/jira/browse/HIVE-23187
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> {code:none}
> ANALYZE [TABLE] Table1 PARTITION(ds='2008-04-09', hr=11) COMPUTE STATISTICS;
> {code}





[jira] [Reopened] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE

2020-04-14 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reopened HIVE-23189:
---

> Change Explain ANALYZE to Explain PROFILE
> -
>
> Key: HIVE-23189
> URL: https://issues.apache.org/jira/browse/HIVE-23189
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23189.1.patch
>
>
> {code:none}
> EXPLAIN 
> [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
> {code}
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause
> In Hive, there is an {{EXPLAIN ANALYZE}} query.  This can get a bit confusing 
> because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} 
> statement, so you end up with something like:
> {code:sql}
> EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
> {code}
> I would like to propose that the name be changed to {{EXPLAIN PROFILE}}.  
> This borrows from Apache Impala because it has a {{PROFILE}} command which 
> produces the stats that actually occurred during the query run (much like 
> this Hive feature).





[jira] [Updated] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE

2020-04-14 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23189:
--
Resolution: Not A Bug
Status: Resolved  (was: Patch Available)

> Change Explain ANALYZE to Explain PROFILE
> -
>
> Key: HIVE-23189
> URL: https://issues.apache.org/jira/browse/HIVE-23189
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23189.1.patch
>
>
> {code:none}
> EXPLAIN 
> [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
> {code}
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause
> In Hive, there is an {{EXPLAIN ANALYZE}} query.  This can get a bit confusing 
> because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} 
> statement, so you end up with something like:
> {code:sql}
> EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
> {code}
> I would like to propose that the name be changed to {{EXPLAIN PROFILE}}.  
> This borrows from Apache Impala because it has a {{PROFILE}} command which 
> produces the stats that actually occurred during the query run (much like 
> this Hive feature).





[jira] [Resolved] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE

2020-04-14 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-23189.
---
Resolution: Not A Problem

> Change Explain ANALYZE to Explain PROFILE
> -
>
> Key: HIVE-23189
> URL: https://issues.apache.org/jira/browse/HIVE-23189
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23189.1.patch
>
>
> {code:none}
> EXPLAIN 
> [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
> {code}
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause
> In Hive, there is an {{EXPLAIN ANALYZE}} query.  This can get a bit confusing 
> because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} 
> statement, so you have something like,...
> {code:sql}
> EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
> {code}
> I would like to propose that the name be changed to {{EXPLAIN PROFILE}}.  
> This borrows from Apache Impala because it has a {{PROFILE}} command which 
> produces the stats that actually occurred during the query run (much like 
> this Hive feature).





[jira] [Commented] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE

2020-04-14 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083216#comment-17083216
 ] 

David Mollitor commented on HIVE-23189:
---

[~kgyrtkirk] Thanks for your input.

I do see now the error of my ways.  I was working on [HIVE-23187] to make the 
token {{TABLE}} in {{ANALYZE TABLE}} optional; after all, what else 
could be analyzed?  The grammar did not allow me to do this because of this 
possibility:

{code:none}
EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
{code}

So, my first thought was to change ANALYZE to PROFILE to match Impala.  
However, I have done some digging and see that PostgreSQL first had 
{{EXPLAIN ANALYZE}} and then MySQL later adopted it.  No reason for Hive to 
move away from it.

I did notice that MySQL does not allow you to {{EXPLAIN ANALYZE}} an {{ANALYZE 
TABLE}} statement.  Perhaps we want to revisit that at another point.

{code:none}
{EXPLAIN | DESCRIBE | DESC}
    tbl_name [col_name | wild]

{EXPLAIN | DESCRIBE | DESC}
    [explain_type]
    {explainable_stmt | FOR CONNECTION connection_id}

{EXPLAIN | DESCRIBE | DESC} ANALYZE select_statement

explain_type: {
    FORMAT = format_name
}

format_name: {
    TRADITIONAL
  | JSON
  | TREE
}

explainable_stmt: {
    SELECT statement
  | TABLE statement
  | DELETE statement
  | INSERT statement
  | REPLACE statement
  | UPDATE statement
}
{code}

https://dev.mysql.com/doc/refman/8.0/en/explain.html

> Change Explain ANALYZE to Explain PROFILE
> -
>
> Key: HIVE-23189
> URL: https://issues.apache.org/jira/browse/HIVE-23189
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23189.1.patch
>
>
> {code:none}
> EXPLAIN 
> [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
> {code}
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause
> In Hive, there is an {{EXPLAIN ANALYZE}} query.  This can get a bit confusing 
> because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} 
> statement, so you have something like,...
> {code:sql}
> EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
> {code}
> I would like to propose that the name be changed to {{EXPLAIN PROFILE}}.  
> This borrows from Apache Impala because it has a {{PROFILE}} command which 
> produces the stats that actually occurred during the query run (much like 
> this Hive feature).





[jira] [Updated] (HIVE-23194) Use Queue Instead of List for CollectOperator

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23194:
--
Status: Patch Available  (was: Open)

> Use Queue Instead of List for CollectOperator
> -
>
> Key: HIVE-23194
> URL: https://issues.apache.org/jira/browse/HIVE-23194
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23194.1.patch
>
>
> https://github.com/apache/hive/blob/d6948a28ab3e34e5116591a60a96bdf031185e47/ql/src/java/org/apache/hadoop/hive/ql/exec/CollectOperator.java#L85-L88
> {code:java|title=CollectOperator.java}
>rowList = new ArrayList();
> ...
> } else {
>   result.o = rowList.remove(0);
>   result.oi = standardRowInspector;
> }
> {code}
> Removing from the head of an {{ArrayList}} is an expensive operation because 
> it needs to shift all of the elements down in the array for each call.  
> Better to use a {{Queue}}
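
A minimal sketch of the suggested replacement, assuming an {{ArrayDeque}} as the {{Queue}} implementation (the issue does not name one) and using a hypothetical wrapper class rather than the real {{CollectOperator}}:

```java
import java.util.ArrayDeque;
import java.util.Queue;

class RowQueueSketch {
  // ArrayDeque is a circular buffer, so removing the head does not shift
  // the remaining elements. Note that it does not accept null elements.
  private final Queue<Object> rows = new ArrayDeque<>();

  void add(Object row) {
    rows.add(row);
  }

  // poll() removes and returns the head in constant time, or returns null
  // when the queue is empty (ArrayList.remove(0) would throw instead).
  Object next() {
    return rows.poll();
  }
}
```

Rows still come out in insertion order, so the observable behavior matches {{rowList.remove(0)}} apart from the empty case.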



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23194) Use Queue Instead of List for CollectOperator

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23194:
--
Attachment: HIVE-23194.1.patch

> Use Queue Instead of List for CollectOperator
> -
>
> Key: HIVE-23194
> URL: https://issues.apache.org/jira/browse/HIVE-23194
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23194.1.patch
>
>
> https://github.com/apache/hive/blob/d6948a28ab3e34e5116591a60a96bdf031185e47/ql/src/java/org/apache/hadoop/hive/ql/exec/CollectOperator.java#L85-L88
> {code:java|title=CollectOperator.java}
>rowList = new ArrayList();
> ...
> } else {
>   result.o = rowList.remove(0);
>   result.oi = standardRowInspector;
> }
> {code}
> Removing from the head of an {{ArrayList}} is expensive because every call 
> shifts all of the remaining elements down by one position in the backing 
> array.  Better to use a {{Queue}}.





[jira] [Assigned] (HIVE-23194) Use Queue Instead of List for CollectOperator

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23194:
-


> Use Queue Instead of List for CollectOperator
> -
>
> Key: HIVE-23194
> URL: https://issues.apache.org/jira/browse/HIVE-23194
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> https://github.com/apache/hive/blob/d6948a28ab3e34e5116591a60a96bdf031185e47/ql/src/java/org/apache/hadoop/hive/ql/exec/CollectOperator.java#L85-L88
> {code:java|title=CollectOperator.java}
>rowList = new ArrayList();
> ...
> } else {
>   result.o = rowList.remove(0);
>   result.oi = standardRowInspector;
> }
> {code}
> Removing from the head of an {{ArrayList}} is expensive because every call 
> shifts all of the remaining elements down by one position in the backing 
> array.  Better to use a {{Queue}}.





[jira] [Updated] (HIVE-23183) Make TABLE Token Optional in TRUNCATE Statement

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23183:
--
Attachment: HIVE-23183.1.patch

> Make TABLE Token Optional in TRUNCATE Statement
> ---
>
> Key: HIVE-23183
> URL: https://issues.apache.org/jira/browse/HIVE-23183
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23183.1.patch, HIVE-23183.1.patch, 
> HIVE-23183.1.patch
>
>
> {code:none}
> TRUNCATE [TABLE] tbl_name
> {code}
> https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html





[jira] [Updated] (HIVE-23193) Review of Subset of Debug Logging

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23193:
--
Attachment: HIVE-23193.1.patch

> Review of Subset of Debug Logging
> -
>
> Key: HIVE-23193
> URL: https://issues.apache.org/jira/browse/HIVE-23193
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23193.1.patch
>
>
> bq. Better yet, use parameterized messages
> bq.  Will outperform the first form by a factor of at least 30, in case of a 
> disabled logging statement.
> http://www.slf4j.org/faq.html
> * Use parameterized logging where appropriate
> * Add logging guards {{if (Log.isDebugEnabled())}} around loops and complex 
> debug messages
> Simplify the code, remove lines of code, and potentially increase performance
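
A minimal sketch of the two techniques listed above. To stay self-contained it uses `java.util.logging` from the JDK as a stand-in for SLF4J, so the calls differ from the ones in the patch: `LOG.log(Level.FINE, "value={0}", value)` plays the role of a parameterized message, and `LOG.isLoggable(Level.FINE)` plays the role of the `isDebugEnabled()` guard. The class `LazyLoggingDemo` and the counter `renderCount` are hypothetical and exist only to show when the argument is actually rendered.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyLoggingDemo {
  private static final Logger LOG = Logger.getLogger(LazyLoggingDemo.class.getName());

  // Counts how often the expensive toString() below actually runs.
  static int renderCount = 0;

  static {
    // Force FINE (debug) to be disabled so the demo does not depend on
    // external logging configuration.
    LOG.setLevel(Level.INFO);
  }

  static final class Expensive {
    @Override
    public String toString() {
      renderCount++;
      return "expensive";
    }
  }

  // Eager form: the message string is built even though FINE is disabled.
  static void eager(Expensive value) {
    LOG.fine("value=" + value);
  }

  // Parameterized form: the argument is not rendered when FINE is disabled.
  static void parameterized(Expensive value) {
    LOG.log(Level.FINE, "value={0}", value);
  }

  // Guarded form: suited to loops or multi-step message construction.
  static void guarded(Expensive[] values) {
    if (LOG.isLoggable(Level.FINE)) {
      StringBuilder sb = new StringBuilder();
      for (Expensive v : values) {
        sb.append(v).append(',');
      }
      LOG.fine(sb.toString());
    }
  }
}
```

With debug logging disabled, only `eager` pays for rendering the argument; the parameterized and guarded forms skip it.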





[jira] [Updated] (HIVE-23193) Review of Subset of Debug Logging

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23193:
--
Status: Patch Available  (was: Open)

> Review of Subset of Debug Logging
> -
>
> Key: HIVE-23193
> URL: https://issues.apache.org/jira/browse/HIVE-23193
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23193.1.patch
>
>
> bq. Better yet, use parameterized messages
> bq.  Will outperform the first form by a factor of at least 30, in case of a 
> disabled logging statement.
> http://www.slf4j.org/faq.html
> * Use parameterized logging where appropriate
> * Add logging guards {{if (Log.isDebugEnabled())}} around loops and complex 
> debug messages
> Simplify the code, remove lines of code, and potentially increase performance





[jira] [Assigned] (HIVE-23193) Review of Subset of Debug Logging

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23193:
-


> Review of Subset of Debug Logging
> -
>
> Key: HIVE-23193
> URL: https://issues.apache.org/jira/browse/HIVE-23193
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> bq. Better yet, use parameterized messages
> bq.  Will outperform the first form by a factor of at least 30, in case of a 
> disabled logging statement.
> http://www.slf4j.org/faq.html
> * Use parameterized logging where appropriate
> * Add logging guards {{if (Log.isDebugEnabled())}} around loops and complex 
> debug messages
> Simplify the code, remove lines of code, and potentially increase performance





[jira] [Commented] (HIVE-23098) Allow Operation assertState to Accept a Collection

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082703#comment-17082703
 ] 

David Mollitor commented on HIVE-23098:
---

[~ngangam] [~pvary] [~mgergely] Any chance you have a moment to take a look at 
this?

> Allow Operation assertState to Accept a Collection
> --
>
> Key: HIVE-23098
> URL: https://issues.apache.org/jira/browse/HIVE-23098
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23098.1.patch, HIVE-23098.2.patch, 
> HIVE-23098.2.patch, HIVE-23098.2.patch, HIVE-23098.3.patch
>
>
> {code:java|title=Operation.java}
>   protected final void assertState(List<OperationState> states) throws HiveSQLException {
>     if (!states.contains(state)) {
>       throw new HiveSQLException("Expected states: " + states.toString()
>           + ", but found " + this.state);
>     }
>     this.lastAccessTime = System.currentTimeMillis();
>   }
> ...
>   public void someMethod() {
>     assertState(new ArrayList<OperationState>(Arrays.asList(OperationState.FINISHED)));
>   }
> {code}
> By allowing {{assertState}} to accept a {{Collection}}, one can save an 
> allocation and simplify the code:
> {code:java}
> assertState(Collections.singleton(OperationState.FINISHED));
> {code}
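
A minimal sketch of what the widened signature could look like, under simplifying assumptions: `OperationState` and `OperationSketch` are reduced, hypothetical stand-ins for the Hive classes, and `IllegalStateException` replaces `HiveSQLException` so the example compiles on its own.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.EnumSet;

// Hypothetical, simplified stand-in for Hive's OperationState.
enum OperationState { INITIALIZED, RUNNING, FINISHED, CANCELED }

// Hypothetical, simplified stand-in for Hive's Operation class.
final class OperationSketch {
  private OperationState state = OperationState.INITIALIZED;
  private long lastAccessTime;

  void setState(OperationState newState) {
    this.state = newState;
  }

  long getLastAccessTime() {
    return lastAccessTime;
  }

  // Accepting Collection lets callers pass a List, a Set, or a singleton.
  void assertState(Collection<OperationState> states) {
    if (!states.contains(state)) {
      // IllegalStateException stands in for HiveSQLException here.
      throw new IllegalStateException(
          "Expected states: " + states + ", but found " + state);
    }
    lastAccessTime = System.currentTimeMillis();
  }

  // Example callers: neither needs to wrap its argument in an ArrayList.
  static void demo() {
    OperationSketch op = new OperationSketch();
    op.setState(OperationState.FINISHED);
    op.assertState(Collections.singleton(OperationState.FINISHED));
    op.assertState(EnumSet.of(OperationState.RUNNING, OperationState.FINISHED));
  }
}
```

For checks against several states, `EnumSet` is a natural fit because `contains` on it is a constant-time bit test.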





[jira] [Updated] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23189:
--
Status: Patch Available  (was: Open)

> Change Explain ANALYZE to Explain PROFILE
> -
>
> Key: HIVE-23189
> URL: https://issues.apache.org/jira/browse/HIVE-23189
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23189.1.patch
>
>
> {code:none}
> EXPLAIN 
> [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
> {code}
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause
> In Hive, there is an {{EXPLAIN ANALYZE}} query.  This can get a bit confusing 
> because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} 
> statement, so you end up with something like:
> {code:sql}
> EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
> {code}
> I would like to propose that the name be changed to {{EXPLAIN PROFILE}}.  
> This borrows from Apache Impala, which has a {{PROFILE}} command that 
> reports the statistics actually recorded during the query run (much like 
> this Hive feature).





[jira] [Updated] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23189:
--
Attachment: HIVE-23189.1.patch

> Change Explain ANALYZE to Explain PROFILE
> -
>
> Key: HIVE-23189
> URL: https://issues.apache.org/jira/browse/HIVE-23189
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23189.1.patch
>
>
> {code:none}
> EXPLAIN 
> [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
> {code}
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause
> In Hive, there is an {{EXPLAIN ANALYZE}} query.  This can get a bit confusing 
> because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} 
> statement, so you end up with something like:
> {code:sql}
> EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
> {code}
> I would like to propose that the name be changed to {{EXPLAIN PROFILE}}.  
> This borrows from Apache Impala, which has a {{PROFILE}} command that 
> reports the statistics actually recorded during the query run (much like 
> this Hive feature).





[jira] [Updated] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Attachment: HIVE-23176.4.patch

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has a feature that lets a REGEX in a SELECT match multiple columns.  
> This needs to go.  It is not standard SQL and, as currently implemented, it 
> is impossible to determine whether a column identifier is a REGEX or the 
> actual name of the column.  If a column name is enclosed in back ticks then 
> any UTF-8 character is valid in that name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082578#comment-17082578
 ] 

David Mollitor commented on HIVE-21354:
---

bq. So it all comes down if the lock check does exact matches, or it checks 
stuff hierarchically.

Yes. Exactly :)

I think we are just both guessing on which one is employed.  I will need to dig 
in to figure it out, unless you can point me at the code that does this 
implicit locking check.

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.
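
The three steps of the proposed heuristic can be sketched as a single predicate. This is only an illustration under stated assumptions: `LockGranularity` and `useTableLock` are hypothetical names, not existing Hive APIs, and the two counts are assumed to be available to the caller already.

```java
// Hypothetical helper; the names are illustrative and not part of Hive.
final class LockGranularity {
  private LockGranularity() {}

  // Returns true when a single table-level lock should replace the
  // per-partition locks, i.e. when the query needs to lock at least half
  // of the table's partitions.
  static boolean useTableLock(int partitionsToLock, int totalPartitions) {
    if (partitionsToLock < 0 || totalPartitions < 0
        || partitionsToLock > totalPartitions) {
      throw new IllegalArgumentException("invalid partition counts");
    }
    // Comparing 2 * n against the total avoids integer-division rounding
    // for odd totals; the long literal guards against int overflow.
    return 2L * partitionsToLock >= totalPartitions;
  }
}
```

For example, a query touching 5 of 10 partitions would take the table-level lock, while one touching 2 of 5 would keep the per-partition locks.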





[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082505#comment-17082505
 ] 

David Mollitor commented on HIVE-21354:
---

... something like:

{code:none}
explain locks alter table web_logs drop partition(`date`='2015-11-18')

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> EXCLUSIVE
{code}
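
The lock list in the output above can be modelled roughly as follows. This is a sketch of the behaviour described in this thread (an explicit table lock first, then one lock per required partition), not Hive's actual lock manager code; `LockPlan`, `Mode`, and `plan` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the lock list printed by EXPLAIN LOCKS.
final class LockPlan {
  enum Mode { SHARED_READ, EXCLUSIVE }

  private LockPlan() {}

  // The table lock always comes first and is SHARED_READ in this sketch;
  // each required partition then gets its own lock in the requested mode.
  static List<String> plan(String table, List<String> partitions, Mode partitionMode) {
    List<String> locks = new ArrayList<>();
    locks.add(table + " -> " + Mode.SHARED_READ);
    for (String partition : partitions) {
      locks.add(table + "." + partition + " -> " + partitionMode);
    }
    return locks;
  }
}
```

In this model nothing is inferred from the partition locks: the table lock is an explicit entry in the list.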

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082501#comment-17082501
 ] 

David Mollitor edited comment on HIVE-21354 at 4/13/20, 5:18 PM:
-

[~pvary] I do not think that Hive has any logic that says "if a partition of a 
table is locked, then the table is locked."  I think it does this in a simple 
way... it comes up with a list of all the required locks and the first one is 
always the table lock, the rest are the required partitions.  That is to say, 
it takes an explicit lock on the table,... there is no logic for an implicit 
table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}


What I would expect is that, for an INSERT into a specific partition or a 
TRUNCATE partition statement, the query would take a SHARED_READ lock at the 
table level and an EXCLUSIVE lock on the specific partitions.


was (Author: belugabehr):
[~pvary] I do not think that Hive has any logic that says "if a partition of a 
table is locked, then the table is locked."  I think it does this in a simple 
way... it comes up with a list of all the required locks and the first one is 
always the table lock, the rest are the required partitions.  That is to say, 
it takes an explicit lock on the table,... there is no logic for an implicit 
table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082501#comment-17082501
 ] 

David Mollitor edited comment on HIVE-21354 at 4/13/20, 5:16 PM:
-

[~pvary] I do not think that Hive has any logic that says "if a partition of a 
table is locked, then the table is locked."  I think it does this in a simple 
way... it comes up with a list of all the required locks and the first one is 
always the table lock, the rest are the required partitions.  That is to say, 
it takes an explicit lock on the table,... there is no logic for an implicit 
table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}


was (Author: belugabehr):
[~pvary] I do not think that Hive has any logic that says "if a partition of a 
table is locked, then the table is locked."  I think it does this this a a 
simple way... it comes up with a list of all the required locks and the first 
one is always the table lock, the rest are the required partitions.  That is to 
say, it takes an explicit lock on the table,... there is no logic for an 
implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082501#comment-17082501
 ] 

David Mollitor commented on HIVE-21354:
---

[~pvary] I do not think that Hive has any logic that says "if a partition of a 
table is locked, then the table is locked."  I think it does this this a a 
simple way... it comes up with a list of all the required locks and the first 
one is always the table lock, the rest are the required partitions.  That is to 
say, it takes an explicit lock on the table,... there is no logic for an 
implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082501#comment-17082501
 ] 

David Mollitor edited comment on HIVE-21354 at 4/13/20, 5:15 PM:
-

[~pvary] I do not think that Hive has any logic that says "if a partition of a 
table is locked, then the table is locked."  I think it does this this a a 
simple way... it comes up with a list of all the required locks and the first 
one is always the table lock, the rest are the required partitions.  That is to 
say, it takes an explicit lock on the table,... there is no logic for an 
implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}


was (Author: belugabehr):
[~pvary] I do not think that Hive has any logic that says "if a partition of a 
table is locked, then the table is locked."  I think it does this this a a 
simple way... it comes up with a list of all the required locks and the first 
one is always the table lock, the rest are the required partitions.  That is to 
say, it takes an explicit lock on the table,... there is no logic for an 
implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Commented] (HIVE-23187) Make TABLE Token Optional in ANALYZE Statement

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082400#comment-17082400
 ] 

David Mollitor commented on HIVE-23187:
---

This is not possible right now because of [HIVE-23187], which leads to some 
confusion for the parser when the TABLE token is made optional.

> Make TABLE Token Optional in ANALYZE Statement
> --
>
> Key: HIVE-23187
> URL: https://issues.apache.org/jira/browse/HIVE-23187
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> {code:none}
> ANALYZE [TABLE] Table1 PARTITION(ds='2008-04-09', hr=11) COMPUTE STATISTICS;
> {code}





[jira] [Updated] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23189:
--
Target Version/s: 4.0.0

> Change Explain ANALYZE to Explain PROFILE
> -
>
> Key: HIVE-23189
> URL: https://issues.apache.org/jira/browse/HIVE-23189
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> {code:none}
> EXPLAIN 
> [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
> {code}
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause
> In Hive, there is an {{EXPLAIN ANALYZE}} query.  This can get a bit confusing 
> because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} 
> statement, so you end up with something like:
> {code:sql}
> EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
> {code}
> I would like to propose that the name be changed to {{EXPLAIN PROFILE}}.  
> This borrows from Apache Impala, which has a {{PROFILE}} command that 
> reports the statistics actually recorded during the query run (much like 
> this Hive feature).





[jira] [Assigned] (HIVE-23189) Change Explain ANALYZE to Explain PROFILE

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23189:
-


> Change Explain ANALYZE to Explain PROFILE
> -
>
> Key: HIVE-23189
> URL: https://issues.apache.org/jira/browse/HIVE-23189
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> {code:none}
> EXPLAIN 
> [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
> {code}
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain#LanguageManualExplain-TheANALYZEClause
> In Hive, there is an {{EXPLAIN ANALYZE}} query.  This can get a bit confusing 
> because you can run an {{EXPLAIN ANALYZE}} against an {{ANALYZE TABLE}} 
> statement, so you end up with something like:
> {code:sql}
> EXPLAIN ANALYZE ANALYZE TABLE `myTable` COMPUTE STATISTICS;
> {code}
> I would like to propose that the name be changed to {{EXPLAIN PROFILE}}.  
> This borrows from Apache Impala, which has a {{PROFILE}} command that 
> reports the statistics actually recorded during the query run (much like 
> this Hive feature).





[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082362#comment-17082362
 ] 

David Mollitor commented on HIVE-21354:
---

[~pvary] I'm not sure of the exact relationship between table and partition.  I 
believe they overlap in some metadata, but maybe not all?  There might be an 
issue of:

* Client 1: Read partition 'a'
* Client 2: Change the table-level metadata
* Client 1: Read partition 'b'

... but I don't know.


For a 'DROP', it makes sense to lock just the partition... whatever the metadata 
change might be is irrelevant because... well,... it's going to be dropped 
anyway.

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Updated] (HIVE-23188) Allow STATS Token in Analyze Table

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23188:
--
Description: 
{code:none}
ANALYZE TABLE Table1 PARTITION(ds='2008-04-09', hr=11) COMPUTE 
[STATISTICS|STATS];
{code}

Save a few keyboard strokes.

> Allow STATS Token in Analyze Table
> --
>
> Key: HIVE-23188
> URL: https://issues.apache.org/jira/browse/HIVE-23188
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> {code:none}
> ANALYZE TABLE Table1 PARTITION(ds='2008-04-09', hr=11) COMPUTE 
> [STATISTICS|STATS];
> {code}
> Save a few keystrokes.





[jira] [Updated] (HIVE-23187) Make TABLE Token Optional in ANALYZE Statement

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23187:
--
Description: 
{code:none}
ANALYZE [TABLE] Table1 PARTITION(ds='2008-04-09', hr=11) COMPUTE STATISTICS;
{code}

> Make TABLE Token Optional in ANALYZE Statement
> --
>
> Key: HIVE-23187
> URL: https://issues.apache.org/jira/browse/HIVE-23187
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> {code:none}
> ANALYZE [TABLE] Table1 PARTITION(ds='2008-04-09', hr=11) COMPUTE STATISTICS;
> {code}





[jira] [Assigned] (HIVE-23188) Allow STATS Token in Analyze Table

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23188:
-


> Allow STATS Token in Analyze Table
> --
>
> Key: HIVE-23188
> URL: https://issues.apache.org/jira/browse/HIVE-23188
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>






[jira] [Assigned] (HIVE-23187) Make TABLE Token Optional in ANALYZE Statement

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23187:
-


> Make TABLE Token Optional in ANALYZE Statement
> --
>
> Key: HIVE-23187
> URL: https://issues.apache.org/jira/browse/HIVE-23187
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>






[jira] [Updated] (HIVE-23183) Make TABLE Token Optional in TRUNCATE Statement

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23183:
--
Attachment: HIVE-23183.1.patch

> Make TABLE Token Optional in TRUNCATE Statement
> ---
>
> Key: HIVE-23183
> URL: https://issues.apache.org/jira/browse/HIVE-23183
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23183.1.patch, HIVE-23183.1.patch
>
>
> {code:none}
> TRUNCATE [TABLE] tbl_name
> {code}
> https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html





[jira] [Updated] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Attachment: HIVE-23176.4.patch

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks, then any 
> UTF-8 character is valid in the column name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]
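
To illustrate the ambiguity described above, here is a minimal sketch using 
plain java.util.regex. The column names and the class RegexColumnAmbiguity are 
made up for the example; this is not how Hive resolves columns internally. 
The same back-ticked identifier selects different columns depending on whether 
it is read as a literal name or as a regular expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Illustrative only: one identifier, two possible interpretations. */
final class RegexColumnAmbiguity {

  /** Interprets the identifier as a literal column name. */
  static List<String> selectLiteral(String identifier, List<String> columns) {
    return columns.stream().filter(identifier::equals).collect(Collectors.toList());
  }

  /** Interprets the same identifier as a regular expression over column names. */
  static List<String> selectRegex(String identifier, List<String> columns) {
    Pattern pattern = Pattern.compile(identifier);
    return columns.stream()
        .filter(c -> pattern.matcher(c).matches())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical table with a column literally named "a.c".
    List<String> columns = Arrays.asList("a.c", "abc", "a_c", "xyz");
    System.out.println("literal: " + selectLiteral("a.c", columns));
    System.out.println("regex:   " + selectRegex("a.c", columns));
  }
}
```

Nothing in the identifier itself says which reading the user intended.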





[jira] [Assigned] (HIVE-23186) Strict Check SemanticException Should Properly Quote Table Name

2020-04-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23186:
-


> Strict Check SemanticException Should Properly Quote Table Name
> ---
>
> Key: HIVE-23186
> URL: https://issues.apache.org/jira/browse/HIVE-23186
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> https://github.com/apache/hive/blob/029cab297a9ae40d249f63040721f93857398648/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L191-L192
> {code:java}
> throw new SemanticException(error + " No partition predicate for Alias \""
>     + alias + "\" Table \"" + tab.getTableName() + "\"");
> {code}
> Use back ticks and use the database name as well.
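
One possible shape for the requested message is sketched below. The helper 
names (quoteIdentifier, qualifiedName, noPartitionPredicateMessage) are 
hypothetical and do not exist in PartitionPruner; escaping an embedded back 
tick by doubling it is an assumption based on the usual back-tick quoting 
convention.

```java
/** Hypothetical helpers for building a back-tick quoted, database-qualified name. */
final class QualifiedNameQuoting {

  /** Wraps one identifier in back ticks, doubling any embedded back tick (assumed convention). */
  static String quoteIdentifier(String id) {
    return "`" + id.replace("`", "``") + "`";
  }

  /** Builds `db`.`table` for use in error messages. */
  static String qualifiedName(String dbName, String tableName) {
    return quoteIdentifier(dbName) + "." + quoteIdentifier(tableName);
  }

  /** Mirrors the wording of the existing message, with back ticks and the database name. */
  static String noPartitionPredicateMessage(String error, String alias,
      String dbName, String tableName) {
    return error + " No partition predicate for Alias " + quoteIdentifier(alias)
        + " Table " + qualifiedName(dbName, tableName);
  }

  public static void main(String[] args) {
    System.out.println(
        noPartitionPredicateMessage("Strict check failed:", "t", "default", "sales"));
  }
}
```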





[jira] [Updated] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Attachment: HIVE-23176.3.patch

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks, then any 
> UTF-8 character is valid in the column name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Updated] (HIVE-23183) Make TABLE Token Optional in TRUNCATE Statement

2020-04-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23183:
--
Attachment: HIVE-23183.1.patch

> Make TABLE Token Optional in TRUNCATE Statement
> ---
>
> Key: HIVE-23183
> URL: https://issues.apache.org/jira/browse/HIVE-23183
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23183.1.patch
>
>
> {code:none}
> TRUNCATE [TABLE] tbl_name
> {code}
> https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html





[jira] [Updated] (HIVE-23183) Make TABLE Token Optional in TRUNCATE Statement

2020-04-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23183:
--
Status: Patch Available  (was: Open)

> Make TABLE Token Optional in TRUNCATE Statement
> ---
>
> Key: HIVE-23183
> URL: https://issues.apache.org/jira/browse/HIVE-23183
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23183.1.patch
>
>
> {code:none}
> TRUNCATE [TABLE] tbl_name
> {code}
> https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html





[jira] [Updated] (HIVE-23183) Make TABLE Token Optional in TRUNCATE Statement

2020-04-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23183:
--
Description: 
{code:none}
TRUNCATE [TABLE] tbl_name
{code}

https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html

  was:It's optional in MySQL, let's make it optional for Hive too.


> Make TABLE Token Optional in TRUNCATE Statement
> ---
>
> Key: HIVE-23183
> URL: https://issues.apache.org/jira/browse/HIVE-23183
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> {code:none}
> TRUNCATE [TABLE] tbl_name
> {code}
> https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html





[jira] [Assigned] (HIVE-23183) Make TABLE Token Optional in TRUNCATE Statement

2020-04-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23183:
-


> Make TABLE Token Optional in TRUNCATE Statement
> ---
>
> Key: HIVE-23183
> URL: https://issues.apache.org/jira/browse/HIVE-23183
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> It's optional in MySQL, let's make it optional for Hive too.





[jira] [Updated] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Attachment: HIVE-23176.2.patch

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks, then any 
> UTF-8 character is valid in the column name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Updated] (HIVE-23171) Create Tool To Visualize Hive Parser Tree

2020-04-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23171:
--
Attachment: HIVE-23171.1.patch

> Create Tool To Visualize Hive Parser Tree
> -
>
> Key: HIVE-23171
> URL: https://issues.apache.org/jira/browse/HIVE-23171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23171.1.patch, HIVE-23171.1.patch, 
> HIVE-23171.1.patch, HIVE-23171.1.patch, select_1.png
>
>
> For some of the work I would like to do on HIVE-23149, it would be nice to 
> visualize the output of the statement parser.
> I have created a tool that spits out the parser tree in DOT file format. This 
> allows it to be visualized using a plethora of tools.
> To use it, compile the {{hive-parser}} test JAR and run it.  The application 
> takes a single command line argument of a String.  The String is the SQL 
> statement to parse:
> {code:none}
> HqlParser "SELECT 1"
> {code}
> I have attached an example of the output that I generated for a {{SELECT 1}} 
> statement:
>  
>  
> !select_1.png!
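
For readers unfamiliar with the DOT format, the general idea of dumping a tree 
as DOT can be sketched as below. This is not the code from the attached patch: 
the Node class stands in for the parser's AST, and the tree built in main is 
an invented example, not the actual parse tree of SELECT 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: render a small tree as a Graphviz DOT digraph. */
final class DotTreeSketch {

  /** Minimal stand-in for a parser tree node. */
  static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label) {
      this.label = label;
    }

    Node add(Node child) {
      children.add(child);
      return this;
    }
  }

  /** Renders the tree rooted at the given node as DOT text. */
  static String toDot(Node root) {
    StringBuilder sb = new StringBuilder("digraph ast {\n");
    emit(root, sb, new int[] {0});
    return sb.append("}\n").toString();
  }

  // Emits this node, then recurses into each child and adds an edge to it.
  // Returns the numeric id assigned to this node.
  private static int emit(Node node, StringBuilder sb, int[] counter) {
    int id = counter[0]++;
    String escaped = node.label.replace("\\", "\\\\").replace("\"", "\\\"");
    sb.append("  n").append(id).append(" [label=\"").append(escaped).append("\"];\n");
    for (Node child : node.children) {
      int childId = emit(child, sb, counter);
      sb.append("  n").append(id).append(" -> n").append(childId).append(";\n");
    }
    return id;
  }

  public static void main(String[] args) {
    // Invented tree shape, for demonstration only.
    Node tree = new Node("QUERY")
        .add(new Node("SELECT").add(new Node("1")));
    System.out.print(toDot(tree));
  }
}
```

The resulting text can be fed to any Graphviz-compatible renderer to produce 
an image.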





[jira] [Commented] (HIVE-23182) Semantic Exception: rule Identifier failed predicate allowQuotedId

2020-04-11 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081632#comment-17081632
 ] 

David Mollitor commented on HIVE-23182:
---

I propose making the default value 'true' when no context is provided.  If the 
feature is disabled in the user's session, any quoted IDs will still be 
rejected on the first pass, which does receive the session context.  If the 
feature is turned on, the second pass will have no problem parsing them 
because the default is 'true'.

> Semantic Exception: rule Identifier failed predicate allowQuotedId
> --
>
> Key: HIVE-23182
> URL: https://issues.apache.org/jira/browse/HIVE-23182
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
> Attachments: Querying a Hive Table (via Hiveserver2) with Colum... - 
> Cloudera Community.pdf
>
>
> Querying a Hive Table (via Hiveserver2) with Column Masking enabled via 
> Ranger Hive Plugin returns with an error.
> {code:none}
> [42000]: Error while compiling statement: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier 
> failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed 
> predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: 
> {allowQuotedId()}? line 1:117 rule Identifier failed predicate: 
> {allowQuotedId()}?
> {code}
> https://community.cloudera.com/t5/Support-Questions/Querying-a-Hive-Table-via-Hiveserver2-with-Column-Masking/td-p/167260





[jira] [Commented] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-11 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081631#comment-17081631
 ] 

David Mollitor commented on HIVE-23176:
---

I have logged this {{allowQuotedId}} issue in [HIVE-23182].  Read the full 
details there.

I will probably "fix" the issue here though.  The "fix" I propose is to change 
the default value of {{allowQuotedId}} to 'true' instead of the current 
'false'.

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks, then any 
> UTF-8 character is valid in the column name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Comment Edited] (HIVE-23182) Semantic Exception: rule Identifier failed predicate allowQuotedId

2020-04-11 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081627#comment-17081627
 ] 

David Mollitor edited comment on HIVE-23182 at 4/12/20, 3:52 AM:
-

I created this JIRA based on a support case in the Cloudera/Hortonworks 
community forums.

I just stumbled across the answer to this working on something seemingly 
unrelated.

The user is experiencing this issue because, for masking, the query is parsed 
by ANTLR, manipulated to mask the appropriate values, and then passed through 
the Hive ANTLR SQL parser to be processed again. However, it fails the second 
pass through the parser because there is a feature called 
{{hive.support.quoted.identifiers}}. This feature is enabled in HS2 by default 
and it tells Hive if it should accept (or reject) values wrapped in backticks 
(or quotes). When the query is parsed the first time, the flag is correctly 
passed from the user's session context, so table names with quotes are handled 
successfully. However, the session context (and therefore this flag) is not 
provided to the parser the second time around and therefore the parser rejects 
the quoted IDs. This is because, if no context is provided, the default 
behavior becomes "false".
{code:java|title=HiveLexer.g}
  public void setHiveConf(Configuration hiveConf) {
    this.hiveConf = hiveConf;
  }

  protected boolean allowQuotedId() {
    if (hiveConf == null) {
      // This line here: no context provided? The feature is disabled.
      return false;
    }
    String supportedQIds = HiveConf.getVar(hiveConf,
        HiveConf.ConfVars.HIVE_QUOTEDID_SUPPORT);
    return !"none".equals(supportedQIds);
  }
{code}


was (Author: belugabehr):
I created this JIRA based on a support case in the Cloudera/Hortonworks 
community forums.

I just stumbled across the answer to this.

The user is experiencing an issue because, for masking, the query is 
manipulated to mask the appropriate values and then passed through the Hive 
ANTLR SQL parser to be processed again.  However, it fails because there is a 
feature called {{hive.support.quoted.identifiers}}.  This feature is enabled by 
default, and it allows the original query to be parsed correctly.  However, 
this flag (via the session context) is not provided to the parser the second 
time around and therefore the parser mishandles the backticks.  If no context 
is provided, the default behavior becomes "false".

{code:java|title=HiveLexer.g}
  public void setHiveConf(Configuration hiveConf) {
    this.hiveConf = hiveConf;
  }

  protected boolean allowQuotedId() {
    if (hiveConf == null) {
      return false;
    }
    String supportedQIds = HiveConf.getVar(hiveConf,
        HiveConf.ConfVars.HIVE_QUOTEDID_SUPPORT);
    return !"none".equals(supportedQIds);
  }
{code}

> Semantic Exception: rule Identifier failed predicate allowQuotedId
> --
>
> Key: HIVE-23182
> URL: https://issues.apache.org/jira/browse/HIVE-23182
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
> Attachments: Querying a Hive Table (via Hiveserver2) with Colum... - 
> Cloudera Community.pdf
>
>
> Querying a Hive Table (via Hiveserver2) with Column Masking enabled via 
> Ranger Hive Plugin returns with an error.
> {code:none}
> [42000]: Error while compiling statement: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier 
> failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed 
> predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: 
> {allowQuotedId()}? line 1:117 rule Identifier failed predicate: 
> {allowQuotedId()}?
> {code}
> https://community.cloudera.com/t5/Support-Questions/Querying-a-Hive-Table-via-Hiveserver2-with-Column-Masking/td-p/167260





[jira] [Commented] (HIVE-23182) Semantic Exception: rule Identifier failed predicate allowQuotedId

2020-04-11 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081629#comment-17081629
 ] 

David Mollitor commented on HIVE-23182:
---

{code:java|title=SemanticAnalyzer.java}
  private ASTNode rewriteASTWithMaskAndFilter(TableMask tableMask, ASTNode ast,
      TokenRewriteStream tokenRewriteStream, Context ctx, Hive db,
      Map tabNameToTabObject) {
    ...
    try {
      // Right here... the configuration information stored in the Context
      // 'ctx' is not passed along, and the information regarding
      // HIVE_QUOTEDID_SUPPORT is lost, and the user's preference, which was
      // applied to the original query, is gone.
      rewrittenTree = ParseUtils.parse(rewrittenQuery);
    } catch (ParseException e) {
      throw new SemanticException(e);
    }
  }
{code}
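
To make the two-pass failure concrete, here is a self-contained toy model. 
Everything in it is a stand-in: parse and rewriteWithMask are hypothetical 
simplifications, a Map plays the role of the session configuration, and the 
query text is invented. It only demonstrates the effect of dropping the 
configuration on the second parse versus carrying it through, which is one 
possible alternative to changing the default.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of parse, rewrite for masking, then parse again. */
final class TwoPassParseSketch {

  static final String QUOTED_ID_KEY = "hive.support.quoted.identifiers";

  /** Stand-in parser: rejects back-ticked identifiers unless the config allows them. */
  static String parse(String query, Map<String, String> conf) {
    // No configuration means quoted identifiers are not allowed (current behavior).
    boolean allowQuoted = conf != null && !"none".equals(conf.get(QUOTED_ID_KEY));
    if (query.contains("`") && !allowQuoted) {
      throw new IllegalStateException("rule Identifier failed predicate: {allowQuotedId()}?");
    }
    return "AST(" + query + ")";
  }

  /** Stand-in for the masking rewrite of a hypothetical `ssn` column. */
  static String rewriteWithMask(String query) {
    return query.replace("`ssn`", "mask(`ssn`) AS `ssn`");
  }

  /** Models the reported problem: the second parse receives no configuration. */
  static String compileDroppingConf(String query, Map<String, String> sessionConf) {
    parse(query, sessionConf);
    return parse(rewriteWithMask(query), null);
  }

  /** Models carrying the same configuration through both passes. */
  static String compileKeepingConf(String query, Map<String, String> sessionConf) {
    parse(query, sessionConf);
    return parse(rewriteWithMask(query), sessionConf);
  }

  public static void main(String[] args) {
    Map<String, String> sessionConf = new HashMap<>();
    sessionConf.put(QUOTED_ID_KEY, "column");
    String query = "SELECT `ssn` FROM `t`";
    System.out.println(compileKeepingConf(query, sessionConf));
    try {
      compileDroppingConf(query, sessionConf);
    } catch (IllegalStateException e) {
      System.out.println("second pass failed: " + e.getMessage());
    }
  }
}
```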

> Semantic Exception: rule Identifier failed predicate allowQuotedId
> --
>
> Key: HIVE-23182
> URL: https://issues.apache.org/jira/browse/HIVE-23182
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
> Attachments: Querying a Hive Table (via Hiveserver2) with Colum... - 
> Cloudera Community.pdf
>
>
> Querying a Hive Table (via Hiveserver2) with Column Masking enabled via 
> Ranger Hive Plugin returns with an error.
> {code:none}
> [42000]: Error while compiling statement: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier 
> failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed 
> predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: 
> {allowQuotedId()}? line 1:117 rule Identifier failed predicate: 
> {allowQuotedId()}?
> {code}
> https://community.cloudera.com/t5/Support-Questions/Querying-a-Hive-Table-via-Hiveserver2-with-Column-Masking/td-p/167260





[jira] [Comment Edited] (HIVE-23182) Semantic Exception: rule Identifier failed predicate allowQuotedId

2020-04-11 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081627#comment-17081627
 ] 

David Mollitor edited comment on HIVE-23182 at 4/12/20, 3:18 AM:
-

I created this JIRA based on a support case in the Cloudera/Hortonworks 
community forums.

I just stumbled across the answer to this.

The user is experiencing an issue because, for masking, the query is 
manipulated to mask the appropriate values and then passed through the Hive 
ANTLR SQL parser to be processed again.  However, it fails because there is a 
feature called {{hive.support.quoted.identifiers}}.  This feature is enabled by 
default, and it allows the original query to be parsed correctly.  However, 
this flag (via the session context) is not provided to the parser the second 
time around and therefore the parser mishandles the backticks.  If no context 
is provided, the default behavior becomes "false".

{code:java|title=HiveLexer.g}
  public void setHiveConf(Configuration hiveConf) {
    this.hiveConf = hiveConf;
  }

  protected boolean allowQuotedId() {
    if (hiveConf == null) {
      return false;
    }
    String supportedQIds = HiveConf.getVar(hiveConf,
        HiveConf.ConfVars.HIVE_QUOTEDID_SUPPORT);
    return !"none".equals(supportedQIds);
  }
{code}


was (Author: belugabehr):
I created this JIRA based on a support case in the Cloudera/Hortonworks 
community forums.

> Semantic Exception: rule Identifier failed predicate allowQuotedId
> --
>
> Key: HIVE-23182
> URL: https://issues.apache.org/jira/browse/HIVE-23182
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
> Attachments: Querying a Hive Table (via Hiveserver2) with Colum... - 
> Cloudera Community.pdf
>
>
> Querying a Hive Table (via Hiveserver2) with Column Masking enabled via 
> Ranger Hive Plugin returns with an error.
> {code:none}
> [42000]: Error while compiling statement: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier 
> failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed 
> predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: 
> {allowQuotedId()}? line 1:117 rule Identifier failed predicate: 
> {allowQuotedId()}?
> {code}
> https://community.cloudera.com/t5/Support-Questions/Querying-a-Hive-Table-via-Hiveserver2-with-Column-Masking/td-p/167260





[jira] [Commented] (HIVE-23182) Semantic Exception: rule Identifier failed predicate allowQuotedId

2020-04-11 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081627#comment-17081627
 ] 

David Mollitor commented on HIVE-23182:
---

I created this JIRA based on a support case in the Cloudera/Hortonworks 
community forums.

> Semantic Exception: rule Identifier failed predicate allowQuotedId
> --
>
> Key: HIVE-23182
> URL: https://issues.apache.org/jira/browse/HIVE-23182
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
> Attachments: Querying a Hive Table (via Hiveserver2) with Colum... - 
> Cloudera Community.pdf
>
>
> Querying a Hive Table (via Hiveserver2) with Column Masking enabled via 
> Ranger Hive Plugin returns with an error.
> {code:none}
> [42000]: Error while compiling statement: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier 
> failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed 
> predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: 
> {allowQuotedId()}? line 1:117 rule Identifier failed predicate: 
> {allowQuotedId()}?
> {code}
> https://community.cloudera.com/t5/Support-Questions/Querying-a-Hive-Table-via-Hiveserver2-with-Column-Masking/td-p/167260





[jira] [Updated] (HIVE-23182) Semantic Exception: rule Identifier failed predicate allowQuotedId

2020-04-11 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23182:
--
Attachment: Querying a Hive Table (via Hiveserver2) with Colum... - 
Cloudera Community.pdf

> Semantic Exception: rule Identifier failed predicate allowQuotedId
> --
>
> Key: HIVE-23182
> URL: https://issues.apache.org/jira/browse/HIVE-23182
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
> Attachments: Querying a Hive Table (via Hiveserver2) with Colum... - 
> Cloudera Community.pdf
>
>
> Querying a Hive Table (via Hiveserver2) with Column Masking enabled via 
> Ranger Hive Plugin returns with an error.
> {code:none}
> [42000]: Error while compiling statement: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier 
> failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed 
> predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: 
> {allowQuotedId()}? line 1:117 rule Identifier failed predicate: 
> {allowQuotedId()}?
> {code}
> Querying a Hive Table (via Hiveserver2) with Column Masking enabled via 
> Ranger Hive Plugin returns with an error.





[jira] [Updated] (HIVE-23182) Semantic Exception: rule Identifier failed predicate allowQuotedId

2020-04-11 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23182:
--
Description: 
Querying a Hive Table (via Hiveserver2) with Column Masking enabled via Ranger 
Hive Plugin returns with an error.

{code:none}
[42000]: Error while compiling statement: FAILED: SemanticException 
org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier 
failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed 
predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: 
{allowQuotedId()}? line 1:117 rule Identifier failed predicate: 
{allowQuotedId()}?
{code}

https://community.cloudera.com/t5/Support-Questions/Querying-a-Hive-Table-via-Hiveserver2-with-Column-Masking/td-p/167260


  was:
Querying a Hive Table (via Hiveserver2) with Column Masking enabled via Ranger 
Hive Plugin returns with an error.

{code:none}
[42000]: Error while compiling statement: FAILED: SemanticException 
org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier 
failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed 
predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: 
{allowQuotedId()}? line 1:117 rule Identifier failed predicate: 
{allowQuotedId()}?
{code}

Querying a Hive Table (via Hiveserver2) with Column Masking enabled via Ranger 
Hive Plugin returns with an error.



> Semantic Exception: rule Identifier failed predicate allowQuotedId
> --
>
> Key: HIVE-23182
> URL: https://issues.apache.org/jira/browse/HIVE-23182
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
> Attachments: Querying a Hive Table (via Hiveserver2) with Colum... - 
> Cloudera Community.pdf
>
>
> Querying a Hive Table (via Hiveserver2) with Column Masking enabled via 
> Ranger Hive Plugin returns with an error.
> {code:none}
> [42000]: Error while compiling statement: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier 
> failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed 
> predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: 
> {allowQuotedId()}? line 1:117 rule Identifier failed predicate: 
> {allowQuotedId()}?
> {code}
> https://community.cloudera.com/t5/Support-Questions/Querying-a-Hive-Table-via-Hiveserver2-with-Column-Masking/td-p/167260





[jira] [Commented] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-11 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081587#comment-17081587
 ] 

David Mollitor commented on HIVE-23176:
---

As I pointed out in [HIVE-23172], once I removed the REGEX rule, quite a few 
unit tests started failing:

{code:none}
Caused by: org.apache.hadoop.hive.ql.parse.ParseException: line 1:7 rule 
Identifier failed predicate: {allowQuotedId()}?
line 1:23 rule Identifier failed predicate: {allowQuotedId()}?
line 1:25 rule Identifier failed predicate: {allowQuotedId()}?
line 1:29 rule Identifier failed predicate: {allowQuotedId()}?
line 1:32 rule Identifier failed predicate: {allowQuotedId()}?
line 1:48 rule Identifier failed predicate: {allowQuotedId()}?
{code}

This is because the quoted names are now (appropriately) being parsed as IDs, 
whereas they were being parsed as a REGEX before.  The tests use quoted 
strings in the identifiers, but {{allowQuotedId()}} is not enabled, so the 
table names don't match a valid ID even though they should.

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks, then any 
> UTF-8 character is valid in the column name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Updated] (HIVE-23171) Create Tool To Visualize Hive Parser Tree

2020-04-11 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23171:
--
Attachment: HIVE-23171.1.patch

> Create Tool To Visualize Hive Parser Tree
> -
>
> Key: HIVE-23171
> URL: https://issues.apache.org/jira/browse/HIVE-23171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23171.1.patch, HIVE-23171.1.patch, 
> HIVE-23171.1.patch, select_1.png
>
>
> For some of the work I would like to do on HIVE-23149, it would be nice to 
> visualize the output of the statement parser.
> I have created a tool that spits out the parser tree in DOT file format. This 
> allows it to be visualized using a plethora of tools.
> To use it, compile the {{hive-parser}} test JAR and run it.  The application 
> takes a single command line argument of a String.  The String is the SQL 
> statement to parse:
> {code:none}
> HqlParser "SELECT 1"
> {code}
> I have attached an example of the output that I generated for a {{SELECT 1}} 
> statement:
>  
>  
> !select_1.png!
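
For readers unfamiliar with the DOT format, here is a minimal sketch of how a 
tree can be written out as a DOT graph.  It is not the tool from the attached 
patch and does not use the Hive parser or ANTLR: the {{Node}} class is a 
hypothetical stand-in for a parser tree node, and the token labels in 
{{main}} are illustrative rather than the exact tree Hive produces for 
{{SELECT 1}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the HqlParser tool from the patch.
class DotTreeSketch {

    // Stand-in for a parser tree node: a label plus ordered children.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Render the whole tree as a DOT digraph.
    static String toDot(Node root) {
        StringBuilder sb = new StringBuilder("digraph ast {\n");
        emit(root, new int[] {0}, sb);
        return sb.append("}\n").toString();
    }

    // Pre-order walk: declare the node, then one edge per child.
    // Returns the numeric id assigned to this node.
    private static int emit(Node node, int[] nextId, StringBuilder sb) {
        int id = nextId[0]++;
        // Escape backslashes and double quotes so the label stays valid DOT.
        String label = node.label.replace("\\", "\\\\").replace("\"", "\\\"");
        sb.append("  n").append(id).append(" [label=\"").append(label).append("\"];\n");
        for (Node child : node.children) {
            int childId = emit(child, nextId, sb);
            sb.append("  n").append(id).append(" -> n").append(childId).append(";\n");
        }
        return id;
    }

    public static void main(String[] args) {
        // Illustrative labels only.
        Node tree = new Node("TOK_QUERY")
                .add(new Node("TOK_SELECT")
                        .add(new Node("TOK_SELEXPR")
                                .add(new Node("1"))));
        System.out.print(toDot(tree));
    }
}
```

The resulting text can be rendered with Graphviz, for example 
{{dot -Tpng tree.dot -o tree.png}}.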





[jira] [Updated] (HIVE-23171) Create Tool To Visualize Hive Parser Tree

2020-04-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23171:
--
Attachment: HIVE-23171.1.patch

> Create Tool To Visualize Hive Parser Tree
> -
>
> Key: HIVE-23171
> URL: https://issues.apache.org/jira/browse/HIVE-23171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23171.1.patch, HIVE-23171.1.patch, select_1.png
>
>
> For some of the work I would like to do on HIVE-23149, it would be nice to 
> visualize the output of the statement parser.
> I have created a tool that spits out the parser tree in DOT file format. This 
> allows it to be visualized using a plethora of tools.
> To use it, compile the {{hive-parser}} test JAR and run it.  The application 
> takes a single command line argument of a String.  The String is the SQL 
> statement to parse:
> {code:none}
> HqlParser "SELECT 1"
> {code}
> I have attached an example of the output that I generated for a {{SELECT 1}} 
> statement:
>  
>  
> !select_1.png!





[jira] [Updated] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Summary: Remove SELECT REGEX Column Feature  (was: Remove REGEX Column 
Feature)

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for using a REGEX to SELECT multiple 
> columns.  This needs to go.  It is not SQL standard and, as currently 
> implemented, it is impossible to determine whether a column identifier is a 
> REGEX or the actual name of the column.  If a column name is enclosed in 
> backticks, then any UTF-8 character is valid in that name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Assigned] (HIVE-23176) Remove REGEX Column Feature

2020-04-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23176:
-

Assignee: David Mollitor

> Remove REGEX Column Feature
> ---
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for using a REGEX to SELECT multiple 
> columns.  This needs to go.  It is not SQL standard and, as currently 
> implemented, it is impossible to determine whether a column identifier is a 
> REGEX or the actual name of the column.  If a column name is enclosed in 
> backticks, then any UTF-8 character is valid in that name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Updated] (HIVE-23176) Remove REGEX Column Feature

2020-04-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Attachment: HIVE-23176.1.patch

> Remove REGEX Column Feature
> ---
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for using a REGEX to SELECT multiple 
> columns.  This needs to go.  It is not SQL standard and, as currently 
> implemented, it is impossible to determine whether a column identifier is a 
> REGEX or the actual name of the column.  If a column name is enclosed in 
> backticks, then any UTF-8 character is valid in that name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Updated] (HIVE-23176) Remove REGEX Column Feature

2020-04-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23176:
--
Status: Patch Available  (was: Open)

> Remove REGEX Column Feature
> ---
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-23176.1.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for using a REGEX to SELECT multiple 
> columns.  This needs to go.  It is not SQL standard and, as currently 
> implemented, it is impossible to determine whether a column identifier is a 
> REGEX or the actual name of the column.  If a column name is enclosed in 
> backticks, then any UTF-8 character is valid in that name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Commented] (HIVE-23176) Remove REGEX Column Feature

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080653#comment-17080653
 ] 

David Mollitor commented on HIVE-23176:
---

* 
https://stackoverflow.com/questions/7450750/selecting-column-using-regexp-in-mysql
* http://bogiecom.com/2013/01/mysql-regex-column-name-selection/
* 
https://stackoverflow.com/questions/16002690/regular-expression-to-get-selected-columns
* 
https://stackoverflow.com/questions/22532419/sql-query-for-searching-column-name-in-database

Other RDBMSs can achieve this, but it is not baked in: it takes some extra 
legwork on the client side.

> Remove REGEX Column Feature
> ---
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for using a REGEX to SELECT multiple 
> columns.  This needs to go.  It is not SQL standard and, as currently 
> implemented, it is impossible to determine whether a column identifier is a 
> REGEX or the actual name of the column.  If a column name is enclosed in 
> backticks, then any UTF-8 character is valid in that name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]





[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080512#comment-17080512
 ] 

David Mollitor edited comment on HIVE-21354 at 4/10/20, 3:43 PM:
-

[~pvary] Since the queries are always taking the table lock... why do they also 
bother to take the partition locks?


was (Author: belugabehr):
[~pvary] Since the queries are always taking the table lock... why do they also 
take the partition locks?

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.
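
The three-step heuristic in the description above can be sketched as follows. 
This is only an illustration under stated assumptions: the class, enum, and 
method names are hypothetical and are not Hive or ZooKeeper APIs, and the two 
partition counts are assumed to be known to the caller already.

```java
// Hypothetical sketch of the proposed heuristic; not Hive code.
class TableLockHeuristic {

    // Hypothetical lock granularity, used only for this example.
    enum LockScope { TABLE, PARTITIONS }

    // Steps 1 and 2 (counting the partitions the query must lock and the
    // total partitions in the table) are assumed to have been done by the
    // caller; this method implements step 3.
    static LockScope chooseScope(int partitionsToLock, int totalPartitions) {
        if (partitionsToLock < 0 || partitionsToLock > totalPartitions) {
            throw new IllegalArgumentException(
                    "partitionsToLock must be between 0 and totalPartitions");
        }
        // "Greater than or equal to half" is written as 2 * a >= b to avoid
        // integer-division rounding; long arithmetic avoids overflow.
        return 2L * partitionsToLock >= totalPartitions
                ? LockScope.TABLE
                : LockScope.PARTITIONS;
    }

    public static void main(String[] args) {
        System.out.println(chooseScope(5, 10));     // exactly half
        System.out.println(chooseScope(4, 10));     // below half
        System.out.println(chooseScope(1000, 1000)); // full table scan
    }
}
```

With this rule, a query that touches every partition of a table with many 
partitions would create a single table-level lock instead of one lock per 
partition.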





[jira] [Updated] (HIVE-23171) Create Tool To Visualize Hive Parser Tree

2020-04-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23171:
--
Description: 
For some of the work I would like to do on HIVE-23149, it would be nice to 
visualize the output of the statement parser.

I have created a tool that spits out the parser tree in DOT file format. This 
allows it to be visualized using a plethora of tools.

To use it, compile the {{hive-parser}} test JAR and run it.  The application 
takes a single command line argument of a String.  The String is the SQL 
statement to parse:

{code:none}
HqlParser "SELECT 1"
{code}

I have attached an example of the output that I generated for a {{SELECT 1}} 
statement:

 

 

!select_1.png!

  was:
For some of the work I would like to do on HIVE-23149, it would be nice to 
visualize the output of the statement parser.

I have created a tool that spits out the parser tree in DOT file format. This 
allows it to be visualized using a plethora of tools.

I have attached an example of the output that I generated for a {{SELECT 1}} 
statement:

 

 

!select_1.png!


> Create Tool To Visualize Hive Parser Tree
> -
>
> Key: HIVE-23171
> URL: https://issues.apache.org/jira/browse/HIVE-23171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23171.1.patch, select_1.png
>
>
> For some of the work I would like to do on HIVE-23149, it would be nice to 
> visualize the output of the statement parser.
> I have created a tool that spits out the parser tree in DOT file format. This 
> allows it to be visualized using a plethora of tools.
> To use it, compile the {{hive-parser}} test JAR and run it.  The application 
> takes a single command line argument of a String.  The String is the SQL 
> statement to parse:
> {code:none}
> HqlParser "SELECT 1"
> {code}
> I have attached an example of the output that I generated for a {{SELECT 1}} 
> statement:
>  
>  
> !select_1.png!





[jira] [Comment Edited] (HIVE-23171) Create Tool To Visualize Hive Parser Tree

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080505#comment-17080505
 ] 

David Mollitor edited comment on HIVE-23171 at 4/10/20, 3:05 PM:
-

[~mgergely] Thanks for the review.
 # I removed the version tag from the {{antlr-runtime}} artifact because it is 
inherited from the parent POM. Repeating it is superfluous; not having to 
repeat it is the primary benefit of declaring it in the parent POM. (Even my 
IDE flags this with a warning.)
 # This is the only project that uses this artifact and it's scoped to test. 
There is no reason to publish this information in the parent POM. The parent 
POM is used to manage version information across the entire project. Since no 
other projects are using it, no need to clutter the parent POM or to modify it.


was (Author: belugabehr):
[~mgergely] Thanks for the review.

# I removed the version tag from the {{antlr-runtime}} artifact because it is 
inherited from the parent POM.  Repeating it is superfluous; not having to 
repeat it is the primary benefit of declaring it in the parent POM.
# This is the only project that uses this artifact and it's scoped to test.  
There is no reason to publish this information in the parent POM.  The parent 
POM is used to manage version information across the entire project.  Since no 
other projects are using it, no need to clutter the parent POM or to modify it.

> Create Tool To Visualize Hive Parser Tree
> -
>
> Key: HIVE-23171
> URL: https://issues.apache.org/jira/browse/HIVE-23171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23171.1.patch, select_1.png
>
>
> For some of the work I would like to do on HIVE-23149, it would be nice to 
> visualize the output of the statement parser.
> I have created a tool that spits out the parser tree in DOT file format. This 
> allows it to be visualized using a plethora of tools.
> I have attached an example of the output that I generated for a {{SELECT 1}} 
> statement:
>  
>  
> !select_1.png!





[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080512#comment-17080512
 ] 

David Mollitor commented on HIVE-21354:
---

[~pvary] Since the queries are always taking the table lock... why do they also 
take the partition locks?

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Assigned] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-21354:
-

Assignee: David Mollitor

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080510#comment-17080510
 ] 

David Mollitor commented on HIVE-21354:
---

Hey [~pvary], I'll take a crack at it.

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Commented] (HIVE-23171) Create Tool To Visualize Hive Parser Tree

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080505#comment-17080505
 ] 

David Mollitor commented on HIVE-23171:
---

[~mgergely] Thanks for the review.

# I removed the version tag from the {{antlr-runtime}} artifact because it is 
inherited from the parent POM.  Repeating it is superfluous; not having to 
repeat it is the primary benefit of declaring it in the parent POM.
# This is the only project that uses this artifact and it's scoped to test.  
There is no reason to publish this information in the parent POM.  The parent 
POM is used to manage version information across the entire project.  Since no 
other projects are using it, no need to clutter the parent POM or to modify it.

> Create Tool To Visualize Hive Parser Tree
> -
>
> Key: HIVE-23171
> URL: https://issues.apache.org/jira/browse/HIVE-23171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23171.1.patch, select_1.png
>
>
> For some of the work I would like to do on HIVE-23149, it would be nice to 
> visualize the output of the statement parser.
> I have created a tool that spits out the parser tree in DOT file format. This 
> allows it to be visualized using a plethora of tools.
> I have attached an example of the output that I generated for a {{SELECT 1}} 
> statement:
>  
>  
> !select_1.png!





[jira] [Commented] (HIVE-15577) Simplify current parser

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080488#comment-17080488
 ] 

David Mollitor commented on HIVE-15577:
---

http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CniqzjcnFitVC0vtTt%3DkRu_1eCZczLGPArN-6D_fLPLyA%40mail.gmail.com%3E

> Simplify current parser
> ---
>
> Key: HIVE-15577
> URL: https://issues.apache.org/jira/browse/HIVE-15577
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Major
>
> We encountered "code too large" problem frequently. We need to reduce the 
> code size.





[jira] [Commented] (HIVE-23171) Create Tool To Visualize Hive Parser Tree

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080471#comment-17080471
 ] 

David Mollitor commented on HIVE-23171:
---

[~mgergely] [~jmrodri]

> Create Tool To Visualize Hive Parser Tree
> -
>
> Key: HIVE-23171
> URL: https://issues.apache.org/jira/browse/HIVE-23171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23171.1.patch, select_1.png
>
>
> For some of the work I would like to do on HIVE-23149, it would be nice to 
> visualize the output of the statement parser.
> I have created a tool that spits out the parser tree in DOT file format. This 
> allows it to be visualized using a plethora of tools.
> I have attached an example of the output that I generated for a {{SELECT 1}} 
> statement:
>  
>  
> !select_1.png!




