[jira] [Updated] (HIVE-22447) Update HBase Version to GA

2023-11-27 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-22447:
--
Fix Version/s: 4.0.0-alpha-1
   Resolution: Duplicate
   Status: Resolved  (was: Patch Available)

> Update HBase Version to GA
> --
>
> Key: HIVE-22447
> URL: https://issues.apache.org/jira/browse/HIVE-22447
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
> Attachments: HIVE-22447.1.patch, HIVE-22447.2.patch
>
>
> Currently at:
> {code:none}
> 2.0.0-alpha4
> {code}
> Upgrade to a GA release (2.2.2)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-22447) Update HBase Version to GA

2023-11-27 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17790336#comment-17790336
 ] 

David Mollitor commented on HIVE-22447:
---

[~stoty] Sounds good. Thanks.

> Update HBase Version to GA
> --
>
> Key: HIVE-22447
> URL: https://issues.apache.org/jira/browse/HIVE-22447
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-22447.1.patch, HIVE-22447.2.patch
>
>
> Currently at:
> {code:none}
> 2.0.0-alpha4
> {code}
> Upgrade to a GA release (2.2.2)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26470) Remove stringifyException from Standalone MetaStore

2022-08-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-26470:
--
Summary: Remove stringifyException from Standalone MetaStore  (was: Remove 
stringifyException from MetaStore)

> Remove stringifyException from Standalone MetaStore
> ---
>
> Key: HIVE-26470
> URL: https://issues.apache.org/jira/browse/HIVE-26470
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26470) Remove stringifyException from MetaStore

2022-08-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-26470:
-


> Remove stringifyException from MetaStore
> 
>
> Key: HIVE-26470
> URL: https://issues.apache.org/jira/browse/HIVE-26470
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26469) Remove stringifyException Method From QL Package

2022-08-12 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-26469:
-


> Remove stringifyException Method From QL Package
> 
>
> Key: HIVE-26469
> URL: https://issues.apache.org/jira/browse/HIVE-26469
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-22417) Remove stringifyException from MetaStore

2022-08-05 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-22417:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Fixed after 4.0.0-alpha was released, but there is no 4.1 or 5.0 target 
currently available.

> Remove stringifyException from MetaStore
> 
>
> Key: HIVE-22417
> URL: https://issues.apache.org/jira/browse/HIVE-22417
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22417.1.patch, HIVE-22417.2.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26456) Remove stringifyException Method From Storage Handlers

2022-08-05 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-26456:
-


> Remove stringifyException Method From Storage Handlers
> --
>
> Key: HIVE-26456
> URL: https://issues.apache.org/jira/browse/HIVE-26456
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-22417) Remove stringifyException from MetaStore

2022-07-26 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571707#comment-17571707
 ] 

David Mollitor commented on HIVE-22417:
---

[~zabetak] Can you please start with a review of this one?  I just put a PR up 
on GitHub.

> Remove stringifyException from MetaStore
> 
>
> Key: HIVE-22417
> URL: https://issues.apache.org/jira/browse/HIVE-22417
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22417.1.patch, HIVE-22417.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-09-30 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422984#comment-17422984
 ] 

David Mollitor commented on HIVE-25580:
---

Sorry, was looking at an older version of the schema.  4.0 is:

{code:sql}
CREATE INDEX TAB_COL_STATS_IDX ON TAB_COL_STATS (CAT_NAME, DB_NAME, TABLE_NAME, 
COLUMN_NAME) USING BTREE;
{code}

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response time 
> increases.
> The root cause is the full table scan for the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> cat/db/table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-09-30 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422788#comment-17422788
 ] 

David Mollitor commented on HIVE-25580:
---

Yes.

{code:sql}
CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
(DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) USING BTREE;
{code}

DB/TBL should hit the index and be much faster.
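
A rough sketch of what that narrowed lookup could look like with the JDO query from 
the description (the {{dbName}}/{{tableName}} field names on 
{{MPartitionColumnStatistics}} are assumed here, not verified against the model class):

{code:java}
// Hedged sketch, not the actual HIVE-25580 change: restrict the DISTINCT engine
// lookup to a single db/table so PCS_STATS_IDX can be used instead of a full scan.
Query query = pm.newQuery(MPartitionColumnStatistics.class);
query.setFilter("dbName == db && tableName == tbl");
query.declareParameters("java.lang.String db, java.lang.String tbl");
query.setResult("DISTINCT engine");
Collection names = (Collection) query.executeWithArray(dbName, tblName);
{code}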

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response time 
> increases.
> The root cause is the full table scan for the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> cat/db/table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25544) Remove Dependency of hive-meta-common From hive-common

2021-09-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25544:
-


> Remove Dependency of hive-meta-common From hive-common
> --
>
> Key: HIVE-25544
> URL: https://issues.apache.org/jira/browse/HIVE-25544
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> These two things should not be linked; it means any HS2 client library 
> pulling in the hive-common library also has to pull in a ton of metastore code as 
> well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25543) Add Read-Only Capability to ObjectStore

2021-09-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25543:
-


> Add Read-Only Capability to ObjectStore
> ---
>
> Key: HIVE-25543
> URL: https://issues.apache.org/jira/browse/HIVE-25543
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> Recently saw some stack-traces showing that calling "commit" triggers 
> quite a bit of work within DataNucleus, as I understand it, to look for 
> changes in the transaction and to commit those changes.
> Given that many of the RPCs within the Metastore are look-ups, Hive can avoid 
> all this needless work by making the transaction read-only (rollbackOnly).
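
A minimal sketch of the idea, using the standard javax.jdo transaction API (an 
illustration only, not the eventual HIVE-25543 patch):

{code:java}
// Mark a lookup transaction rollback-only so DataNucleus does not have to scan
// for dirty objects at commit time; read-only RPCs never call commit().
Transaction tx = pm.currentTransaction();   // javax.jdo.Transaction
tx.begin();
tx.setRollbackOnly();                       // commit() is no longer allowed
try {
  // ... run the read-only query against the ObjectStore ...
} finally {
  tx.rollback();                            // cheap: nothing is written back
}
{code}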



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25542) Remove References to Index Configurations

2021-09-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25542:
--
Summary: Remove References to Index Configurations  (was: Remove References 
to hive.optimize.index.filter)

> Remove References to Index Configurations
> -
>
> Key: HIVE-25542
> URL: https://issues.apache.org/jira/browse/HIVE-25542
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Minor
>  Labels: newbie, noob
>
> Hive indexes were removed in the 4.x series.
> Please remove all references to the index configurations.
> For example: hive.optimize.index.filter
> Also update the docs:
> https://cwiki.apache.org/confluence/display/hive/configuration+properties



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25496) hadoop 3.3.1 / hive 3.2.1 / OpenJDK11 compatible?

2021-09-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414271#comment-17414271
 ] 

David Mollitor commented on HIVE-25496:
---

I've been working on this effort part-time for more than 18 months: [HIVE-24484]

 

 

> hadoop 3.3.1 / hive 3.2.1 / OpenJDK11 compatible?
> -
>
> Key: HIVE-25496
> URL: https://issues.apache.org/jira/browse/HIVE-25496
> Project: Hive
>  Issue Type: Bug
> Environment: Linux VM
>Reporter: Jerome Le Ray
>Assignee: Jerome Le Ray
>Priority: Major
>
> We used the following configuration
> hadoop 3.2.1
> hive 3.1.2
> PostGres 12
> Java - OracleJDK 8
> For internal reasons, we have to migrate to OpenJDK11.
> So, I've migrated hadoop 3.2.1 to the new version hadoop 3.3.1
> When I'm starting the hiveserver2 service, I've got the error :
> which: no hbase in 
> (/usr/local/bin:/bin:/usr/pgsql-12/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/jdk-11.0.10+9/bin:/opt/hivemetastore/hadoop-3.3.1/bin:/opt/hivemetastore/apache-hive-3.1.2-bin/bin)
> 2021-09-02 16:48:05: Starting HiveServer2
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hivemetastore/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hivemetastore/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2021-09-02 16:48:06,744 INFO conf.HiveConf: Found configuration file 
> file:/opt/hivemetastore/apache-hive-3.1.2-bin/conf/hive-site.xml
> 2021-09-02 16:48:07,169 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.local does not exist
> 2021-09-02 16:48:07,169 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.thrift.bind.host does not exist
> 2021-09-02 16:48:07,170 WARN conf.HiveConf: HiveConf of name 
> hive.enforce.bucketing does not exist
> 2021-09-02 16:48:08,414 INFO server.HiveServer2: STARTUP_MSG:
> /
> STARTUP_MSG: Starting HiveServer2
> STARTUP_MSG: host = lhroelcspt1001.enterprisenet.org/10.90.122.159
> STARTUP_MSG: args = [-hiveconf, mapred.job.tracker=local, -hiveconf, 
> fs.default.name=file:///cip-data, -hiveconf, 
> hive.metastore.warehouse.dir=file:cip-data, --hiveconf, 
> hive.server2.thrift.port=1, --hiveconf, hive.root.logger=INFO,console]
> STARTUP_MSG: version = 3.1.2
> (...)
> STARTUP_MSG: build = git://HW13934/Users/gates/tmp/hive-branch-3.1/hive -r 
> 8190d2be7b7165effa62bd21b7d60ef81fb0e4af; compiled by 'gates' on Thu Aug 22 
> 15:01:18 PDT 2019
> /
> 2021-09-02 16:48:08,436 INFO server.HiveServer2: Starting HiveServer2
> 2021-09-02 16:48:08,462 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.local does not exist
> 2021-09-02 16:48:08,463 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.thrift.bind.host does not exist
> 2021-09-02 16:48:08,463 WARN conf.HiveConf: HiveConf of name 
> hive.enforce.bucketing does not exist
> Hive Session ID = 440449ff-99b7-429c-82d9-e20bdcc9b46f
> 2021-09-02 16:48:08,566 INFO SessionState: Hive Session ID = 
> 440449ff-99b7-429c-82d9-e20bdcc9b46f
> 2021-09-02 16:48:08,566 INFO server.HiveServer2: Shutting down HiveServer2
> 2021-09-02 16:48:08,584 INFO server.HiveServer2: Stopping/Disconnecting tez 
> sessions.
> 2021-09-02 16:48:08,585 WARN server.HiveServer2: Error starting HiveServer2 
> on attempt 1, will retry in 6ms
> java.lang.RuntimeException: Error applying authorization policy on hive 
> configuration: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot 
> be cast to class java.net.URLClassLoader (jdk.
> internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are 
> in module java.base of loader 'bootstrap')
>  at org.apache.hive.service.cli.CLIService.init(CLIService.java:118)
>  at org.apache.hive.service.CompositeService.init(CompositeService.java:59)
>  at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:230)
>  at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1036)
>  at 
> org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:140)
>  at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1305)
>  at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1149)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> 

[jira] [Assigned] (HIVE-25495) Upgrade to JLine3

2021-09-01 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25495:
-


> Upgrade to JLine3
> -
>
> Key: HIVE-25495
> URL: https://issues.apache.org/jira/browse/HIVE-25495
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> JLine 2 was discontinued a long while ago.  Hadoop uses JLine3, so Hive 
> should match.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25476) Remove Unused Dependencies for JDBC Driver

2021-08-27 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25476.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thank you [~mgergely] for the review!

> Remove Unused Dependencies for JDBC Driver
> --
>
> Key: HIVE-25476
> URL: https://issues.apache.org/jira/browse/HIVE-25476
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I am using the JDBC driver in a project and was very surprised by the number of 
> dependencies it has.  Remove some unnecessary dependencies to make it a 
> little easier to work with.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25477) Clean Up JDBC Code

2021-08-25 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25477:
--
Description: 
* Remove unused imports
 * Remove unused code
 * Remove redundant code

  was:
* Remove unused imports
 * Remove unused code


> Clean Up JDBC Code
> --
>
> Key: HIVE-25477
> URL: https://issues.apache.org/jira/browse/HIVE-25477
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Remove unused imports
>  * Remove unused code
>  * Remove redundant code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25477) Clean Up JDBC Code

2021-08-25 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25477.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master! Thanks [~mgergely] for the review!

> Clean Up JDBC Code
> --
>
> Key: HIVE-25477
> URL: https://issues.apache.org/jira/browse/HIVE-25477
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Remove unused imports
>  * Remove unused code
>  * Remove redundant code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25477) Clean Up JDBC Code

2021-08-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25477:
-


> Clean Up JDBC Code
> --
>
> Key: HIVE-25477
> URL: https://issues.apache.org/jira/browse/HIVE-25477
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> * Remove unused imports
>  * Remove unused code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25476) Remove Unused Dependencies for JDBC Driver

2021-08-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25476:
-


> Remove Unused Dependencies for JDBC Driver
> --
>
> Key: HIVE-25476
> URL: https://issues.apache.org/jira/browse/HIVE-25476
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> I am using the JDBC driver in a project and was very surprised by the number of 
> dependencies it has.  Remove some unnecessary dependencies to make it a 
> little easier to work with.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23071) Remove hive.optimize.sort.dynamic.partition config

2021-07-27 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388132#comment-17388132
 ] 

David Mollitor commented on HIVE-23071:
---

[~mszurap] Updated docs to reflect that it's removed, though it would be better 
to have the developers of {{hive.optimize.sort.dynamic.partition.threshold}} 
contribute the updated guidance.

> Remove hive.optimize.sort.dynamic.partition config
> --
>
> Key: HIVE-23071
> URL: https://issues.apache.org/jira/browse/HIVE-23071
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-23071.1.patch
>
>
> {{hive.optimize.sort.dynamic.partition.threshold}} has replaced this config, 
> so we should remove the original config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23071) Remove hive.optimize.sort.dynamic.partition config

2021-07-27 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23071:
--
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> Remove hive.optimize.sort.dynamic.partition config
> --
>
> Key: HIVE-23071
> URL: https://issues.apache.org/jira/browse/HIVE-23071
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-23071.1.patch
>
>
> {{hive.optimize.sort.dynamic.partition.threshold}} has replaced this config, 
> so we should remove the original config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25320) Purge hive.optimize.sort.dynamic.partition

2021-07-27 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25320.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thanks [~yodalexsun] for the contribution and [~mgergely] 
for the review!

> Purge hive.optimize.sort.dynamic.partition
> --
>
> Key: HIVE-25320
> URL: https://issues.apache.org/jira/browse/HIVE-25320
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: Alex Sun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {{hive.optimize.sort.dynamic.partition}} has been replaced with 
> {{hive.optimize.sort.dynamic.partition.threshold}}.  It has been marked as 
> "deprecated", but it's actually totally defunct in the current code base.  
> Deprecation would allow an admin to continue to use it (maybe as an alias to 
> {{threshold}} = 0/-1), but that is not the case here.
>  
> Remove all references to "hive.optimize.sort.dynamic.partition" in the q 
> tests and remove {{HIVEOPTSORTDYNAMICPARTITION}} all together.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25395) Update hadoop to a more recent version

2021-07-27 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25395.
---
Resolution: Duplicate

> Update hadoop to a more recent version
> --
>
> Key: HIVE-25395
> URL: https://issues.apache.org/jira/browse/HIVE-25395
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> we are still depending on Hadoop 3.1.0, 
> which doesn't have source attachments and makes development harder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25319) Allow HiveDecimalWritable to Accept Java BigDecimal

2021-07-09 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378065#comment-17378065
 ] 

David Mollitor commented on HIVE-25319:
---

There does exist a {{HiveDecimal.create(BigDecimal);}}, which I am currently 
utilizing, but would like to keep a "reuse" {{HiveDecimal}} to avoid 
instantiating one for each loop.
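
For illustration, the pattern being described looks roughly like this (the direct 
{{set(BigDecimal)}} overload is the missing piece this issue asks for; it does not 
exist in the current storage-api):

{code:java}
// Reusable writable, but the current API still forces an immutable
// HiveDecimal allocation on every iteration.
HiveDecimalWritable writable = new HiveDecimalWritable();
for (BigDecimal bd : values) {
  writable.set(HiveDecimal.create(bd));   // today: create() a HiveDecimal per row
  // proposed by HIVE-25319 (hypothetical): writable.set(bd);
}
{code}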

> Allow HiveDecimalWritable to Accept Java BigDecimal
> ---
>
> Key: HIVE-25319
> URL: https://issues.apache.org/jira/browse/HIVE-25319
> Project: Hive
>  Issue Type: Improvement
>  Components: storage-api
>Reporter: David Mollitor
>Priority: Minor
>
> Add support for {{set}} in {{HiveDecimalWritable}} of a Java BigDecimal value.
>  
> Also, the unit tests in {{TestHiveDecimalWritable}} are really lacking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25319) Allow HiveDecimalWritable to Accept Java BigDecimal

2021-07-09 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378065#comment-17378065
 ] 

David Mollitor edited comment on HIVE-25319 at 7/9/21, 1:31 PM:


There does exist a {{HiveDecimal.create(BigDecimal);}}, which I am currently 
utilizing, but would like to keep a "reuse" {{HiveDecimalWritable}} to avoid 
instantiating one for each loop.


was (Author: belugabehr):
There does exist a {{HiveDecimal.create(BigDecimal);}}, which I am currently 
utilizing, but would like to keep a "reuse" {{HiveDecimal}} to avoid 
instantiating one for each loop.

> Allow HiveDecimalWritable to Accept Java BigDecimal
> ---
>
> Key: HIVE-25319
> URL: https://issues.apache.org/jira/browse/HIVE-25319
> Project: Hive
>  Issue Type: Improvement
>  Components: storage-api
>Reporter: David Mollitor
>Priority: Minor
>
> Add support for {{set}} in {{HiveDecimalWritable}} of a Java BigDecimal value.
>  
> Also, the unit tests in {{TestHiveDecimalWritable}} are really lacking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-21311) upgrade netty 4.1.39

2021-07-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-21311.
---
Resolution: Won't Fix

> upgrade netty 4.1.39
> 
>
> Key: HIVE-21311
> URL: https://issues.apache.org/jira/browse/HIVE-21311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.1
>Reporter: t oo
>Priority: Minor
>
> upgrade netty to 4.1.33



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371600#comment-17371600
 ] 

David Mollitor commented on HIVE-24484:
---

OK. When there is Hive+Tez and a LIMIT clause, each Tez Vertex is "interrupted" 
to signal that it should stop running when the LIMIT is reached.  
[HADOOP-17313] adds a lock into the FileSystem API that throws an 
{{InterruptedIOException}} if the thread is interrupted (and clears the 
interrupt flag).  Tez sees this exception as a failure and reports an error.  
Tez probably needs to be updated to handle this situation, but it ain't fun.

https://github.com/apache/hadoop/blob/a3b9c37a397ad4188041dd80621bdeefc46885f2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L3556-L3560

{code:none}
Error while running task ( failure ) : java.lang.RuntimeException: 
java.io.IOException: java.io.IOException: java.io.InterruptedIOException: 
java.lang.InterruptedException
 at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
 at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
 at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
 at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
 at org.apache.tez.mapreduce.lib.MRReaderMapred.(MRReaderMapred.java:75)
 at 
org.apache.tez.mapreduce.input.MultiMRInput.initFromEvent(MultiMRInput.java:196)
 at 
org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:154)
 at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:739)
 at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:108)
 at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:816)
 at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.io.IOException: 
java.io.InterruptedIOException: java.lang.InterruptedException
 at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
 at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
 at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:422)
 at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
 ... 11 more
Caused by: java.io.IOException: java.io.InterruptedIOException: 
java.lang.InterruptedException
 at 
org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat.getRecordReader(LlapInputFormat.java:141)
 at 
org.apache.hadoop.hive.ql.io.RecordReaderWrapper.create(RecordReaderWrapper.java:72)
 at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:419)
 ... 12 more
Caused by: java.io.InterruptedIOException: java.lang.InterruptedException
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3559)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
 at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:111)
 at 
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
 at 
org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat.getRecordReader(LlapInputFormat.java:123)
 ... 14 more
Caused by: java.lang.InterruptedException
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3556)
 ... 20 more
{code}
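
For illustration only (this is not Tez's actual code), the kind of handling suggested 
above would treat the {{InterruptedIOException}} as a stop signal rather than a task 
failure, and restore the interrupt flag that [HADOOP-17313] clears; the variable names 
here are placeholders:

{code:java}
try {
  recordReader = inputFormat.getRecordReader(split, jobConf, reporter);
} catch (InterruptedIOException e) {
  // HADOOP-17313 clears the interrupt flag before throwing from the FileSystem
  // cache; restore it and treat this as "stop requested" (e.g. LIMIT reached).
  Thread.currentThread().interrupt();
  return;
}
{code}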

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 23m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371578#comment-17371578
 ] 

David Mollitor commented on HIVE-24484:
---

It seems that [HADOOP-17313] is causing one test to fail with 
{{InterruptedException}}.

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 23m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371568#comment-17371568
 ] 

David Mollitor commented on HIVE-24484:
---

bq. Apparently Hive is doing something that it shouldn't be in one of its tests 
for moving data across encryption zones. T

As I understand it, the test is checking to make sure that this is not allowed. 
 Hive is using DistCP for the copy.  As I understand it, DistCP would fail 
quietly when moving across encryption zones (RAW zones).  The test would check 
that there was no replication.  However, after [HDFS-14884], DistCP actually 
fails and Hive isn't handling it.  I updated the unit test to catch this Exception 
instead of looking for no changes in the destination file system.

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 23m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=616656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616656
 ]

David Mollitor logged work on HIVE-24484:
-

Author: David Mollitor
Created on: 29/Jun/21 18:21
Start Date: 29/Jun/21 18:21
Worklog Time Spent: 0.05h 

Issue Time Tracking
---

Worklog Id: (was: 616656)
Time Spent: 2h 23m  (was: 2h 20m)

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 23m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371529#comment-17371529
 ] 

David Mollitor edited comment on HIVE-24484 at 6/29/21, 5:06 PM:
-

Apparently Hive is doing something that it shouldn't be in one of its tests for 
moving data across encryption zones.  This is now forbidden via [HDFS-14884].  
Not sure what the fix is yet.

{{org.apache.hadoop.hive.ql.parse.TestReplicationOnHDFSEncryptedZones}}


was (Author: belugabehr):
Apparently Hive is doing something that it shouldn't be in one of its tests for 
moving data across encryption zones.  This is now forbidden via [HDFS-14884]

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371529#comment-17371529
 ] 

David Mollitor commented on HIVE-24484:
---

Apparently Hive is doing something that it shouldn't be in one of its tests for 
moving data across encryption zones.  This is now forbidden via [HDFS-14884]

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-28 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370667#comment-17370667
 ] 

David Mollitor commented on HIVE-24484:
---

Just bumped into another one... the latest Hadoop added some new DEBUG logging 
that is *very* chatty.  Hive has a feature that allows clients to download the 
logs; however, it is currently capped at 
{{HIVE_SERVER2_THRIFT_RESULTSET_MAX_FETCH_SIZE}} (default: 1).  
Why is it capped at the fetch size?  Because HS2 will truncate the row count and 
return {{hasMoreRows}} 'false' to the client, so the client does not know there 
are more rows to fetch.  The extra DEBUG log lines explode the line count and 
push the log lines required by the unit tests past the truncation mark.

https://github.com/apache/hive/blob/f7a21abf5579a8df07117928caff2d72ecae27e3/service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java#L888-L912

It looks like there was some other work related to this via [HIVE-24861].

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-25 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369708#comment-17369708
 ] 

David Mollitor edited comment on HIVE-24484 at 6/25/21, 9:00 PM:
-

A lot of heartburn still with [HADOOP-17367]. Ugh.  So, HMS uses Hadoop's 
{{ProxyUser}} class which stores its configuration in a static variable.  Well, 
there are some tests in Hive that launch two HMS instances within the same JVM. 
 So, setting the configuration for one instance of HMS blows away the other 
instance's Proxy configuration.  This was working previously because the 
code would only load the instance once if it's already been loaded before 
(first-loader wins).  But since the change with [HADOOP-17367] this setup in 
HMS no longer works (it cannot detect if the {{ProxyUser}} has already been 
created because now a default instance is always returned).  So even though the 
second instance would technically be misconfigured if it were stand-alone, it 
would inherit the correct proxy settings by virtue of the first instance of HMS 
being configured correctly.


was (Author: belugabehr):
A lot of heartburn still with [HADOOP-17367]. Ugh.  So, HMS uses Hadoop's 
{{ProxyUser}} class which stores its configuration in a static variable.  Well, 
there are some tests in Hive that launch two HMS instances within the same JVM. 
 So, setting the configuration for one instance of HMS blows away the other 
instance's Proxy configuration.  This was working previously because the 
code would only load the instance once if it's already been loaded before 
(first-loader wins).  But since the change with [HADOOP-17367] this setup in 
HMS no longer works (it cannot detect if the {{ProxyUser}} has already been 
created because now a default instance is always returned).

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-25 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369708#comment-17369708
 ] 

David Mollitor edited comment on HIVE-24484 at 6/25/21, 9:00 PM:
-

A lot of heartburn still with [HADOOP-17367]. Ugh.  So, HMS uses Hadoop's 
{{ProxyUser}} class which stores its configuration in a static variable.  Well, 
there are some tests in Hive that launch two HMS instances within the same JVM. 
 So, setting the configuration for one instance of HMS blows away the other 
instance's Proxy configuration.  This was working previously because the 
code would only load the instance once if it's already been loaded before 
(first-loader wins).  But since the change with [HADOOP-17367] this setup in 
HMS no longer works (it cannot detect if the {{ProxyUser}} has already been 
created because now a default instance is always returned).  So even though the 
second instance would technically be misconfigured if it were stand-alone, it 
inherits the correct proxy settings by virtue of the first instance of HMS 
being configured correctly.


was (Author: belugabehr):
A lot of heartburn still with [HADOOP-17367]. Ugh.  So, HMS uses Hadoop's 
{{ProxyUser}} class which stores its configuration in a static variable.  Well, 
there are some tests in Hive that launch two HMS instances within the same JVM. 
 So, setting the configuration for one instance of HMS blows away the other 
instance's Proxy configuration.  This was working previously because the 
code would only load the instance once if it's already been loaded before 
(first-loader wins).  But since the change with [HADOOP-17367] this setup in 
HMS no longer works (it cannot detect if the {{ProxyUser}} has already been 
created because now a default instance is always returned).  So even though the 
second instance would technically be misconfigured if it were stand-alone, it 
would inherit the correct proxy settings by virtue of the first instance of HMS 
being configured correctly.

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-25 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369708#comment-17369708
 ] 

David Mollitor commented on HIVE-24484:
---

A lot of heartburn still with [HADOOP-17367]. Ugh.  So, HMS uses Hadoop's 
{{ProxyUser}} class which stores its configuration in a static variable.  Well, 
there are some tests in Hive that launch two HMS instances within the same JVM. 
 So, setting the configuration for one instance of HMS blows away the other 
instance's Proxy configuration.  This was working previously because the 
code would only load the instance once if it's already been loaded before 
(first-loader wins).  But since the change with [HADOOP-17367] this setup in 
HMS no longer works (it cannot detect if the {{ProxyUser}} has already been 
created because now a default instance is always returned).
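
A simplified illustration of the collision (real {{ProxyUsers}} API, but made-up 
configuration values):

{code:java}
// First HMS instance in the JVM registers its proxy-user settings.
Configuration confA = new Configuration();
confA.set("hadoop.proxyuser.hive.hosts", "*");
ProxyUsers.refreshSuperUserGroupsConfiguration(confA);   // writes JVM-wide static state

// Second HMS instance refreshes with its own Configuration (no proxy settings),
// silently wiping out the first instance's proxy configuration.
Configuration confB = new Configuration();
ProxyUsers.refreshSuperUserGroupsConfiguration(confB);
{code}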

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24846) Log When HS2 Goes OOM

2021-06-25 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24846.
---
Resolution: Won't Fix

> Log When HS2 Goes OOM
> -
>
> Key: HIVE-24846
> URL: https://issues.apache.org/jira/browse/HIVE-24846
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Otherwise the server just shuts down without any justification.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-24 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871
 ] 

David Mollitor edited comment on HIVE-24484 at 6/24/21, 2:05 PM:
-

[HADOOP-17367] is another issue biting Hive.  Between 3.1.0 and 3.3.1, the return 
value from {{ProxyUsers#getDefaultImpersonationProvider}} changed.  In 3.1.0, 
the method could return a {{null}} value and then it was up to the caller to 
create a new one and initialize it (providing the Configuration object).  It 
seems like in 3.3.1, it now always returns a value, but it looks like the 
initialization isn't what Hive is expecting.  The initialization  
{{refreshSuperUserGroupsConfiguration}} creates its own (empty) configuration 
whereas before Hive was passing in its own Configuration.


was (Author: belugabehr):
[HADOOP-17367] is another issue biting Hive.  Between 3.1.0 and 3.3.1, the return 
value from {{ProxyUsers#getDefaultImpersonationProvider}} changed.  In 3.1.0, 
the method could return a {{null}} value and then it was up to the caller to 
create a new one and initialize it.  It seems like in 3.3.1, it always returns 
a value, but it looks like the initialization isn't what Hive is expecting.  
The initialization  {{refreshSuperUserGroupsConfiguration}} creates its own 
configuration whereas before Hive was passing in its own Configuration.

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-24 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871
 ] 

David Mollitor edited comment on HIVE-24484 at 6/24/21, 1:55 PM:
-

[HADOOP-17367] is another issue biting Hive.  Between 3.1.0 and 3.3.1, the return 
value from {{ProxyUsers#getDefaultImpersonationProvider}} changed.  In 3.1.0, 
the method could return a {{null}} value and then it was up to the caller to 
create a new one and initialize it.  It seems like in 3.3.1, it always returns 
a value, but it looks like the initialization isn't what Hive is expecting.  
The initialization  {{refreshSuperUserGroupsConfiguration}} creates its own 
configuration whereas before Hive was passing in its own Configuration.


was (Author: belugabehr):
[HADOOP-17367] is another issue biting Hive.  Between 3.1.0 and 3.3.1, the return 
value from {{ProxyUsers#getDefaultImpersonationProvider}} changed.  In 3.1.0, 
the method could return a {{null}} value and then it was up to the caller to 
create a new one and initialize it.  It seems like in 3.3.1, it always returns 
a value, but it looks like the initialization isn't what Hive is expecting.  
The initialization  {{refreshSuperUserGroupsConfiguration}} creates its own 
configuration whereas before Hive was passing in its own Configuration.

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-24 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871
 ] 

David Mollitor commented on HIVE-24484:
---

[HADOOP-17367] is another issue biting Hive.  Between 3.1.0 and 3.3.1, the return 
value from {{ProxyUsers#getDefaultImpersonationProvider}} changed.  In 3.1.0, 
the method could return a {{null}} value and then it was up to the caller to 
create a new one and initialize it.  It seems like in 3.3.1, it always returns 
a value, but it looks like the initialization isn't what Hive is expecting.  
The initialization  {{refreshSuperUserGroupsConfiguration}} creates its own 
configuration whereas before Hive was passing in its own Configuration.
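
Roughly, the 3.1-era pattern described above looks like this (a sketch only; actual 
Hive call sites differ, and {{hiveConf}} stands in for Hive's own Configuration):

{code:java}
// Pre-3.3: a null return meant "not initialized", so the caller could
// initialize the provider with its own Configuration.
if (ProxyUsers.getDefaultImpersonationProvider() == null) {
  ProxyUsers.refreshSuperUserGroupsConfiguration(hiveConf);
}
// In 3.3.1 a provider instance is always returned (initialized internally with
// its own Configuration, as described above), so this null check never fires
// and Hive's Configuration is never applied.
{code}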

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-23 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368433#comment-17368433
 ] 

David Mollitor edited comment on HIVE-24484 at 6/23/21, 8:30 PM:
-

[HDFS-13505] changed {{dfs.namenode.acls.enabled}} from {{false}} to {{true}}.  
This broke a test in {{TestHCatMultiOutputFormat}}.  The test creates some 
files and then changes their permissions manually.  The test actually checks 
that the file permissions are a certain value.  The overall effect is that the 
files inherited the permissions of their parent directories.  With 
{{dfs.namenode.acls.enabled}} set to {{true}}, {{HdfsUtils#setFullFileStatus}} 
does not perform the manual "set permissions" step.


was (Author: belugabehr):
[HDFS-13505] changed {{dfs.namenode.acls.enabled}} from {{false}} to {{true}}.  
This broke a test in {{TestHCatMultiOutputFormat}}.  The test creates some 
files and then changes their permissions manually.  The test actually checks 
that the file permissions are a certain value.  The overall effect is that the 
files inherited the permissions of their parent directories.  With 
{{dfs.namenode.acls.enabled}} set to {{true}} this manual process 
{{HdfsUtils#setFullFileStatus}} does not perform the manual process.

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-23 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368433#comment-17368433
 ] 

David Mollitor commented on HIVE-24484:
---

[HDFS-13505] changed {{dfs.namenode.acls.enabled}} from {{false}} to {{true}}.  
This broke a test in {{TestHCatMultiOutputFormat}}.  The test creates some 
files and then changes their permissions manually.  The test actually checks 
that the file permissions are a certain value.  The overall effect is that the 
files inherited the permissions of their parent directories.  With 
{{dfs.namenode.acls.enabled}} set to {{true}}, {{HdfsUtils#setFullFileStatus}} 
does not perform the manual process.

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-22 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-24484:
--
Summary: Upgrade Hadoop to 3.3.1  (was: Upgrade Hadoop to 3.3.0)

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.0

2021-06-22 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367606#comment-17367606
 ] 

David Mollitor commented on HIVE-24484:
---

[HADOOP-16458] caused a small regression for Hive.

{code:java|title=InvalidInputException.java}
if (!probs.isEmpty()) {
  initCause(probs.get(0));
}
{code}

{code:java|title=FetchOperator.java}
try {
  splits = inputFormat.getSplits(job, 1);
} catch (Exception ex) {
  Throwable t = ExceptionUtils.getRootCause(ex);
  if (t instanceof FileNotFoundException || t instanceof 
InvalidInputException) {
LOG.warn("Input path " + currPath + " is empty", t.getMessage());
return;
  }
  throw ex;
}
{code}

So Hive is looking at the "root cause" and was seeing the 
{{InvalidInputException}}.  With [HADOOP-16458], the {{InvalidInputException}} 
now sets a cause, so Hive finds the underlying cause, which happens to be an 
{{IOException}}.

I updated my PR to catch the {{InvalidInputException}} itself and not the 
root cause.
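
A hedged sketch of the adjusted handling (the exact change lives in the HIVE-24484 
PR); the point is to look for the exception anywhere in the cause chain instead of 
only at the root:

{code:java}
try {
  splits = inputFormat.getSplits(job, 1);
} catch (Exception ex) {
  // org.apache.commons.lang3.exception.ExceptionUtils: search the whole chain,
  // since HADOOP-16458 makes InvalidInputException carry an underlying cause.
  if (ExceptionUtils.indexOfType(ex, FileNotFoundException.class) != -1
      || ExceptionUtils.indexOfType(ex, InvalidInputException.class) != -1) {
    LOG.warn("Input path " + currPath + " is empty", ex);
    return;
  }
  throw ex;
}
{code}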

> Upgrade Hadoop to 3.3.0
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.0

2021-06-22 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367606#comment-17367606
 ] 

David Mollitor edited comment on HIVE-24484 at 6/22/21, 6:40 PM:
-

[HADOOP-16458] caused a small regression for Hive.

{code:java|title=InvalidInputException.java (Hadoop)}
if (!probs.isEmpty()) {
  initCause(probs.get(0));
}
{code}

{code:java|title=FetchOperator.java (Hive)}
try {
  splits = inputFormat.getSplits(job, 1);
} catch (Exception ex) {
  Throwable t = ExceptionUtils.getRootCause(ex);
  if (t instanceof FileNotFoundException || t instanceof 
InvalidInputException) {
LOG.warn("Input path " + currPath + " is empty", t.getMessage());
return;
  }
  throw ex;
}
{code}

So Hive looks at the "root cause" and previously saw the 
{{InvalidInputException}}.  With [HADOOP-16458], the {{InvalidInputException}} 
now sets a cause, so Hive finds the underlying cause, which happens to be an 
{{IOException}}.

I updated my PR to catch the {{InvalidInputException}} itself rather than the 
root cause.


was (Author: belugabehr):
[HADOOP-16458] caused a small regression for Hive.

{code:java|title=InvalidInputException.java}
if (!probs.isEmpty()) {
  initCause(probs.get(0));
}
{code}

{code:java|title=FetchOperator.java}
try {
  splits = inputFormat.getSplits(job, 1);
} catch (Exception ex) {
  Throwable t = ExceptionUtils.getRootCause(ex);
  if (t instanceof FileNotFoundException || t instanceof 
InvalidInputException) {
LOG.warn("Input path " + currPath + " is empty", t.getMessage());
return;
  }
  throw ex;
}
{code}

So Hive is looking at the "root cause" and was seeing the 
{{InvalidInputException}}.  With [HADOOP-16458], the {{InvalidInputException}} 
now sets a cause so Hive is finding the underlying cause which happens to be an 
{{IOException}}.

I updated my PR to be able to catch the {{InvalidInputException}} and not the 
root cause.

> Upgrade Hadoop to 3.3.0
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25235.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks [~mgergely] and [~dengzh] for the reviews!

> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> While I was looking at [HIVE-24846] to better perform OOM logging and I just 
> realized that this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside. It's best to do it 
> with the JVM facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work when clearly the server is hosed at this point and just requesting to do 
> more work will further add to memory pressure.
> The current OOM logic in {{HiveServer2OomHookRunner}} causes HiveServer2 to 
> shutdown, but we already have that with the JVM shutdown hook.  This JVM 
> shutdown hook is triggered if {{-XX:OnOutOfMemoryError="kill -9 %p"}} exists 
> and is the appropriate thing to do.
> https://github.com/apache/hive/blob/328d197431b2ff1000fd9c56ce758013eff81ad8/service/src/java/org/apache/hive/service/server/HiveServer2.java#L443-L444
> https://github.com/apache/hive/blob/cb0541a31b87016fae8e4c0e7130532c6e5f8de7/service/src/java/org/apache/hive/service/server/HiveServer2OomHookRunner.java#L42-L44



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25241) Simplify Metrics System

2021-06-11 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25241:
-


> Simplify Metrics System
> ---
>
> Key: HIVE-25241
> URL: https://issues.apache.org/jira/browse/HIVE-25241
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> Took a look at the {{Metrics}} stuff in Hive and found a lot of boilerplate 
> in the client code that interacts with Metrics.  It's too much stuff and 
> it's done differently in different places.
> * Never allow Metrics System to be "null" - supply a no-op version by default
> * Metrics system should never throw an error to the client, just 
> log-and-ignore. Metrics shouldn't break a query or other operation
> * General cleanup



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25235:
--
Description: 
While I was looking at [HIVE-24846] to better perform OOM logging and I just 
realized that this is not a good way to handle OOM.

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside. It's best to do it 
with the JVM facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more 
work when clearly the server is hosed at this point and just requesting to do 
more work will further add to memory pressure.

The current OOM logic in {{HiveServer2OomHookRunner}} causes HiveServer2 to 
shutdown, but we already have that with the JVM shutdown hook.  This JVM 
shutdown hook is triggered if {{-XX:OnOutOfMemoryError="kill -9 %p"}} exists 
and is the appropriate thing to do.

https://github.com/apache/hive/blob/328d197431b2ff1000fd9c56ce758013eff81ad8/service/src/java/org/apache/hive/service/server/HiveServer2.java#L443-L444
https://github.com/apache/hive/blob/cb0541a31b87016fae8e4c0e7130532c6e5f8de7/service/src/java/org/apache/hive/service/server/HiveServer2OomHookRunner.java#L42-L44

  was:
While I was looking at [HIVE-24846] to better perform OOM logging and I just 
realized that this is not a good way to handle OOM.

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside. It's best to do it 
with the JVM facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more 
work when clearly the server is hosed at this point and just requesting to do 
more work will further add to memory pressure.


> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While I was looking at [HIVE-24846] to better perform OOM logging and I just 
> realized that this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside. It's best to do it 
> with the JVM facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work when clearly the server is hosed at this point and just requesting to do 
> more work will further add to memory pressure.
> The current OOM logic in {{HiveServer2OomHookRunner}} causes HiveServer2 to 
> shutdown, but we already have that with the JVM shutdown hook.  This JVM 
> shutdown hook is triggered if {{-XX:OnOutOfMemoryError="kill -9 %p"}} exists 
> and is the appropriate thing to do.
> https://github.com/apache/hive/blob/328d197431b2ff1000fd9c56ce758013eff81ad8/service/src/java/org/apache/hive/service/server/HiveServer2.java#L443-L444
> https://github.com/apache/hive/blob/cb0541a31b87016fae8e4c0e7130532c6e5f8de7/service/src/java/org/apache/hive/service/server/HiveServer2OomHookRunner.java#L42-L44



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25235:
--
Description: 
While I was looking at [HIVE-24846] to better perform OOM logging and I just 
realized that this is not a good way to handle OOM.

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside. It's best to do it 
with the JVM facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more 
work when clearly the server is hosed at this point and just requesting to do 
more work will further add to memory pressure.

  was:
While I was looking at [HIVE-24846] to better perform OOM logging and I just 
realized that this is not a good way to handle OOM.

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside. It's be to do it 
with the JVM facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more 
work when clearly the server is hosed at this point and just requesting to do 
more work will further add to memory pressure.


> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> While I was looking at [HIVE-24846] to better perform OOM logging and I just 
> realized that this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside. It's best to do it 
> with the JVM facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work when clearly the server is hosed at this point and just requesting to do 
> more work will further add to memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25235:
-


> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> While I was looking at [HIVE-24846] to better perform OOM logging and I just 
> realized that this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside. It's be to do it 
> with the JVM facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work when clearly the server is hosed at this point and just requesting to do 
> more work will further add to memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25144) Remove RetryingHMSHandler

2021-06-09 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25144:
--
Description: 
I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
{{AlreadyExistsException}} even though the table does absolutely not exist.

 

I believe the issue is that there is a timeout/transient error with HMS and the 
backend database.  If the client submits a {{create table}} request to HMS, and 
the request to the backend database fails, the request to the DB may be retried 
even though the HMS has lost state of the DB.  When the HMS Handler "retry" 
functionality kicks in, the second time the request is submitted, the table 
looks like it already exists.

 

If something goes wrong during a HMS CREATE operation, we do not know the state 
of the operation and therefore it should just fail.

 

It would certainly be more transparent to the end-user what is going on.  An 
{{AlreadyExistsException}}  is confusing.

  was:
I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
{{AlreadyExistsException}} even though the table does absolutely not exist.

 

I believe the issue is there there is a timeout/transient error with HMS and 
the backend database.  So, the client submits the request to HMS, and the 
request does eventually succeed, but only after the connection to the client 
connects.  Therefore, when the HMS Client "retry" functionality kicks it, the 
second time around, the table looks like it already exists.

 

If something goes wrong during a HMS CREATE operation, we do not know the state 
of the operation and therefore it should just fail.

 

It would certainly be more transparent to the end-user what is going on.  An 
{{AlreadyExistsException}}  is confusing.


> Remove RetryingHMSHandler
> -
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
> {{AlreadyExistsException}} even though the table does absolutely not exist.
>  
> I believe the issue is that there is a timeout/transient error with HMS and 
> the backend database.  If the client submits a {{create table}} request to 
> HMS, and the request to the backend database fails, the request to the DB may 
> be retried even though the HMS has lost state of the DB.  When the HMS 
> Handler "retry" functionality kicks in, the second time the request is 
> submitted, the table looks like it already exists.
>  
> If something goes wrong during a HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.
>  
> It would certainly be more transparent to the end-user what is going on.  An 
> {{AlreadyExistsException}}  is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25144) Remove RetryingHMSHandler

2021-06-09 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360102#comment-17360102
 ] 

David Mollitor edited comment on HIVE-25144 at 6/9/21, 2:08 PM:


OK, I've been looking at this all wrong.  This is all happening internal to 
HMS.  I thought it was the HMS-client retrying, but that's not what this is.  
This is a re-try capability ON TOP of the client re-try.  This should be 
removed.  This is very confusing, hides transient errors, and looking at the 
code, seems very error-prone and fragile as the project changes.  If an HMS 
error occurs, it should report back to the client and let the client decide how 
it wants to handle the error.  For example, the client may want to re-try the 
request with a different instance of HMS instead of waiting for this one to try 
N times and fail.


was (Author: belugabehr):
OK, I've been looking at this all wrong.  This is all happening internal to 
HMS.  I thought it was the HMS-client retrying.  This should be removed.  This 
is very confusing, hides transient errors, and looking at the code, seems very 
error-prone and fragile as the project changes.  If an HMS error occurs, it 
should report back to the client and let the client decide how it wants to 
handle the error.  For example, the client may want to re-try the request with 
a different instance of HMS instead of waiting for this one to try N times and 
fail.

> Remove RetryingHMSHandler
> -
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
> {{AlreadyExistsException}} even though the table does absolutely not exist.
>  
> I believe the issue is there there is a timeout/transient error with HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> connects.  Therefore, when the HMS Client "retry" functionality kicks it, the 
> second time around, the table looks like it already exists.
>  
> If something goes wrong during a HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.
>  
> It would certainly be more transparent to the end-user what is going on.  An 
> {{AlreadyExistsException}}  is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25144) Remove RetryingHMSHandler

2021-06-09 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360102#comment-17360102
 ] 

David Mollitor commented on HIVE-25144:
---

OK, I've been looking at this all wrong.  This is all happening internal to 
HMS.  I thought it was the HMS-client retrying.  This should be removed.  This 
is very confusing, hides transient errors, and looking at the code, seems very 
error-prone and fragile as the project changes.  If an HMS error occurs, it 
should report back to the client and let the client decide how it wants to 
handle the error.  For example, the client may want to re-try the request with 
a different instance of HMS instead of waiting for this one to try N times and 
fail.
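
For illustration only, a rough sketch of the kind of client-side handling 
described here; the {{MetastoreOperation}} interface below is hypothetical and 
is not existing Hive API:

{code:java}
import java.util.List;
import org.apache.thrift.TException;

final class MetastoreFailoverSketch {

  // Hypothetical callback representing a single metastore request.
  interface MetastoreOperation<T> {
    T run(String metastoreUri) throws TException;
  }

  // The client decides the retry policy, e.g. move on to the next HMS
  // instance after a failure instead of waiting on server-side retries.
  static <T> T runWithFailover(List<String> metastoreUris, MetastoreOperation<T> op)
      throws TException {
    TException lastError = null;
    for (String uri : metastoreUris) {
      try {
        return op.run(uri);
      } catch (TException e) {
        lastError = e;
      }
    }
    throw lastError != null ? lastError : new TException("No metastore URIs configured");
  }
}
{code}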

> Remove RetryingHMSHandler
> -
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
> {{AlreadyExistsException}} even though the table does absolutely not exist.
>  
> I believe the issue is there there is a timeout/transient error with HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> connects.  Therefore, when the HMS Client "retry" functionality kicks it, the 
> second time around, the table looks like it already exists.
>  
> If something goes wrong during a HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.
>  
> It would certainly be more transparent to the end-user what is going on.  An 
> {{AlreadyExistsException}}  is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25144) Remove RetryingHMSHandler

2021-06-09 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25144:
--
Summary: Remove RetryingHMSHandler  (was: Add NoReconnect Annotation to 
CreateXXX Methods With AlreadyExistsException)

> Remove RetryingHMSHandler
> -
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
> {{AlreadyExistsException}} even though the table does absolutely not exist.
>  
> I believe the issue is there there is a timeout/transient error with HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> connects.  Therefore, when the HMS Client "retry" functionality kicks it, the 
> second time around, the table looks like it already exists.
>  
> If something goes wrong during a HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.
>  
> It would certainly be more transparent to the end-user what is going on.  An 
> {{AlreadyExistsException}}  is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25185) Improve Logging On Polling Tez Session from Pool

2021-06-09 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25185.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thanks [~pgaref] and [~mgergely] for the reviews!

> Improve Logging On Polling Tez Session from Pool
> 
>
> Key: HIVE-25185
> URL: https://issues.apache.org/jira/browse/HIVE-25185
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24463) Add special case for Derby and MySQL in Get Next ID DbNotificationListener

2021-06-08 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24463.
---
Resolution: Won't Fix

> Add special case for Derby and MySQL in Get Next ID DbNotificationListener
> --
>
> Key: HIVE-24463
> URL: https://issues.apache.org/jira/browse/HIVE-24463
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> * Derby does not support {{SELECT FOR UPDATE}} statements
>  * MySQL can be optimized to use {{LAST_INSERT_ID()}}
>  
> Derby tables are locked in other parts of the code already, but not in this 
> path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25144) Add NoReconnect Annotation to CreateXXX Methods With AlreadyExistsException

2021-06-08 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359364#comment-17359364
 ] 

David Mollitor commented on HIVE-25144:
---

And here is the logging...

 
{code:none}
2021-06-04 12:01:25,927 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-3]: 
ugi=kudu/host@DOMAIN ip=xx.xx.xx.xx  cmd=create_table: 
Table(tableName:test_table, dbName:test_database, owner:user, createTime:0, 
lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:... 
tableType:MANAGED_TABLE, temporary:false, ownerType:USER)
2021-06-04 12:01:26,001 INFO  org.apache.hadoop.hive.common.FileUtils: 
[pool-9-thread-3]: Creating directory if it doesn't exist: 
hdfs://ns1/user/hive/warehouse/test_database.db/test_table
2021-06-04 12:01:26,185 ERROR com.jolbox.bonecp.ConnectionHandle: 
[pool-9-thread-3]: Database access problem. Killing off this connection and all 
remaining connections in the connection pool. SQL State = 08S01
2021-06-04 12:01:26,294 INFO  org.apache.hadoop.fs.TrashPolicyDefault: 
[pool-9-thread-3]: Moved: 
'hdfs://ns1/user/hive/warehouse/test_database.db/test_table' to trash at: 
hdfs://ns1/user/.Trash/kudu/Current/user/hive/warehouse/test_database.db/test_table
2021-06-04 12:01:26,304 ERROR 
org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-3]: 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Communications link failure

The last packet successfully received from the server was 1,521,446 
milliseconds ago.  The last packet sent successfully to the server was 
1,521,447 milliseconds ago.
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
at 
org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:171)
at 
org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:727)
at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
at com.sun.proxy.$Proxy26.commitTransaction(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1582)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1615)
at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown 
Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:10993)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:10977)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
NestedThrowablesStackTrace:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link 
failure

The last packet successfully received from the server was 1,521,446 
milliseconds ago.  The last packet sent successfully to the server was 
1,521,447 milliseconds ago.
at sun.reflect.GeneratedConstructorAccessor84.newInstance(Unknown 
Source)
at 

[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356612#comment-17356612
 ] 

David Mollitor commented on HIVE-25188:
---

[~dengzh] As I understand the request, I am not in support of it.  The "data" 
field is not a valid JSON String type and therefore we should not allow this 
type of interaction.  Hive is already far too lenient in what it allows, which 
leads to breakdowns in testing, knowledge debt, and a larger testing surface 
area.  Just my opinion on the matter; maybe others disagree and can chime in.

> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
>  
> if the data of the table stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> Exception will be thrown when trying to deserialize the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-02 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355908#comment-17355908
 ] 

David Mollitor edited comment on HIVE-25188 at 6/2/21, 6:29 PM:


[~dengzh] I've formatted the JSON to make it easier to read for discussion 
sake.  FYI, there are a few stray characters at the end of your example that 
were giving me issues during formatting.

{code:json}
{
"data": {
"H": {
"event": "track_active",
"platform": "Android"
},
"B": {
"device_type": "Phone",
"uuid": 
"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"
}
},
"messageId": "2475185636801962",
"publish_time": 1622514629783,
"attributes": {
"region": "IN"
}
}
{code}

create table json_table(data string, messageid string, publish_time bigint, 
attributes string);

The {{data}} field is not a String type.  It is itself a data type of type 
struct.  If you intend to do something like stuffing arbitrary data in that 
field, then "data" should be a Base-64 string and then you can declare it as a 
Binary type in Hive.  I think that's the preferred approach instead of just 
allowing an overloaded String type.

If you need to parse/query specific data from there, you would base64 decode 
the data value and use the {{get_json_object}} or {{json_tuple}} UDFs to read 
it.



was (Author: belugabehr):
[~dengzh] I've formatted the JSON to make it easier to read for discussion 
sake.  FYI, there are a few stray characters at the end of your example that 
were giving me issues during formatting.

{code:json}
{
"data": {
"H": {
"event": "track_active",
"platform": "Android"
},
"B": {
"device_type": "Phone",
"uuid": 
"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"
}
},
"messageId": "2475185636801962",
"publish_time": 1622514629783,
"attributes": {
"region": "IN"
}
}
{code}

create table json_table(data string, messageid string, publish_time bigint, 
attributes string);

The {{data}} field is not a String type.  It is itself a data type of type 
struct.  If you intend to do something like stuffing arbitrary data in that 
field, then "data" should be a Base-64 string and then you can declare it as a 
Binary type in Hive.  I think that's the preferred approach instead of just 
allowing an overloaded String type.

If you need to parse/query specific data from there, you would un-base64 it and 
use the {{get_json_object}} or {{json_tuple}} UDFs to read it.


> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
>  
> if the data of the table stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> Exception will be thrown when trying to deserialize the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-02 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355908#comment-17355908
 ] 

David Mollitor edited comment on HIVE-25188 at 6/2/21, 6:29 PM:


[~dengzh] I've formatted the JSON to make it easier to read for discussion 
sake.  FYI, there are a few stray characters at the end of your example that 
were giving me issues during formatting.

{code:json}
{
"data": {
"H": {
"event": "track_active",
"platform": "Android"
},
"B": {
"device_type": "Phone",
"uuid": 
"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"
}
},
"messageId": "2475185636801962",
"publish_time": 1622514629783,
"attributes": {
"region": "IN"
}
}
{code}

create table json_table(data string, messageid string, publish_time bigint, 
attributes string);

The {{data}} field is not a String type.  It is itself a data type of type 
struct.  If you intend to do something like stuffing arbitrary data in that 
field, then "data" should be a Base-64 string and then you can declare it as a 
Binary type in Hive.  I think that's the preferred approach instead of just 
allowing an overloaded String type.

If you need to parse/query specific data from there, you would un-base64 it and 
use the {{get_json_object}} or {{json_tuple}} UDFs to read it.



was (Author: belugabehr):
[~dengzh] I've formatted the JSON to make it easier to read for discussion 
sake.  FYI, there are a few stray characters at the end of your example that 
were giving me issues during formatting.

{code:json}
{
"data": {
"H": {
"event": "track_active",
"platform": "Android"
},
"B": {
"device_type": "Phone",
"uuid": 
"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"
}
},
"messageId": "2475185636801962",
"publish_time": 1622514629783,
"attributes": {
"region": "IN"
}
}
{code}

create table json_table(data string, messageid string, publish_time bigint, 
attributes string);

The {{data}} field is not a String type.  It is itself a data type of type 
struct.  If you intend to do something like stuffing arbitrary data in that 
field, then "data" should be a Base-64 string and then you can declare it as a 
Binary type in Hive.  I think that's the preferred approach instead of just 
allowing an overloaded String type.


> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
>  
> if the data of the table stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> Exception will be thrown when trying to deserialize the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-02 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355908#comment-17355908
 ] 

David Mollitor commented on HIVE-25188:
---

[~dengzh] I've formatted the JSON to make it easier to read for discussion 
sake.  FYI, there are a few stray characters at the end of your example that 
were giving me issues during formatting.

{code:json}
{
"data": {
"H": {
"event": "track_active",
"platform": "Android"
},
"B": {
"device_type": "Phone",
"uuid": 
"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"
}
},
"messageId": "2475185636801962",
"publish_time": 1622514629783,
"attributes": {
"region": "IN"
}
}
{code}

create table json_table(data string, messageid string, publish_time bigint, 
attributes string);

The {{data}} field is not a String type.  It is itself a data type of type 
struct.  If you intend to do something like stuffing arbitrary data in that 
field, then "data" should be a Base-64 string and then you can declare it as a 
Binary type in Hive.  I think that's the preferred approach instead of just 
allowing an overloaded String type.
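
As a tiny illustration of that suggestion, using only standard 
{{java.util.Base64}} (how and where the encoding happens in a real pipeline is 
up to the producer):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// The nested JSON is stored as an opaque Base-64 string (a BINARY column in
// Hive) and decoded back to text only when the consumer needs to inspect it.
String nestedJson = "{\"H\":{\"event\":\"track_active\",\"platform\":\"Android\"}}";
String encoded = Base64.getEncoder().encodeToString(nestedJson.getBytes(StandardCharsets.UTF_8));
String decoded = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
{code}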


> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
>  
> if the data of the table stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> Exception will be thrown when trying to deserialize the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-02 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355717#comment-17355717
 ] 

David Mollitor commented on HIVE-25188:
---

Hello and thanks for the report.

 

The "data" field is not a String, it's a struct.  What do you propose is the 
expected behavior here?

> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
>  
> if the data of the table stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> Exception will be thrown when trying to deserialize the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25185) Improve Logging On Polling Tez Session from Pool

2021-06-01 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25185:
-


> Improve Logging On Polling Tez Session from Pool
> 
>
> Key: HIVE-25185
> URL: https://issues.apache.org/jira/browse/HIVE-25185
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions

2021-06-01 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25184:
--
Description: 
Was recently troubleshooting an issue and noticed an NPE in the logs.  I tracked 
it down to {{ReExecDriver}} code.  The "afterExecute" code gets called whether 
the Driver call succeeds or fails. However, if there is a failure, the Driver is 
instructed to "clean up" by some internal try-catch, and so the afterExecute 
code fails with an NPE when it tries to read state out of the Driver class.

 

[https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170]

 

Move this afterExecute code into the try-catch block so it's only executed on 
success (and there is valid state within the Driver).  I looked at the code a 
bit and it seems like the only listener that handles this afterExecute code 
assumes the state is always valid, so there is currently no way to pass it 
'null' on a failure or 'state' on a success.
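
For illustration, a sketch of the intended shape; names such as {{coreDriver}} 
and {{afterExecuteHooks}} are placeholders, not the actual ReExecDriver members:

{code:java}
// Hypothetical sketch: the plugin callback only runs when the Driver finished
// successfully and its state is still valid.
private void runWithPlugins() throws Exception {
  try {
    coreDriver.run();               // may throw; the Driver cleans up internally
    afterExecuteHooks(coreDriver);  // moved inside: runs only on success
  } catch (Exception e) {
    // previously the hooks also ran on this path, and reading state from the
    // cleaned-up Driver is what produced the NPE described above
    throw e;
  }
}
{code}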

> ReExecDriver Only Run afterExecute If No Exceptions
> ---
>
> Key: HIVE-25184
> URL: https://issues.apache.org/jira/browse/HIVE-25184
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> Was recently troubleshooting an issue and noticed an NPE in the logs.  I 
> tracked it down to {{ReExecDriver}} code.  The "afterExecute" code gets 
> called whether the Driver call succeeds or fails. However, if there is a 
> failure, the Driver is instructed to "clean up" by some internal try-catch, 
> and so the afterExecute code fails with an NPE when it tries to read state 
> out of the Driver class.
>  
> [https://github.com/apache/hive/blob/1cc87d09cf0514f3fb962a816babb7eea859163c/ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java#L163-L170]
>  
> Move this afterExecute code into the try-catch block so it's only executed on 
> success (and there is valid state within the Driver).  I looked at the code a 
> bit and it seems like the only listener that handles this afterExecute code 
> assumes the state is always valid, so there is currently no way to pass it 
> 'null' on a failure or 'state' on a success.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25184) ReExecDriver Only Run afterExecute If No Exceptions

2021-06-01 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25184:
-


> ReExecDriver Only Run afterExecute If No Exceptions
> ---
>
> Key: HIVE-25184
> URL: https://issues.apache.org/jira/browse/HIVE-25184
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25176) Print DAG ID to Console

2021-05-29 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25176.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks [~mgergely] and [~abstractdog] for the review!

> Print DAG ID to Console
> ---
>
> Key: HIVE-25176
> URL: https://issues.apache.org/jira/browse/HIVE-25176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Would be helpful when troubleshooting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25177) Add Additional Debugging Help for HBase Reader

2021-05-29 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25177.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks [~mgergely] for the review!

> Add Additional Debugging Help for HBase Reader
> --
>
> Key: HIVE-25177
> URL: https://issues.apache.org/jira/browse/HIVE-25177
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I recently was wishing I had this data available to me.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25177) Add Additional Debugging Help for HBase Reader

2021-05-28 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25177:
--
Description: I recently was wishing I had this data available to me.

> Add Additional Debugging Help for HBase Reader
> --
>
> Key: HIVE-25177
> URL: https://issues.apache.org/jira/browse/HIVE-25177
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> I recently was wishing I had this data available to me.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25177) Add Additional Debugging Help for HBase Reader

2021-05-28 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25177:
-


> Add Additional Debugging Help for HBase Reader
> --
>
> Key: HIVE-25177
> URL: https://issues.apache.org/jira/browse/HIVE-25177
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25176) Print DAG ID to Console

2021-05-28 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25176:
--
Description: Would be helpful when troubleshooting.

> Print DAG ID to Console
> ---
>
> Key: HIVE-25176
> URL: https://issues.apache.org/jira/browse/HIVE-25176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> Would be helpful when troubleshooting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25176) Print DAG ID to Console

2021-05-28 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25176:
-


> Print DAG ID to Console
> ---
>
> Key: HIVE-25176
> URL: https://issues.apache.org/jira/browse/HIVE-25176
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread

2021-05-25 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25112.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thank you [~klcopp] for the review!

> Simplify TXN Compactor Heartbeat Thread
> ---
>
> Key: HIVE-25112
> URL: https://issues.apache.org/jira/browse/HIVE-25112
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Simplify the Thread structure.  Threads do not need a "start"/"stop" state, 
> they already have it.  It is running/interrupted and it is designed to work 
> this way with thread pools and forced exits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-25 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25157:
--
Description: 
* Remove superfluous code
 * Simplify lock usage (remove instances of {{synchronization}})
 * Simplify code with Guava {{Multimap}}

  was:
* Remove superfluous code
* Simplify lock usage (remove instances of {{synchronization}})
* Re-do "LRU" map. The original contributor's understanding of 
{{LinkedHashMap}} as a {LRU} map is incorrect.
* Simplify code with Guava {{Multimap}}


> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}
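
As a small illustration of the {{Multimap}} bullet above, one way such 
bookkeeping can be written (a generic sketch with made-up keys, not the actual 
QueryResultsCache fields):

{code:java}
import com.google.common.collect.HashMultimap;
import com.google.common.collect.Multimaps;
import com.google.common.collect.SetMultimap;

// A table-name -> cache-entry index without hand-rolled Map<String, Set<...>>
// maintenance; the synchronized wrapper replaces ad-hoc synchronized blocks
// around every mutation.
SetMultimap<String, String> entriesByTable =
    Multimaps.synchronizedSetMultimap(HashMultimap.<String, String>create());

entriesByTable.put("db1.tbl1", "cacheEntry-42");
entriesByTable.put("db1.tbl1", "cacheEntry-43");
entriesByTable.removeAll("db1.tbl1");  // invalidate everything for one table
{code}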



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24810) Use JDK 8 String Switch in TruncDateFromTimestamp

2021-05-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24810.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thanks [~pgaref] for the review!!

> Use JDK 8 String Switch in TruncDateFromTimestamp
> -
>
> Key: HIVE-24810
> URL: https://issues.apache.org/jira/browse/HIVE-24810
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25157:
-


> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> * Remove superfluous code
> * Simplify lock usage (remove instances of {{synchronization}})
> * Re-do "LRU" map. The original contributor's understanding of 
> {{LinkedHashMap}} as a {LRU} map is incorrect.
> * Simplify code with Guava {{Multimap}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24850) Don't Cache SQL Text in Hive Query Results Cache

2021-05-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-24850:
-

Assignee: David Mollitor

> Don't Cache SQL Text in Hive Query Results Cache
> 
>
> Key: HIVE-24850
> URL: https://issues.apache.org/jira/browse/HIVE-24850
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> In class {{QueryResultsCache}}, the Map used to map queries to results is 
> keyed on the query string, but we have no idea how large those strings are.  
> Instead, key the map on a hash (MD5 or SHA-256) of each query.
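
A sketch of that keying using SHA-256 via the JDK's {{MessageDigest}} 
(illustrative only; the class and method names below are not the actual 
{{QueryResultsCache}} code):

{code:java|title=Query cache key sketch (illustrative only)}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class QueryKeySketch {

  /** Derive a fixed-size cache key from an arbitrarily large query string. */
  static String cacheKey(String queryText) throws NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    byte[] hash = digest.digest(queryText.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder(hash.length * 2);
    for (byte b : hash) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    System.out.println(cacheKey("SELECT * FROM t WHERE c = 1"));
  }
}
{code}

A fixed 64-character key bounds the memory held by the cache index regardless of 
how large the original SQL text is.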



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25152) Remove Superfluous Logging Code

2021-05-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25152.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thanks [~mgergely] and [~pgaref] for the reviews!

> Remove Superfluous Logging Code
> ---
>
> Key: HIVE-25152
> URL: https://issues.apache.org/jira/browse/HIVE-25152
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> So much logging code can be removed to lessen the amount of code in the 
> project (and perhaps yield some small performance gains).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25151) Remove Unused Interner from HiveMetastoreChecker

2021-05-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25151.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thanks [~mgergely] for the review!

> Remove Unused Interner from HiveMetastoreChecker
> 
>
> Key: HIVE-25151
> URL: https://issues.apache.org/jira/browse/HIVE-25151
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code:java|title=HiveMetastoreChecker}
>   for (int i = 0; i < getPartitionSpec(table, partition).size(); i++) {
> Path qualifiedPath = partPath.makeQualified(fs);
> pathInterner.intern(qualifiedPath);
> partPaths.add(qualifiedPath);
> partPath = partPath.getParent();
>   }
> {code}
>  
> The items are being "interned" and then the returned values are ignored.  
> This is wrong and makes the {{Interner}} useless.
> For now simply remove this stuff.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.0

2021-05-24 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350425#comment-17350425
 ] 

David Mollitor commented on HIVE-24484:
---

[~ganeshas] No ETA.  Still just waiting.

I'm hoping Hadoop 3.3.1 will fix the backwards-compatibility break in 
ProtobufRpcEngine.
It would also be great if folks could work on syncing the Guava version that 
these products use, especially by upgrading Druid.

> Upgrade Hadoop to 3.3.0
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog

2021-05-22 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25128:
--
Description: 
{code:java|title=RawStore.java}
  /**
   * Alter an existing catalog.  Only description and location can be changed, 
and the change of
   * location is for internal use only.
   * @param catName name of the catalog to alter.
   * @param cat new version of the catalog.
   * @throws MetaException something went wrong, usually in the database.
   * @throws InvalidOperationException attempt to change something about the 
catalog that is not
   * changeable, like the name.
   */
  void alterCatalog(String catName, Catalog cat) throws MetaException, 
InvalidOperationException;
{code}

Please check out parent task [HIVE-25126] for the motivation here, but I would 
like to remove all Thrift-based Exceptions from the {{RawStore}} interface, 
including MetaException and InvalidOperationException. These should be replaced 
with something that is specific to Hive and not tied to the RPC layer.

I propose introducing unchecked exceptions, HiveMetaRuntimeException and its 
sub-class HiveMetaDataAccessException, to replace these:

* HiveMetaDataAccessException: unable to load data from the underlying data store
* HiveMetaRuntimeException: generic exception for anything thrown by the 
RawStore that is not specifically handled
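
A minimal sketch of that hierarchy, using the two names proposed above 
(everything else here is illustrative, not committed code):

{code:java|title=Proposed hierarchy (sketch)}
/** Generic unchecked exception for failures surfaced by the RawStore. */
class HiveMetaRuntimeException extends RuntimeException {
  HiveMetaRuntimeException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Thrown when the underlying data store cannot be read or written. */
class HiveMetaDataAccessException extends HiveMetaRuntimeException {
  HiveMetaDataAccessException(String message, Throwable cause) {
    super(message, cause);
  }
}
{code}

The Thrift bridge ({{HMSHandler}}) would then be the one place that catches 
these and converts them into the wire-level Thrift exceptions.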

> Remove Thrift Exceptions From RawStore alterCatalog
> ---
>
> Key: HIVE-25128
> URL: https://issues.apache.org/jira/browse/HIVE-25128
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code:java|title=RawStore.java}
>   /**
>* Alter an existing catalog.  Only description and location can be 
> changed, and the change of
>* location is for internal use only.
>* @param catName name of the catalog to alter.
>* @param cat new version of the catalog.
>* @throws MetaException something went wrong, usually in the database.
>* @throws InvalidOperationException attempt to change something about the 
> catalog that is not
>* changeable, like the name.
>*/
>   void alterCatalog(String catName, Catalog cat) throws MetaException, 
> InvalidOperationException;
> {code}
> Please check out parent task [HIVE-25126] for the motivation here, but I 
> would like to remove all Thrift-based Exceptions from the {{RawStore}} 
> interface, including MetaException and InvalidOperationException. These 
> should be replaced with something that is specific to Hive and not tied to 
> the RPC layer.
> I propose introducing unchecked exceptions, HiveMetaRuntimeException and its 
> sub-class HiveMetaDataAccessException, to replace these:
> * HiveMetaDataAccessException: unable to load data from the underlying data store
> * HiveMetaRuntimeException: generic exception for anything thrown by the 
> RawStore that is not specifically handled



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25152) Remove Superfluous Logging Code

2021-05-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25152:
-


> Remove Superfluous Logging Code
> ---
>
> Key: HIVE-25152
> URL: https://issues.apache.org/jira/browse/HIVE-25152
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> So much logging code can be removed to lessen the amount of code in the 
> project (and perhaps yield some small performance gains).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25151) Remove Unused Interner from HiveMetastoreChecker

2021-05-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25151:
--
Description: 
{code:java|title=HiveMetastoreChecker}
  for (int i = 0; i < getPartitionSpec(table, partition).size(); i++) {
Path qualifiedPath = partPath.makeQualified(fs);
pathInterner.intern(qualifiedPath);
partPaths.add(qualifiedPath);
partPath = partPath.getParent();
  }
{code}
 
The items are being "interned" and then the returned values are ignored.  This 
is wrong and makes the {{Interner}} useless.

For now simply remove this stuff.

  was:
{code:java|title=HiveMetastoreChecker}
  for (int i = 0; i < getPartitionSpec(table, partition).size(); i++) {
Path qualifiedPath = partPath.makeQualified(fs);
pathInterner.intern(qualifiedPath);
partPaths.add(qualifiedPath);
partPath = partPath.getParent();
  }
{code}
 
The items is being "interned" and then the returned value is ignored.  This is 
wrong and make the {{Interner}} useless.

For now simply remove this stuff.
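
For reference, the difference between the broken pattern and correct 
{{Interner}} usage looks roughly like this (hypothetical values, not the actual 
{{HiveMetastoreChecker}} code):

{code:java|title=Interner usage (sketch)}
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import java.util.ArrayList;
import java.util.List;

public class InternerSketch {
  public static void main(String[] args) {
    Interner<String> interner = Interners.newStrongInterner();
    List<String> kept = new ArrayList<>();

    String a = new String("/warehouse/db/table/p=1");
    String b = new String("/warehouse/db/table/p=1"); // equal, but a distinct object

    // Wrong: the canonical instance returned by intern() is ignored, so the
    // list still holds two distinct (but equal) objects and nothing is shared.
    interner.intern(a);
    kept.add(a);
    interner.intern(b);
    kept.add(b);
    System.out.println(kept.get(0) == kept.get(1)); // false

    // Right: store what intern() returns.
    kept.clear();
    kept.add(interner.intern(new String("/warehouse/db/table/p=1")));
    kept.add(interner.intern(new String("/warehouse/db/table/p=1")));
    System.out.println(kept.get(0) == kept.get(1)); // true
  }
}
{code}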


> Remove Unused Interner from HiveMetastoreChecker
> 
>
> Key: HIVE-25151
> URL: https://issues.apache.org/jira/browse/HIVE-25151
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> {code:java|title=HiveMetastoreChecker}
>   for (int i = 0; i < getPartitionSpec(table, partition).size(); i++) {
> Path qualifiedPath = partPath.makeQualified(fs);
> pathInterner.intern(qualifiedPath);
> partPaths.add(qualifiedPath);
> partPath = partPath.getParent();
>   }
> {code}
>  
> The items are being "interned" and then the returned values are ignored.  
> This is wrong and makes the {{Interner}} useless.
> For now simply remove this stuff.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25151) Remove Unused Interner from HiveMetastoreChecker

2021-05-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25151:
--
Description: 
{code:java|title=HiveMetastoreChecker}
  for (int i = 0; i < getPartitionSpec(table, partition).size(); i++) {
Path qualifiedPath = partPath.makeQualified(fs);
pathInterner.intern(qualifiedPath);
partPaths.add(qualifiedPath);
partPath = partPath.getParent();
  }
{code}
 
The items is being "interned" and then the returned value is ignored.  This is 
wrong and make the {{Interner}} useless.

For now simply remove this stuff.

> Remove Unused Interner from HiveMetastoreChecker
> 
>
> Key: HIVE-25151
> URL: https://issues.apache.org/jira/browse/HIVE-25151
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> {code:java|title=HiveMetastoreChecker}
>   for (int i = 0; i < getPartitionSpec(table, partition).size(); i++) {
> Path qualifiedPath = partPath.makeQualified(fs);
> pathInterner.intern(qualifiedPath);
> partPaths.add(qualifiedPath);
> partPath = partPath.getParent();
>   }
> {code}
>  
> The items is being "interned" and then the returned value is ignored.  This 
> is wrong and make the {{Interner}} useless.
> For now simply remove this stuff.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25151) Remove Unused Interner from HiveMetastoreChecker

2021-05-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25151:
-


> Remove Unused Interner from HiveMetastoreChecker
> 
>
> Key: HIVE-25151
> URL: https://issues.apache.org/jira/browse/HIVE-25151
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-21 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346442#comment-17346442
 ] 

David Mollitor edited comment on HIVE-25126 at 5/21/21, 7:17 PM:
-

{{MetaException}} is also inconsistent.  Some functions that clearly access the 
DB do not throw this exception in the signature.


was (Author: belugabehr):
{{MetaException}} is also inconstantly.  Some functions that clearly access the 
DB do not throw this exception in the signature.

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a common approach and it dovetails nicely with the 
> underlying database access library, DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove the Thrift Exceptions to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
> 
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller, forcing the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25144) Add NoReconnect Annotation to CreateXXX Methods With AlreadyExistsException

2021-05-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25144:
--
Summary: Add NoReconnect Annotation to CreateXXX Methods With 
AlreadyExistsException  (was: Add NoReconnect Annotation to Create 
AlreadyExistsException Methods)

> Add NoReconnect Annotation to CreateXXX Methods With AlreadyExistsException
> ---
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
> {{AlreadyExistsException}} even though the table absolutely does not exist.
>  
> I believe the issue is that there is a timeout/transient error with HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, 
> the second time around, the table looks like it already exists.
>  
> If something goes wrong during an HMS CREATE operation, we do not know the 
> state of the operation, and therefore it should just fail.
>  
> That would certainly make it more transparent to the end user what is going 
> on; an {{AlreadyExistsException}} is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25144) Add NoReconnect Annotation to Create AlreadyExistsException Methods

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25144:
--
Description: 
I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
{{AlreadyExistsException}} even though the table absolutely does not exist.

 

I believe the issue is that there is a timeout/transient error with HMS and 
the backend database.  So, the client submits the request to HMS, and the 
request does eventually succeed, but only after the connection to the client 
has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, 
the second time around, the table looks like it already exists.

 

If something goes wrong during an HMS CREATE operation, we do not know the state 
of the operation, and therefore it should just fail.

 

That would certainly make it more transparent to the end user what is going on; 
an {{AlreadyExistsException}} is confusing.

  was:
I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
{{AlreadyExistsException}} even though the table does absolutely not exist.

 

I believe the issue is there there is a timeout/transient error with HMS and 
the backend database.  So, the client submits the request to HMS, and the 
request does eventually succeed, but only after the connection to the client 
connects.  Therefore, when the HMS Client "retry" functionality kicks it, the 
second time around, the table looks like it already exists.

 

If something goes wrong during a HMS CREATE operation, we do not know the state 
of the operation and therefore it should just fail.
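
The retry behaviour described above is what the annotation is meant to 
short-circuit. The general mechanism, sketched with hypothetical names (this 
illustrates skipping retries for annotated methods; it is not the actual 
{{RetryingMetaStoreClient}} code):

{code:java|title=Retry skip for annotated methods (sketch)}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Marks operations whose outcome is unknown after a connection failure. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NoReconnect {
}

/** Retries failed calls, except for methods tagged with @NoReconnect. */
class RetryingHandler implements InvocationHandler {
  private final Object delegate;
  private final int maxAttempts;

  RetryingHandler(Object delegate, int maxAttempts) {
    this.delegate = delegate;
    this.maxAttempts = maxAttempts;
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    // CREATE-style calls get exactly one attempt; everything else may be retried.
    int attempts = method.isAnnotationPresent(NoReconnect.class) ? 1 : maxAttempts;
    Throwable last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return method.invoke(delegate, args);
      } catch (InvocationTargetException e) {
        last = e.getCause();
      }
    }
    throw last;
  }
}
{code}

In the real client only connection-level failures would trigger a reconnect and 
retry; the point is simply that a create whose outcome is unknown gets a single 
attempt instead of surfacing a confusing {{AlreadyExistsException}} on the 
second try.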


> Add NoReconnect Annotation to Create AlreadyExistsException Methods
> ---
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
> {{AlreadyExistsException}} even though the table absolutely does not exist.
>  
> I believe the issue is that there is a timeout/transient error with HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, 
> the second time around, the table looks like it already exists.
>  
> If something goes wrong during an HMS CREATE operation, we do not know the 
> state of the operation, and therefore it should just fail.
>  
> That would certainly make it more transparent to the end user what is going 
> on; an {{AlreadyExistsException}} is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25144) Add NoReconnect Annotation to Create AlreadyExistsException Methods

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25144:
-


> Add NoReconnect Annotation to Create AlreadyExistsException Methods
> ---
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} method fails with 
> {{AlreadyExistsException}} even though the table absolutely does not exist.
>  
> I believe the issue is that there is a timeout/transient error with HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, 
> the second time around, the table looks like it already exists.
>  
> If something goes wrong during an HMS CREATE operation, we do not know the 
> state of the operation, and therefore it should just fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25143) Improve ERROR Logging in QL Package

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25143:
--
Description: 
I went through and reviewed all of the ERROR logging in the HS2 {{ql}} module 
and I removed (most of) the following bad habits:

 
 * Log-and-Throw (log or throw, not both)
 * Pass in the Exception to the logging framework instead of logging its 
toString() : LOG.error("alter table update columns: {}", e);
 * Add additional context instead of copying the message from the wrapped 
Exception : throw new SemanticException(e.getMessage(), e);
 * The wrapped exception is being lost in some cases, though the message 
survives: throw new HiveException(e.getMessage());
 * Remove new-lines from Exception messages; this is annoying because log 
messages should all be on a single line for grep
 * Not logging the Exception stack trace :  LOG.error("Error in close loader: " 
+ ie);
 * Logging information but not passing it into an Exception for bubbling up:  
LOG.error("Failed to return session: {} to pool", session, e); throw e;
 * Other miscellaneous improvements
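
For illustration, the first habit (together with the toString()-logging and 
lost-cause ones) and the shape of the fix; this is a generic sketch, not a 
specific Hive class:

{code:java|title=Log-and-throw sketch (illustrative only)}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ErrorLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ErrorLoggingSketch.class);

  void before(String table) {
    try {
      doWork(table);
    } catch (RuntimeException e) {
      // Habits flagged above: logging *and* throwing, formatting the exception
      // into the message rather than passing it as the throwable argument, and
      // dropping the cause when re-wrapping.
      LOG.error("alter table update columns: {}", e);
      throw new RuntimeException(e.getMessage());
    }
  }

  void after(String table) {
    try {
      doWork(table);
    } catch (RuntimeException e) {
      // Either log or throw: wrap once with added context and keep the cause so
      // the stack trace survives; the caller decides what, if anything, to log.
      throw new RuntimeException("Failed to update columns for table " + table, e);
    }
  }

  private void doWork(String table) {
    // placeholder for real work
  }
}
{code}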

> Improve ERROR Logging in QL Package
> ---
>
> Key: HIVE-25143
> URL: https://issues.apache.org/jira/browse/HIVE-25143
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I went through and reviewed all of the ERROR logging in the HS2 {{ql}} module 
> and I removed (most of) the following bad habits:
>  
>  * Log-and-Throw (log or throw, not both)
>  * Pass in the Exception to the logging framework instead of logging its 
> toString() : LOG.error("alter table update columns: {}", e);
>  * Add additional context instead of copying the message from the wrapped 
> Exception : throw new SemanticException(e.getMessage(), e);
>  * The wrapped exception is being lost in some cases, though the message 
> survives: throw new HiveException(e.getMessage());
>  * Remove new-lines from Exception messages; this is annoying because log 
> messages should all be on a single line for grep
>  * Not logging the Exception stack trace :  LOG.error("Error in close loader: 
> " + ie);
>  * Logging information but not passing it into an Exception for bubbling up:  
> LOG.error("Failed to return session: {} to pool", session, e); throw e;
>  * Other miscellaneous improvements



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25143) Improve ERROR Logging in QL Package

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25143:
-


> Improve ERROR Logging in QL Package
> ---
>
> Key: HIVE-25143
> URL: https://issues.apache.org/jira/browse/HIVE-25143
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25127.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks [~mgergely] for the review!

> Remove Thrift Exceptions From RawStore getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25141) Review Error Level Logging in HMS Module

2021-05-19 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25141:
-


> Review Error Level Logging in HMS Module
> 
>
> Key: HIVE-25141
> URL: https://issues.apache.org/jira/browse/HIVE-25141
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> * Remove "log *and* throw" (it should be one or the other
>  * Remove superfluous code
>  * Ensure the stack traces are being logged (and not just the Exception 
> message) to ease troubleshooting
>  * Remove double-printing the Exception message (SLF4J dictates that the 
> Exception message will be printed as part of the logger's formatting)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25136) Remove MetaExceptions From RawStore First Cut

2021-05-18 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25136:
-


> Remove MetaExceptions From RawStore First Cut
> -
>
> Key: HIVE-25136
> URL: https://issues.apache.org/jira/browse/HIVE-25136
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346433#comment-17346433
 ] 

David Mollitor edited comment on HIVE-25126 at 5/17/21, 9:48 PM:
-

As I work through this, I see that in {{RawStore}}
{code:java}
 /**
  * 
  * @throws MetaException general database exception
  */

 /**
  * 
  * @throws MetaException something went wrong, usually in the RDBMS or storage
  */ 
{code}
That's not actually true. A "general database exception" is almost never caught 
in {{ObjectStore}} and thrown as a {{MetaException}}. All "general database 
exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and bubble-up 
(and never handled). All the more reason to get rid of this.

 

Also, {{RawStore}} should be storage-agnostic, so things like "RDBMS" in a 
comment shouldn't be permissible.


was (Author: belugabehr):
As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  * 
  * @throws MetaException general database exception
  */

 /**
  * 
  * @throws MetaException something went wrong, usually in the RDBMS or storage
  */ 
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled).  All the more reason to get rid of this.
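
The comment above notes that persistence-layer {{RuntimeExceptions}} bubble up 
unhandled; one shape the translation boundary could take, sketched with 
stand-in types (the real {{HMSHandler}}, {{RawStore}}, and {{MetaException}} 
signatures differ; this only illustrates where the conversion would live):

{code:java|title=Translation at the RPC bridge (sketch)}
// Stand-in for the Thrift-generated, wire-level exception.
class WireMetaException extends Exception {
  WireMetaException(String message) {
    super(message);
  }
}

// Stand-in for the storage-agnostic data access layer.
interface Store {
  String getCatalogLocation(String catalogName);
}

// Stand-in for the Thrift bridge: unchecked persistence failures are caught
// and converted to the wire exception here, not inside the store itself.
class Bridge {
  private final Store store;

  Bridge(Store store) {
    this.store = store;
  }

  String getCatalogLocation(String catalogName) throws WireMetaException {
    try {
      return store.getCatalogLocation(catalogName);
    } catch (RuntimeException e) {
      // e.g. an unchecked exception bubbling up from the persistence library
      throw new WireMetaException("Unable to fetch catalog " + catalogName + ": " + e.getMessage());
    }
  }
}
{code}

Whether the bridge converts straight to the Thrift exception or first to a 
Hive-specific runtime exception is exactly the question the parent issue raises.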

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a common approach and it dovetails nicely with the 
> underlying database access library, DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove the Thrift Exceptions to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
> 
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller, forcing the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346442#comment-17346442
 ] 

David Mollitor commented on HIVE-25126:
---

{{MetaException}} is also inconstantly.  Some functions that clearly access the 
DB do not throw this exception in the signature.

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a common approach and it dovetails nicely with the 
> underlying database access library, DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove the Thrift Exceptions to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
> 
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller, forcing the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346433#comment-17346433
 ] 

David Mollitor edited comment on HIVE-25126 at 5/17/21, 9:07 PM:
-

As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  * 
  * @throws MetaException general database exception
  */

 /**
  * 
  * @throws MetaException something went wrong, usually in the RDBMS or storage
  */ 
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled).  All the more reason to get rid of this.


was (Author: belugabehr):
As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  *
  * @throws MetaException general database exception
   */
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled).  All the more reason to get rid of this.

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a common approach and it dovetails nicely with the 
> underlying database access library, DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove the Thrift Exceptions to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
> 
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller, forcing the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346433#comment-17346433
 ] 

David Mollitor edited comment on HIVE-25126 at 5/17/21, 8:55 PM:
-

As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  *
  * @throws MetaException general database exception
   */
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled).  All the more reason to get rid of this.


was (Author: belugabehr):
As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  *
  * @throws MetaException general database exception
   */
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled)

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a common approach and it dovetails nicely with the 
> underlying database access library, DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove the Thrift Exceptions to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
> 
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller, forcing the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25126) Remove Thrift Exceptions From RawStore

2021-05-17 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346433#comment-17346433
 ] 

David Mollitor commented on HIVE-25126:
---

As I work through this, I see that in {{RawStore}}

{code:java}
 /**
  *
  * @throws MetaException general database exception
   */
{code}

That's not actually true.  A "general database exception" is almost never 
caught in {{ObjectStore}} and thrown as a {{MetaException}}.  All "general 
database exceptions" are {{RuntimeExceptions}} generated by DataNucleaus and 
bubble-up (and never handled)

> Remove Thrift Exceptions From RawStore
> --
>
> Key: HIVE-25126
> URL: https://issues.apache.org/jira/browse/HIVE-25126
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove all references to 
> NoSuchObjectException/InvalidOperationException/MetaException from the method 
> signature of RawStore.  These Exceptions are generated by Thrift and are used 
> to communicate error conditions across the wire.  They are not designed for 
> use as part of the underlying stack, yet in Hive, they have been pushed down 
> into these data access operators. 
>  
> The RawStore should not have to be this tightly coupled to the transport 
> layer.
>  
> Remove all checked Exceptions from RawStore in favor of Hive runtime 
> exceptions.  This is a common approach and it dovetails nicely with the 
> underlying database access library, DataNucleus.
> All of the logging of un-checked Exceptions, and transforming them into 
> Thrift exceptions, should happen at the Hive-Thrift bridge ({{HMSHandler}}).
>  
> The RawStore is a pretty generic Data Access Object.  Given the name "Raw" I 
> assume that the backing data store could really be anything.  With that said, 
> I would say there are two phases of this:
>  
>  # Remove the Thrift Exceptions to decouple from Thrift
>  # Throw relevant Hive Runtime Exceptions to decouple from JDO/DataNucleus
> 
> Item number 2 is required because DataNucleus throws a lot of unchecked 
> runtime exceptions. From reading the current {{ObjectStore}} code, it appears 
> that many of these exceptions are bubbled up to the caller, forcing the 
> caller to handle different exceptions depending on the data source (though 
> they only see a {{RawStore}}).  The calling code should only have to deal 
> with Hive exceptions and be hidden from the underlying data storage layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog

2021-05-17 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25128:
-


> Remove Thrift Exceptions From RawStore alterCatalog
> ---
>
> Key: HIVE-25128
> URL: https://issues.apache.org/jira/browse/HIVE-25128
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

