[jira] [Created] (HIVE-20174) Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions
Matt McCline created HIVE-20174: --- Summary: Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions Key: HIVE-20174 URL: https://issues.apache.org/jira/browse/HIVE-20174 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new unit tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results in vectorized aggregation functions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20173) MetaStoreDirectSql#executeWithArray should not catch RuntimeExceptions from JDO
Aaron Gottlieb created HIVE-20173: - Summary: MetaStoreDirectSql#executeWithArray should not catch RuntimeExceptions from JDO Key: HIVE-20173 URL: https://issues.apache.org/jira/browse/HIVE-20173 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.1.0 Reporter: Aaron Gottlieb When attempting to test the existence of a Hive database, the Metastore will query the backing database. The method MetaStoreDirectSql#executeWithArray catches all exceptions, turning them into MetaExceptions. Further up the stack, the ObjectStore#getDatabase explicitly catches MetaExceptions and turns them into NoSuchObjectExceptions. Finally, RetryingHMSHandler explicitly looks for NoSuchObjectExceptions and does _not_ retry them, thinking they are legitimate answers. If the exception in MetaStoreDirectSql#executeWithArray were a runtime JDOException due to, say, some sort of network error between the Metastore and the backing database, this inability to query the backing database looks just like an answer of "no database exists" higher up the stack. Any program depending on this information will continue with an incorrect answer rather than retrying the original getDatabase query. I am unsure of the extent of the effects of this, but I imagine that explicitly _not_ catching RuntimeExceptions in MetaStoreDirectSql#executeWithArray will allow the exception to propagate all the way up to the RetryingHMSHandler, which will, correctly, retry the operation. Would allowing RuntimeExceptions to be thrown from MetaStoreDirectSql#executeWithArray be too deleterious? Or did I miss some code path such that my observations are incorrect? Thanks, Aaron -- This message was sent by Atlassian JIRA (v7.6.3#76005)
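The proposed change can be sketched as follows. This is a minimal, self-contained analog, not Hive's actual code: MetaException is replaced by a stand-in class, and the query is modeled as a Callable. The point it illustrates is that checked exceptions are still wrapped, while RuntimeExceptions (such as a JDOException caused by a network failure) propagate so that RetryingHMSHandler sees a retryable error rather than a fake "no such object" answer.

```java
import java.util.concurrent.Callable;

public class ExecuteWithArraySketch {

    // Stand-in for MetaException; the real class lives in the metastore API.
    static class MetaExceptionStandIn extends Exception {
        MetaExceptionStandIn(String msg, Throwable cause) { super(msg, cause); }
    }

    static <T> T executeWithArray(Callable<T> query) throws MetaExceptionStandIn {
        try {
            return query.call();
        } catch (RuntimeException e) {
            // Proposed behavior: do NOT swallow runtime failures (e.g. a JDO
            // exception from a transient network error); let them propagate
            // so the retry layer can retry the operation.
            throw e;
        } catch (Exception e) {
            // Checked exceptions keep the existing wrapping behavior.
            throw new MetaExceptionStandIn("query failed", e);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            executeWithArray(() -> { throw new IllegalStateException("network down"); });
        } catch (RuntimeException e) {
            // The transient failure surfaces as-is instead of being misread
            // as "database does not exist" higher up the stack.
            System.out.println("retryable: " + e.getMessage());
        }
    }
}
```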
[GitHub] hive pull request #400: HIVE-20172: StatsUpdater failed with GSS Exception w...
GitHub user rajkrrsingh opened a pull request: https://github.com/apache/hive/pull/400 HIVE-20172: StatsUpdater failed with GSS Exception while trying to co… Since the metastore client is running inside HMS, there is no need to connect to a remote URI; as part of this PR I will be updating the metastore URI so that it connects in embedded mode. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rajkrrsingh/hive HIVE-20172 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/400.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #400 commit 3efc2d9ba96822101b30c645d746849e772e478c Author: Rajkumar singh Date: 2018-07-13T21:17:40Z HIVE-20172: StatsUpdater failed with GSS Exception while trying to connect to remote metastore ---
[jira] [Created] (HIVE-20172) StatsUpdater failed with GSS Exception while trying to connect to remote metastore
Rajkumar Singh created HIVE-20172: - Summary: StatsUpdater failed with GSS Exception while trying to connect to remote metastore Key: HIVE-20172 URL: https://issues.apache.org/jira/browse/HIVE-20172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.1.1 Environment: Hive-1.2.1,Hive2.1,java8 Reporter: Rajkumar Singh Assignee: Rajkumar Singh StatsUpdater task failed with GSS Exception while trying to connect to remote Metastore. {code} org.apache.thrift.transport.TTransportException: GSS initiate failed at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:316) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564) at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3526) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3558) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:533) at org.apache.hadoop.hive.ql.txn.compactor.Worker$StatsUpdater.gatherStats(Worker.java:300) at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:265) at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:177) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:174) ) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76) {code} Since the metastore client is running inside HMS, there is no need to connect to a remote URI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20171) Make hive.stats.autogather Per Table
BELUGA BEHR created HIVE-20171: -- Summary: Make hive.stats.autogather Per Table Key: HIVE-20171 URL: https://issues.apache.org/jira/browse/HIVE-20171 Project: Hive Issue Type: Improvement Components: HiveServer2, Standalone Metastore Affects Versions: 3.0.0, 4.0.0 Reporter: BELUGA BEHR {{hive.stats.autogather}} {{hive.stats.column.autogather}} https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties These are currently global-level settings. Make these global settings the 'default' values for tables, but allow them to be overridden by a table's properties. We recently started seeing S3-backed tables that are not regularly queried, but whose CREATE TABLE is very slow (30+ minutes) because stats are collected for all of the files in the table. We would like to turn this feature off for certain S3 tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 67887: HIVE-20090
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67887/ --- (Updated July 13, 2018, 4:04 p.m.) Review request for hive, Ashutosh Chauhan, Deepak Jaiswal, and Gopal V. Bugs: HIVE-20090 https://issues.apache.org/jira/browse/HIVE-20090 Repository: hive-git Description --- HIVE-20090 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6ea68c35000a5dadb7a01db47bbd8183bff966da itests/src/test/resources/testconfiguration.properties 4001b9f452f9dbeaff31c2e766334259605a51af ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 119aa925c1a71502e649b4f2d193a7ff974263c1 ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java dec2d1ef38b748a5c9b40d06af491dd168d70b72 ql/src/test/queries/clientpositive/dynamic_semijoin_reduction_sw2.q PRE-CREATION ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw2.q.out PRE-CREATION ql/src/test/results/clientpositive/llap/explainuser_1.q.out f87fe36e11a7c7e535678dbfaaced04f33bbb501 ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out 6987a96809e3c3300e1b76ea5df3069b3c1d162f ql/src/test/results/clientpositive/perf/tez/query1.q.out 579940c66e25ebf5e7d0635aaedd0c0cc994f4e0 ql/src/test/results/clientpositive/perf/tez/query16.q.out 0b64c55b0f4ba036aeba4c49f478e9ee1409087c ql/src/test/results/clientpositive/perf/tez/query17.q.out 2e5e254b2ddc3507f962cbc7691db51f1abafbca ql/src/test/results/clientpositive/perf/tez/query18.q.out e8585275b4e51a55ce778dd154033fcdf859e617 ql/src/test/results/clientpositive/perf/tez/query2.q.out d24899ccf371ad42ef88cebc26cc671c097686da ql/src/test/results/clientpositive/perf/tez/query23.q.out 6725bec30106bc3321c2869dfc304d0a4da82cf8 ql/src/test/results/clientpositive/perf/tez/query24.q.out 9fcec42c3ab29b898c9c947544a2e29dd08e95e8 ql/src/test/results/clientpositive/perf/tez/query25.q.out a885cf344b7e29dcf1b2d93d1914e7f9a8d4b921 ql/src/test/results/clientpositive/perf/tez/query29.q.out 
46ff49d41a01591f075b2c48ae5a692640fd6eec ql/src/test/results/clientpositive/perf/tez/query31.q.out c4d717d8680f6ac6f8f8b6ed01742384a84ddcf9 ql/src/test/results/clientpositive/perf/tez/query32.q.out 6be6f7aa6e6fc50bcedebe3f4d1b5fc00b52ee86 ql/src/test/results/clientpositive/perf/tez/query39.q.out 5966e243ea79b4b884950f34a5b7336e40f92889 ql/src/test/results/clientpositive/perf/tez/query40.q.out 2f116f12ebcba44b876508d0d0f0d827e3a8b28d ql/src/test/results/clientpositive/perf/tez/query54.q.out 8ab239ce260fb37d988d956fcb9e4eb98a3aeb88 ql/src/test/results/clientpositive/perf/tez/query59.q.out 6b2dcc38737cfc9b955cca1d5b1ac99a7901370b ql/src/test/results/clientpositive/perf/tez/query64.q.out a673b9f753a641e111e30a7a4427206d5f2c3da3 ql/src/test/results/clientpositive/perf/tez/query69.q.out a9c7ac3b21b3e0588e7df7e8c2129fc641d090f1 ql/src/test/results/clientpositive/perf/tez/query72.q.out 48682e340db2916800e9bc5ad61c08c0fb4a8a8b ql/src/test/results/clientpositive/perf/tez/query77.q.out 163805b2a3dba3e4169d487bd44e7906f66e5868 ql/src/test/results/clientpositive/perf/tez/query78.q.out 90b6f17e1d10ca1e3af17bc53b6df50ffa310af4 ql/src/test/results/clientpositive/perf/tez/query80.q.out 816b525c301fe74460e5657d0b230287d0a6729f ql/src/test/results/clientpositive/perf/tez/query91.q.out 5e0f00a3e7321c4233f927703701051cab641fb0 ql/src/test/results/clientpositive/perf/tez/query92.q.out 061fcf729d6fa7fde52de3ccd46a800379a92211 ql/src/test/results/clientpositive/perf/tez/query94.q.out 5d19a1634b4657e9ef9595891401e8831d9b0bd4 ql/src/test/results/clientpositive/perf/tez/query95.q.out 400cc1958116b2347a06b52a1460320fd0e0be43 ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_3.q.out eafc1c4a005fa2b3bc169aa4453376f5da6841bc Diff: https://reviews.apache.org/r/67887/diff/4/ Changes: https://reviews.apache.org/r/67887/diff/3-4/ Testing --- Thanks, Jesús Camacho Rodríguez
[jira] [Created] (HIVE-20170) Improve JoinOperator "rows for join key" Logging
BELUGA BEHR created HIVE-20170: -- Summary: Improve JoinOperator "rows for join key" Logging Key: HIVE-20170 URL: https://issues.apache.org/jira/browse/HIVE-20170 Project: Hive Issue Type: Improvement Components: Operators Affects Versions: 3.0.0, 4.0.0 Reporter: BELUGA BEHR {code} 2018-06-25 09:37:33,193 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 5728000 rows for join key [333, 22] 2018-06-25 09:37:33,901 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 5828000 rows for join key [333, 22] 2018-06-25 09:37:34,623 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 5928000 rows for join key [333, 22] 2018-06-25 09:37:35,342 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 6028000 rows for join key [333, 22] {code} https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java#L120 This logging should use the same facilities as the other Operators for emitting this type of log message. [HIVE-10078] Maybe this feature should be refactored into an AbstractOperator class? Also, it should print a final count for each join value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20169) Print Final Rows Processed in MapOperator
BELUGA BEHR created HIVE-20169: -- Summary: Print Final Rows Processed in MapOperator Key: HIVE-20169 URL: https://issues.apache.org/jira/browse/HIVE-20169 Project: Hive Issue Type: Improvement Components: Operators Affects Versions: 3.0.0, 4.0.0 Reporter: BELUGA BEHR https://github.com/apache/hive/blob/ac6b2a3fb195916e22b2e5f465add2ffbcdc7430/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java#L573-L582 This class emits a log message every time a certain number of records are processed, but it does not print a final count. Override the {{MapOperator}} class's {{closeOp}} method to print a final log message providing the total number of rows read by this mapper. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
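The idea can be sketched with a small self-contained analog (the class name, field names, and log wording below are illustrative, not Hive's actual MapOperator code): a reader logs a running count every N rows, and its close path, analogous to the proposed {{closeOp}} override, always emits the final total even when it is not a multiple of the interval.

```java
// Self-contained analog of the proposed change: periodic progress logging
// plus a guaranteed final count on close. All names are hypothetical.
public class RowCountingReader {
    private long numRows = 0;
    private final long logInterval;

    public RowCountingReader(long logInterval) {
        this.logInterval = logInterval;
    }

    // Called once per record; logs at every logInterval rows.
    public void process() {
        numRows++;
        if (numRows % logInterval == 0) {
            System.out.println("records read - " + numRows);
        }
    }

    // Analogous to closeOp: always print the final total.
    public void close() {
        System.out.println("RECORDS_IN - total records read: " + numRows);
    }

    public long getCount() {
        return numRows;
    }

    public static void main(String[] args) {
        RowCountingReader r = new RowCountingReader(1000);
        for (int i = 0; i < 2500; i++) {
            r.process();
        }
        // Without the close() hook, the last 500 rows would never be reported.
        r.close(); // prints "RECORDS_IN - total records read: 2500"
    }
}
```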
[jira] [Created] (HIVE-20168) ReduceSinkOperator Logging Hidden
BELUGA BEHR created HIVE-20168: -- Summary: ReduceSinkOperator Logging Hidden Key: HIVE-20168 URL: https://issues.apache.org/jira/browse/HIVE-20168 Project: Hive Issue Type: Bug Components: Operators Affects Versions: 3.0.0, 4.0.0 Reporter: BELUGA BEHR [https://github.com/apache/hive/blob/ac6b2a3fb195916e22b2e5f465add2ffbcdc7430/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java] {code:java} if (LOG.isTraceEnabled()) { if (numRows == cntr) { cntr = logEveryNRows == 0 ? cntr * 10 : numRows + logEveryNRows; if (cntr < 0 || numRows < 0) { cntr = 0; numRows = 1; } LOG.info(toString() + ": records written - " + numRows); } } ... if (LOG.isTraceEnabled()) { LOG.info(toString() + ": records written - " + numRows); } {code} The logging guards here check for TRACE level, but the logging itself is at INFO level. This is important logging for detecting data skew. Please change the guards to check for INFO, or, preferably, remove the guards altogether, since it is very rare that a service runs with only WARN-level logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
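The fix amounts to making the guard test the same level that is actually logged. A minimal sketch, with the slf4j Logger replaced by a plain boolean so the example is self-contained (the field names follow the snippet above, but this is not Hive's actual code):

```java
public class ReduceSinkLogGuardSketch {
    // Stands in for LOG.isInfoEnabled(); the original guard wrongly used
    // LOG.isTraceEnabled(), which hides this INFO message at default levels.
    static boolean infoEnabled = true;
    static long numRows = 1000;
    static long cntr = 1000;
    static long logEveryNRows = 0;

    static void maybeLog() {
        if (infoEnabled) {               // was: LOG.isTraceEnabled()
            if (numRows == cntr) {
                // Next logging threshold: 10x growth, or a fixed stride.
                cntr = logEveryNRows == 0 ? cntr * 10 : numRows + logEveryNRows;
                if (cntr < 0 || numRows < 0) {
                    cntr = 0;
                    numRows = 1;
                }
                System.out.println("records written - " + numRows);
            }
        }
    }

    public static void main(String[] args) {
        maybeLog();
        System.out.println("next threshold: " + cntr);
    }
}
```

With the guard at INFO, the skew-detection messages appear under the default logging configuration; with the original TRACE guard they are effectively never emitted.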
[jira] [Created] (HIVE-20167) apostrophe in midline comment fails with ParseException
Trey Fore created HIVE-20167: Summary: apostrophe in midline comment fails with ParseException Key: HIVE-20167 URL: https://issues.apache.org/jira/browse/HIVE-20167 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 2.3.2 Environment: Observed on an AWS EMR cluster. Hive cli, executing script from bash with "hive -f ..." (not interactive). Reporter: Trey Fore This line causes a ParseException: {{ , member_id string -- standardizing from client's memberID}} When the apostrophe is removed, leaving: {{ , member_id string -- standardizing from clients memberID}} the line is parsed correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20166) LazyBinaryStruct Warn Level Logging
BELUGA BEHR created HIVE-20166: -- Summary: LazyBinaryStruct Warn Level Logging Key: HIVE-20166 URL: https://issues.apache.org/jira/browse/HIVE-20166 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 3.0.0, 4.0.0 Reporter: BELUGA BEHR https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java#L177-L180 {code} // Extra bytes at the end? if (!extraFieldWarned && lastFieldByteEnd < structByteEnd) { extraFieldWarned = true; LOG.warn("Extra bytes detected at the end of the row! " + "Last field end " + lastFieldByteEnd + " and serialize buffer end " + structByteEnd + ". " + "Ignoring similar problems."); } // Missing fields? if (!missingFieldWarned && lastFieldByteEnd > structByteEnd) { missingFieldWarned = true; LOG.info("Missing fields! Expected " + fields.length + " fields but " + "only got " + fieldId + "! " + "Last field end " + lastFieldByteEnd + " and serialize buffer end " + structByteEnd + ". " + "Ignoring similar problems."); } {code} The first log statement is at 'warn' level; the second is at 'info' level. Please change the second log to 'warn' as well. This seems like a problem that the user would want to know about. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] hive pull request #399: HIVE-20152: reset db state, when repl dump fails, so...
GitHub user anishek opened a pull request: https://github.com/apache/hive/pull/399 HIVE-20152: reset db state, when repl dump fails, so rename table can be done You can merge this pull request into a Git repository by running: $ git pull https://github.com/anishek/hive HIVE-20152 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/399.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #399 commit 780ebaa59627ba4954c4e72fa7e60dad2089a771 Author: Anishek Agarwal Date: 2018-07-13T09:53:09Z HIVE-20152: reset db state, when repl dump fails, so rename table can be done ---
Re: Review Request 67895: Improve HiveMetaStoreClient.dropDatabase
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67895/ --- (Updated July 13, 2018, 9:22 a.m.) Review request for hive. Changes --- Rebased to top of master again... Bugs: HIVE-18705 https://issues.apache.org/jira/browse/HIVE-18705 Repository: hive-git Description --- HiveMetaStoreClient.dropDatabase has a strange implementation to ensure client side hooks (for non-native tables, e.g. HBase) are dealt with. Currently it starts by retrieving all the tables from HMS, and then sends dropTable calls to HMS table-by-table, followed by a dropDatabase at the end just to be sure. I believe this could be refactored to speed up dropDatabase in situations where the average table count per DB is very high. Diffs (updated) - hbase-handler/src/test/queries/positive/drop_database_table_hooks.q PRE-CREATION hbase-handler/src/test/results/positive/drop_database_table_hooks.q.out PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/metadata/TableIterable.java d8e771d0ffa7d680b2a22436727f896674cd40ff ql/src/test/org/apache/hadoop/hive/ql/metadata/TestTableIterable.java 6637d150b84c9fa86e6a3a90449606437e7c9d72 service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 838dd89ca82792ca8af8eb0f30aa63e690e41f43 standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 8d88749effa89e50d8be8ed216419cd77836fd34 standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java bfd7141a8b987e5288277a46d56de32574d9aa69 standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/TableIterable.java PRE-CREATION standalone-metastore/metastore-common/src/test/java/org/apache/hadoop/hive/metastore/TestTableIterable.java PRE-CREATION Diff: https://reviews.apache.org/r/67895/diff/2/ Changes: https://reviews.apache.org/r/67895/diff/1-2/ Testing --- Drop database is an existing feature - existing tests should be fine, but since I'm poking around client side hooks I've added an HBase drop db qtest so that code path is covered Thanks, Adam Szita
[jira] [Created] (HIVE-20165) Enable ZLIB for streaming ingest
Prasanth Jayachandran created HIVE-20165: Summary: Enable ZLIB for streaming ingest Key: HIVE-20165 URL: https://issues.apache.org/jira/browse/HIVE-20165 Project: Hive Issue Type: Bug Components: Streaming, Transactions Affects Versions: 4.0.0, 3.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Per [~gopalv]'s recommendation, tried running streaming ingest with and without zlib. Following are the numbers:
*Compression: NONE*
Total rows committed: 9380
Throughput: *156* rows/second
[prasanth@cn105-10 culvert]$ hdfs dfs -du -s -h /apps/hive/warehouse/prasanth.db/culvert
*14.1 G* /apps/hive/warehouse/prasanth.db/culvert
*Compression: ZLIB*
Total rows committed: 9210
Throughput: *1535000* rows/second
[prasanth@cn105-10 culvert]$ hdfs dfs -du -s -h /apps/hive/warehouse/prasanth.db/culvert
*7.4 G* /apps/hive/warehouse/prasanth.db/culvert
ZLIB is getting us 2x compression with only about 2% less throughput. We should enable ZLIB by default for streaming ingest. -- This message was sent by Atlassian JIRA (v7.6.3#76005)