[jira] [Created] (HIVE-22663) Quote all table and column names or do not quote any
Ashutosh Bapat created HIVE-22663: - Summary: Quote all table and column names or do not quote any Key: HIVE-22663 URL: https://issues.apache.org/jira/browse/HIVE-22663 Project: Hive Issue Type: Bug Components: HiveServer2, Standalone Metastore Affects Versions: 4.0.0 Reporter: Ashutosh Bapat The change in HIVE-22546 causes the following stack trace when I run Hive with PostgreSQL as the backing db for the metastore.
0: jdbc:hive2://localhost:1> create database dumpdb with ('repl.source.for'='1,2,3');
Error: Error while compiling statement: FAILED: ParseException line 1:28 missing KW_DBPROPERTIES at '(' near '' (state=42000,code=4)
0: jdbc:hive2://localhost:1> create database dumpdb with dbproperties ('repl.source.for'='1,2,3');
ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Error communicating with the metastore)
org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the metastore at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:541) at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:687) at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:653) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:969) ... 
stack trace clipped ... at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:Unable to update transaction database org.postgresql.util.PSQLException: ERROR: relation "materialization_rebuild_locks" does not exist Position: 13 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308) at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) at 
This happens because the queries in TxnHandler.java (including the one at line 1312, which causes this stack trace) do not quote the table names. All the table names and column names there should be quoted; the change in HIVE-22546 alone won't suffice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
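The likely failure mode: PostgreSQL folds unquoted identifiers to lower case, so an unquoted MATERIALIZATION_REBUILD_LOCKS resolves to the non-existent relation "materialization_rebuild_locks". A minimal sketch of the kind of quoting helper the fix needs (hypothetical names, in Python for brevity; not Hive's actual API):

```python
def quote_ident(name: str, quote_char: str = '"') -> str:
    """Quote a SQL identifier so the backing db preserves its case.

    PostgreSQL lower-cases unquoted identifiers, which is why the
    upper-case metastore table name above is not found. quote_ident is
    a hypothetical helper, not a function in TxnHandler.
    """
    # double any embedded quote characters, then wrap the identifier
    return quote_char + name.replace(quote_char, quote_char * 2) + quote_char

def build_select(table: str, columns: list) -> str:
    """Build a SELECT with every table and column name quoted."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return "SELECT " + cols + " FROM " + quote_ident(table)
```

The point of the issue title: either every generated query quotes every identifier this way, or none of them do; mixing the two breaks on case-folding databases.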
[jira] [Created] (HIVE-22559) Maintain ownership of parent directories of an external table directory after replication
Ashutosh Bapat created HIVE-22559: - Summary: Maintain ownership of parent directories of an external table directory after replication Key: HIVE-22559 URL: https://issues.apache.org/jira/browse/HIVE-22559 Project: Hive Issue Type: Improvement Reporter: Ashutosh Bapat Assignee: Anishek Agarwal When replicating an external table, we specify a base directory on the target (say /base_ext). The path of the external table directory on the source (say /xyz/abc/ext_t1) is prefixed with that base directory when replicating the external table data, so the path of the external table on the target becomes /base_ext/xyz/abc/ext_t1. In this path, only the ownership of the ext_t1 directory is preserved; the ownership of the xyz and abc directories is set to the user executing REPL LOAD. Instead, we should preserve the ownership of xyz and abc as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
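A sketch of the path arithmetic involved: given the target base directory and the source table path, these are the intermediate target directories whose ownership would need to be copied from their source counterparts (illustrative Python, not Hive's implementation, which would use the Hadoop FileSystem API):

```python
from pathlib import PurePosixPath

def parent_dirs_to_preserve(base: str, source_path: str) -> list:
    """Return the intermediate target directories, outermost first, whose
    ownership should be copied from the corresponding source directories.
    The table directory itself (ext_t1) is already handled by REPL LOAD."""
    rel = PurePosixPath(source_path).relative_to("/")
    target = PurePosixPath(base) / rel      # e.g. /base_ext/xyz/abc/ext_t1
    dirs, p, base_p = [], target.parent, PurePosixPath(base)
    while p != base_p:                      # walk up until we hit the base dir
        dirs.append(str(p))
        p = p.parent
    return list(reversed(dirs))
```

For the example in the issue this yields /base_ext/xyz and /base_ext/xyz/abc, the two directories whose ownership is currently lost.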
[jira] [Created] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges
Ashutosh Bapat created HIVE-22512: - Summary: Use direct SQL to fetch column privileges in refreshPrivileges Key: HIVE-22512 URL: https://issues.apache.org/jira/browse/HIVE-22512 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 4.0.0 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat refreshPrivileges() calls listTableAllColumnGrants() to fetch the column level privileges. The latter function retrieves the individual column objects by firing one query per column privilege object, swamping the backing db with these queries when PrivilegeSynchronizer runs. PrivilegeSynchronizer synchronizes the privileges of all databases, tables and columns, so the backing db can get swamped very badly when there are thousands of tables with hundreds of columns. The output of listTableAllColumnGrants() is not used completely, so all the column objects the PM retrieves go to waste anyway. Fix this by using direct SQL to fetch the column privileges. -- This message was sent by Atlassian Jira (v8.3.4#803005)
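The pattern being fixed is the classic N+1 query problem. A small, self-contained illustration with a toy schema (the table and column names below are made up for illustration, not the real metastore schema):

```python
import sqlite3

# Toy stand-in for the metastore's column-privilege table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_col_privs (tbl_name TEXT, column_name TEXT,
                                principal TEXT, priv TEXT);
    INSERT INTO tbl_col_privs VALUES
        ('t1', 'a', 'u1', 'SELECT'),
        ('t1', 'b', 'u1', 'SELECT'),
        ('t2', 'a', 'u2', 'UPDATE');
""")

def privs_per_column(tables):
    """The N+1 pattern: one round trip per column privilege object."""
    rows = []
    for t in tables:
        cols = [r[0] for r in conn.execute(
            "SELECT DISTINCT column_name FROM tbl_col_privs "
            "WHERE tbl_name = ?", (t,))]
        for c in cols:  # one extra query per column
            rows += conn.execute(
                "SELECT tbl_name, column_name, principal, priv "
                "FROM tbl_col_privs WHERE tbl_name = ? AND column_name = ?",
                (t, c)).fetchall()
    return rows

def privs_direct_sql(tables):
    """Direct SQL: a single query fetches every column privilege at once."""
    marks = ", ".join("?" * len(tables))
    return conn.execute(
        "SELECT tbl_name, column_name, principal, priv FROM tbl_col_privs "
        "WHERE tbl_name IN (%s)" % marks, tables).fetchall()
```

Both return the same rows, but the per-column version issues a number of queries proportional to the total column count, which is exactly what swamps the backing db.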
[jira] [Created] (HIVE-22313) Some of the HMS auth LDAP hive config names do not start with "hive."
Ashutosh Bapat created HIVE-22313: - Summary: Some of the HMS auth LDAP hive config names do not start with "hive." Key: HIVE-22313 URL: https://issues.apache.org/jira/browse/HIVE-22313 Project: Hive Issue Type: Bug Affects Versions: 4.0.0 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22300) Deduplicate the authentication and LDAP code in HMS and HS2
Ashutosh Bapat created HIVE-22300: - Summary: Deduplicate the authentication and LDAP code in HMS and HS2 Key: HIVE-22300 URL: https://issues.apache.org/jira/browse/HIVE-22300 Project: Hive Issue Type: Improvement Components: HiveServer2, Standalone Metastore Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat HIVE-22267 has duplicated code from the hive-service/auth directory under the standalone-metastore directory. Deduplicate this code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22267) Support password based authentication in HMS
Ashutosh Bapat created HIVE-22267: - Summary: Support password based authentication in HMS Key: HIVE-22267 URL: https://issues.apache.org/jira/browse/HIVE-22267 Project: Hive Issue Type: New Feature Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Similar to HS2, support password based authentication in HMS. Right now we provide LDAP and CONFIG based options. The latter allows setting the user and password in the config and is used only for testing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
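For illustration, the CONFIG-based option amounts to comparing the supplied credentials against a user/password pair stored in the configuration. A sketch with hypothetical property names (not Hive's actual config keys), in line with the issue's note that this mode is only meant for testing:

```python
def config_auth(conf: dict, user: str, password: str) -> bool:
    """CONFIG-based password check: compare against a user/password pair
    kept in the configuration. Test-only; the key names are hypothetical."""
    return (user == conf.get("metastore.auth.config.username")
            and password == conf.get("metastore.auth.config.password"))
```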
[jira] [Created] (HIVE-22110) Initialize ReplChangeManager before starting actual dump
Ashutosh Bapat created HIVE-22110: - Summary: Initialize ReplChangeManager before starting actual dump Key: HIVE-22110 URL: https://issues.apache.org/jira/browse/HIVE-22110 Project: Hive Issue Type: Bug Components: HiveServer2, repl Affects Versions: 4.0.0 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Fix For: 4.0.0 REPL DUMP calls ReplChangeManager.encodeFileUri() to add the cmroot and checksum to the URL. This requires ReplChangeManager to be initialized. So, initialize ReplChangeManager when taking a dump. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
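Conceptually, encodeFileUri() tags a file URI with its checksum and the cmroot so the load side can verify the copy and fall back to the change-manager copy if the original file has moved. A schematic sketch (the separator and field order here are assumptions, not Hive's exact encoding; the cmroot value is made up):

```python
CM_ROOT = "hdfs://nn:8020/cmroot"   # assumed change-manager root

def encode_file_uri(file_uri: str, checksum: str, cm_root: str = CM_ROOT) -> str:
    """Append the checksum and cmroot to the file URI. The real cmroot is
    only known after ReplChangeManager is initialized, which is why REPL
    DUMP needs the initialization this issue asks for."""
    return "#".join([file_uri, checksum, cm_root])

def decode_file_uri(encoded: str):
    """Split an encoded URI back into (file_uri, checksum, cm_root)."""
    file_uri, checksum, cm_root = encoded.split("#")
    return file_uri, checksum, cm_root
```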
[jira] [Created] (HIVE-22068) Add more logging to notification cleaner and replication to track events
Ashutosh Bapat created HIVE-22068: - Summary: Add more logging to notification cleaner and replication to track events Key: HIVE-22068 URL: https://issues.apache.org/jira/browse/HIVE-22068 Project: Hive Issue Type: Improvement Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat
* Add more logging to the DB notification listener cleaner thread:
** the time when cleaning was considered, the interval and the time before which events were cleared, and the min and max event id at that time
** how many events were cleared
** the min and max event id after the cleaning
* In REPL::START, document the starting event, the end event if specified, and the maximum number of events, if specified.
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22036) HMS should identify events corresponding to replicated database for Atlas HMS hook
Ashutosh Bapat created HIVE-22036: - Summary: HMS should identify events corresponding to replicated database for Atlas HMS hook Key: HIVE-22036 URL: https://issues.apache.org/jira/browse/HIVE-22036 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat An HMS Atlas hook allows Atlas to create/update/delete its metadata based on the corresponding events in HMS. But Atlas replication happens outside of and before the Hive replication, so events generated during Hive replication may change Atlas data that has already been replicated, interfering with Atlas replication. Hence, provide an HMS interface which the hook can use to identify the events caused by the Hive replication flow. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-21960) HMS tasks on replica
Ashutosh Bapat created HIVE-21960: - Summary: HMS tasks on replica Key: HIVE-21960 URL: https://issues.apache.org/jira/browse/HIVE-21960 Project: Hive Issue Type: Improvement Components: HiveServer2, repl Affects Versions: 4.0.0 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat An HMS performs a number of housekeeping tasks. Assess whether
# they are required to be performed on the replicated data, and
# performing them on replicated data causes any issues, and how to fix those.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21841) Leader election in HMS to run housekeeping tasks.
Ashutosh Bapat created HIVE-21841: - Summary: Leader election in HMS to run housekeeping tasks. Key: HIVE-21841 URL: https://issues.apache.org/jira/browse/HIVE-21841 Project: Hive Issue Type: New Feature Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat HMS performs housekeeping tasks. When there are multiple HMSes we need to have a leader HMS elected which will carry out those housekeeping tasks. These tasks include execution of compaction tasks, auto-discovering partitions for external tables, generation of compaction tasks, repl thread etc. Note that, though the code for compaction tasks, auto-discovery of partitions etc. is in Hive, the actual tasks are initiated by an HMS configured to do so. So, leader election is required only for HMS and not for HS2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
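One common way to elect a single instance is a lease held in shared state; for HMS, the backing db would be a natural place for it. A minimal sketch of that idea (one possible design, not Hive's implementation; the dict stands in for a row in the shared db):

```python
import time

class LeaseLeaderElection:
    """Lease-based leader election: whichever instance holds an unexpired
    lease in the shared store is the leader and runs the housekeeping
    tasks (compaction, partition discovery, etc.)."""

    def __init__(self, store, instance_id, lease_secs=10.0):
        self.store = store              # shared state, e.g. a db row
        self.id = instance_id
        self.lease_secs = lease_secs

    def try_acquire(self, now=None):
        """Return True if this instance is (now) the leader; acquiring
        also renews an existing lease held by this instance."""
        now = time.time() if now is None else now
        holder, expires = self.store.get("leader", (None, 0.0))
        if holder is None or holder == self.id or expires <= now:
            self.store["leader"] = (self.id, now + self.lease_secs)
            return True
        return False

store = {}
hms_a = LeaseLeaderElection(store, "hms-a")
hms_b = LeaseLeaderElection(store, "hms-b")
```

If the leader dies, its lease simply expires and another HMS takes over on its next attempt, which is the failover property the housekeeping tasks need.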
[jira] [Created] (HIVE-21801) Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary transport
Ashutosh Bapat created HIVE-21801: - Summary: Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary transport Key: HIVE-21801 URL: https://issues.apache.org/jira/browse/HIVE-21801 Project: Hive Issue Type: Bug Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Even though tests using miniHS2 set the config hive.server2.transport.mode to http, miniHS2 is created with binary transport. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21783) Avoid authentication for connection from the same domain
Ashutosh Bapat created HIVE-21783: - Summary: Avoid authentication for connection from the same domain Key: HIVE-21783 URL: https://issues.apache.org/jira/browse/HIVE-21783 Project: Hive Issue Type: New Feature Components: HiveServer2 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat When a connection comes from the same domain, do not authenticate the user. This is similar to NONE authentication, but only for connections from the same domain. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
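The check itself can be as simple as comparing the domain suffix of the client host name with the server's own. An illustrative sketch (a real implementation would resolve the client address and guard against spoofed reverse DNS):

```python
def is_same_domain(client_host: str, server_host: str) -> bool:
    """Compare everything after the first host label; bare hostnames with
    no domain part never match, so they still get authenticated."""
    client_dom = client_host.partition(".")[2].lower()
    server_dom = server_host.partition(".")[2].lower()
    return bool(client_dom) and client_dom == server_dom
```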
[jira] [Created] (HIVE-21776) Add test for incremental replication of a UDF with jar on HDFS
Ashutosh Bapat created HIVE-21776: - Summary: Add test for incremental replication of a UDF with jar on HDFS Key: HIVE-21776 URL: https://issues.apache.org/jira/browse/HIVE-21776 Project: Hive Issue Type: Test Affects Versions: 4.0.0 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Fix For: 4.0.0 TestReplicationScenariosAcrossInstances has a test for bootstrap replication of a UDF with its jar on HDFS, but no test for incremental replication. Add the same. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21679) Replicating a CTAS event creating an MM partitioned table fails
Ashutosh Bapat created HIVE-21679: - Summary: Replicating a CTAS event creating an MM partitioned table fails Key: HIVE-21679 URL: https://issues.apache.org/jira/browse/HIVE-21679 Project: Hive Issue Type: Sub-task Components: HiveServer2, repl Affects Versions: 4.0.0 Reporter: Ashutosh Bapat
use dumpdb;
create table t1 (a int, b int);
insert into t1 values (1, 2), (3, 4);
create table t6_mm_part partitioned by (a) stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
create table t6_mm stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
repl dump dumpdb;
create table t6_mm_part_2 partitioned by (a) stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
create table t6_mm_2 partitioned by (a) stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
repl dump dumpdb from 
repl load loaddb from '/tmp/dump/next';
ERROR : failed replication org.apache.hadoop.hive.ql.parse.SemanticException: Invalid table name loaddb.dumpdb.t6_mm_part_2 at org.apache.hadoop.hive.ql.exec.Utilities.getDbTableName(Utilities.java:2253) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Utilities.getDbTableName(Utilities.java:2239) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.plan.AlterTableDesc.setOldName(AlterTableDesc.java:419) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.tableUpdateReplStateTask(IncrementalLoadTasksBuilder.java:286) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.addUpdateReplStateTasks(IncrementalLoadTasksBuilder.java:371) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.analyzeEventLoad(IncrementalLoadTasksBuilder.java:244) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(IncrementalLoadTasksBuilder.java:139) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.repl.ReplLoadTask.executeIncrementalLoad(ReplLoadTask.java:488) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.repl.ReplLoadTask.execute(ReplLoadTask.java:102) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:233) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:88) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:332) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191] at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_191] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) ~[hadoop-common-3.1.0.3.0.0.0-1634.jar:?] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:350) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_191] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_191] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_191] at java.util.concurrent.ThreadPool
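The root cause is visible in the name being rejected: the table is re-qualified with the load database even though it is already qualified, producing the three-part name loaddb.dumpdb.t6_mm_part_2. A sketch of the name resolution that rejects it (modeled loosely on Utilities.getDbTableName, in Python for brevity):

```python
def get_db_table_name(name, default_db="default"):
    """Resolve a possibly-qualified table name into (db, table). At most
    'db.table' is legal; a name qualified twice, as in the error above,
    has three parts and is rejected."""
    parts = name.split(".")
    if len(parts) == 1:
        return (default_db, parts[0])
    if len(parts) == 2:
        return (parts[0], parts[1])
    raise ValueError("Invalid table name " + name)
```

So the fix lies in the incremental-load task builder: it must not prepend the target db name to a name that is already db-qualified.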
[jira] [Created] (HIVE-21678) CTAS creating a partitioned table fails because of no writeId
Ashutosh Bapat created HIVE-21678: - Summary: CTAS creating a partitioned table fails because of no writeId Key: HIVE-21678 URL: https://issues.apache.org/jira/browse/HIVE-21678 Project: Hive Issue Type: Sub-task Components: HiveServer2, repl Affects Versions: 4.0.0 Reporter: Ashutosh Bapat
create table t1(a int, b int);
insert into t1 values (1, 2), (3, 4);
create table t6_part partitioned by (a) stored as orc tblproperties ("transactional"="true") as select * from t1;
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in the config by open txn task for migration
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in the config by open txn task for migration (state=08S01,code=1)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21677) Using strict managed tables for ACID table testing (Replication tests)
Ashutosh Bapat created HIVE-21677: - Summary: Using strict managed tables for ACID table testing (Replication tests) Key: HIVE-21677 URL: https://issues.apache.org/jira/browse/HIVE-21677 Project: Hive Issue Type: Bug Components: HiveServer2, repl Affects Versions: 4.0.0 Reporter: Ashutosh Bapat The replication tests which exclusively test ACID table replication add transactional properties to the create table/alter table statements when creating the table. Instead, they should use hive.strict.managed.tables = true in those tests. Tests derived from BaseReplicationScenariosAcidTables, and org.apache.hadoop.hive.ql.parse.TestReplicationScenariosIncrementalLoadAcidTables, are examples. Change all such tests to use hive.strict.managed.tables = true. Some of these tests create non-ACID tables for testing, which will then require setting 'transactional'='false' explicitly when creating those tables. With this change we might see some test failures (see subtasks). Please create subtasks for those so that they can be tracked within this JIRA. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New committer: Laszlo Bodor
Congratulations Laszlo. On Mon, Apr 15, 2019 at 9:08 AM Ashutosh Chauhan wrote: > Apache Hive's Project Management Committee (PMC) has invited Laszlo > Bodor to become a committer, and we are pleased to announce that he has > accepted. > > Laszlo welcome, thank you for your contributions, and we look forward your > further interactions with the community! > > Ashutosh Chauhan (on behalf of the Apache Hive PMC) > -- -- Best Wishes, Ashutosh Bapat
[jira] [Created] (HIVE-21598) CTAS on ACID table during incremental does not replicate data
Ashutosh Bapat created HIVE-21598: - Summary: CTAS on ACID table during incremental does not replicate data Key: HIVE-21598 URL: https://issues.apache.org/jira/browse/HIVE-21598 Project: Hive Issue Type: Bug Components: HiveServer2, repl Reporter: Ashutosh Bapat
Scenario:
create database dumpdb with dbproperties('repl.source.for'='1,2,3');
use dumpdb;
create table t1 (id int) clustered by(id) into 3 buckets stored as orc tblproperties ("transactional"="true");
insert into t1 values(1);
insert into t1 values(2);
repl dump dumpdb;
repl load loaddb from ;
use loaddb;
select * from t1;
+--------+
| t6.id  |
+--------+
| 1      |
| 2      |
+--------+
use dumpdb;
create table t6 stored as orc tblproperties ("transactional"="true") as select * from t1;
select * from t6;
+--------+
| t6.id  |
+--------+
| 1      |
| 2      |
+--------+
repl dump dumpdb from 
repl load loaddb from ;
use loaddb;
select * from t6;
+--------+
| t6.id  |
+--------+
+--------+
t6 gets created but there's no data. On further investigation, I see that the CommitTxnEvent's dump directory has _files but it is empty. Looks like we do not log the names of the files created as part of CTAS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21476) Wrap metastore backing db upgrade scripts into transaction
Ashutosh Bapat created HIVE-21476: - Summary: Wrap metastore backing db upgrade scripts into transaction Key: HIVE-21476 URL: https://issues.apache.org/jira/browse/HIVE-21476 Project: Hive Issue Type: Improvement Reporter: Ashutosh Bapat The metastore backing db upgrade scripts, like the upgrade* scripts in the standalone-metastore/metastore-server/src/main/sql/* directories, do not use transactions. So if a command in one of those scripts fails, the metastore db is left in an inconsistent state. Instead, we should wrap each of those scripts in a transaction so that all or none of the commands take effect. Some RDBMSes (Derby, I think) do not support DDL in a transaction, so we should make this change only for the databases which support transactional DDL. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
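For databases with transactional DDL (PostgreSQL, for example), the all-or-nothing behavior looks like this; the sketch uses SQLite, which also supports transactional DDL, purely for illustration, and the table names are made up:

```python
import sqlite3

def run_upgrade(conn, statements):
    """Run an upgrade script as a single transaction: either every
    statement takes effect, or none does. Requires a backing db whose
    DDL is transactional."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        for stmt in statements:
            cur.execute(stmt)
        cur.execute("COMMIT")
        return True
    except sqlite3.Error:
        cur.execute("ROLLBACK")
        return False

# isolation_level=None lets us manage BEGIN/COMMIT/ROLLBACK ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
ok = run_upgrade(conn, ["CREATE TABLE tbls (id INT)",
                        "CREATE TABLE dbs (id INT)"])
failed = run_upgrade(conn, ["CREATE TABLE parts (id INT)",
                            "THIS IS NOT SQL"])
tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
# the failed script rolled back 'parts' along with the bad statement
```

On a database without transactional DDL (Derby, per the issue), the ROLLBACK would not undo the CREATE TABLE, which is exactly why the change is proposed only for databases that support it.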
[jira] [Created] (HIVE-21462) Upgrading SQL server backed metastore when changing data type of a column with constraints
Ashutosh Bapat created HIVE-21462: - Summary: Upgrading SQL server backed metastore when changing data type of a column with constraints Key: HIVE-21462 URL: https://issues.apache.org/jira/browse/HIVE-21462 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Fix For: 4.0.0 SQL server does not allow changing data type of a column which has a constraint or an index on it. The constraint or the index needs to be dropped before changing the data type and needs to be recreated after that. Metastore upgrade scripts aren't doing this and thus upgrade fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
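The required statement ordering can be sketched as a generator that brackets the ALTER with drop/recreate pairs (the DDL strings below are schematic, not the actual upgrade-script contents):

```python
def alter_column_with_deps(table, column, new_type, deps):
    """Emit the statement sequence SQL Server needs when a column under a
    constraint or index changes type: drop the dependents, ALTER the
    column, then recreate the dependents. 'deps' is a list of
    (drop_stmt, create_stmt) pairs."""
    stmts = [drop for drop, _ in deps]
    stmts.append("ALTER TABLE %s ALTER COLUMN %s %s" % (table, column, new_type))
    stmts += [create for _, create in deps]
    return stmts
```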
[jira] [Created] (HIVE-21430) INSERT into a dynamically partitioned table with hive.stats.autogather = false throws a MetaException
Ashutosh Bapat created HIVE-21430: - Summary: INSERT into a dynamically partitioned table with hive.stats.autogather = false throws a MetaException Key: HIVE-21430 URL: https://issues.apache.org/jira/browse/HIVE-21430 Project: Hive Issue Type: Bug Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Attachments: metaexception_repro.patch, org.apache.hadoop.hive.ql.stats.TestStatsUpdaterThread-output.txt When the test TestStatsUpdaterThread#testTxnDynamicPartitions added in the attached patch is run, it throws the following exception (full logs attached): org.apache.hadoop.hive.metastore.api.MetaException: Cannot change stats state for a transactional table default.simple_stats without providing the transactional write state for verification (new write ID 5, valid write IDs null; current state {"BASIC_STATS":"true","COLUMN_STATS":{"s":"true"}}; new state null at org.apache.hadoop.hive.metastore.ObjectStore.alterPartitionNoTxn(ObjectStore.java:4328) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21306) Upgrade HttpComponents to the latest versions similar to what Hadoop has done.
Ashutosh Bapat created HIVE-21306: - Summary: Upgrade HttpComponents to the latest versions similar to what Hadoop has done. Key: HIVE-21306 URL: https://issues.apache.org/jira/browse/HIVE-21306 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Fix For: 4.0.0 The use of HttpClient 4.5.2 breaks the use of SPNEGO over TLS: it mistakenly adds HTTPS instead of HTTP to the principal when over SSL and thus breaks the authentication. This was fixed recently in Hadoop (see HADOOP-16076, which upgraded httpclient from 4.5.2 to 4.5.6 and httpcore from 4.4.4 to 4.4.10) and needs to be done for Hive as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] hive pull request #511: HIVE-21078: Replicate column and table level statist...
Github user ashutosh-bapat closed the pull request at: https://github.com/apache/hive/pull/511 ---
[GitHub] hive pull request #522: HIVE-21079: Stats replication for partitioned table
GitHub user ashutosh-bapat reopened a pull request: https://github.com/apache/hive/pull/522 HIVE-21079: Stats replication for partitioned table The first commit is for stats replication for partitioned table. The other two commits are fixing bugs in existing code, AFAIU. @sankarh can you please review? You can merge this pull request into a Git repository by running: $ git pull https://github.com/ashutosh-bapat/hive hive21079 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/522.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #522 commit a8b729ab1f120cd50c0ad0e096bb0f724a178838 Author: Ashutosh Bapat Date: 2019-01-15T12:06:29Z HIVE-21079: Replicate statistics for partitioned, non-transactional tables. Ashutosh Bapat commit c8aea7b85a06ab53873ce60eb08fcf0514806787 Author: Ashutosh Bapat Date: 2019-01-18T05:31:44Z HIVE-21079: ALTER PARTITION events not applied during incremental replication In AlterPartitionHandler, we set withinContext.replicationSpec.setIsMetadataOnly(true); In ImportSemanticAnalyzer.createReplImportTasks(), per code around line 1197, we do not add new PartitionSpecs and corresponding tasks. This means that we never apply an ALTER_PARTITION event during incremental load. That looks like a serious bug. Either we should check PartitionDescs irrespective of replicationSpec.setIsMetadataOnly() OR we shouldn't set replicationSpec.setIsMetadataOnly() to true while dumping an ALTER_PARTITION event. We set replicationSpec.setIsMetadataOnly(true) for ALTER TABLE events as well, so doing that for ALTER PARTITION event looks fine. Ashutosh Bapat. commit 536492395cd5c280738c2ec1038c39036b477209 Author: Ashutosh Bapat Date: 2019-01-18T06:07:37Z HIVE-21079: Do not dump partition related events during a metadata only dump. During bootstrap metadata-only dump we do not dump partitions (See TableExport.getPartitions(). 
For bootstrap dump we always pass TableSpec with TABLE_ONLY set.). So don't dump partition related events for a metadata-only dump. Ashutosh Bapat. ---
[GitHub] hive pull request #522: HIVE-21079: Stats replication for partitioned table
Github user ashutosh-bapat closed the pull request at: https://github.com/apache/hive/pull/522 ---
[GitHub] hive pull request #522: Hive21079: Stats replication for partitioned table
GitHub user ashutosh-bapat opened a pull request: https://github.com/apache/hive/pull/522 Hive21079: Stats replication for partitioned table The first commit is for stats replication for partitioned table. The other two commits are fixing bugs in existing code, AFAIU. @sankarh can you please review? You can merge this pull request into a Git repository by running: $ git pull https://github.com/ashutosh-bapat/hive hive21079 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/522.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #522 commit a8b729ab1f120cd50c0ad0e096bb0f724a178838 Author: Ashutosh Bapat Date: 2019-01-15T12:06:29Z HIVE-21079: Replicate statistics for partitioned, non-transactional tables. Ashutosh Bapat commit c8aea7b85a06ab53873ce60eb08fcf0514806787 Author: Ashutosh Bapat Date: 2019-01-18T05:31:44Z HIVE-21079: ALTER PARTITION events not applied during incremental replication In AlterPartitionHandler, we set withinContext.replicationSpec.setIsMetadataOnly(true); In ImportSemanticAnalyzer.createReplImportTasks(), per code around line 1197, we do not add new PartitionSpecs and corresponding tasks. This means that we never apply an ALTER_PARTITION event during incremental load. That looks like a serious bug. Either we should check PartitionDescs irrespective of replicationSpec.setIsMetadataOnly() OR we shouldn't set replicationSpec.setIsMetadataOnly() to true while dumping an ALTER_PARTITION event. We set replicationSpec.setIsMetadataOnly(true) for ALTER TABLE events as well, so doing that for ALTER PARTITION event looks fine. Ashutosh Bapat. commit 536492395cd5c280738c2ec1038c39036b477209 Author: Ashutosh Bapat Date: 2019-01-18T06:07:37Z HIVE-21079: Do not dump partition related events during a metadata only dump. During bootstrap metadata-only dump we do not dump partitions (See TableExport.getPartitions(). 
For bootstrap dump we always pass TableSpec with TABLE_ONLY set.). So don't dump partition related events for a metadata-only dump. Ashutosh Bapat. ---
getTable variants in InjectableBehaviourObjectStore
Hi, There are two getTable variants in InjectableBehaviourObjectStore, each calling the corresponding super.getTable() and passing the returned value to getTableModifier.apply(). ObjectStore, the super class here, itself has the same getTable variants. As a result, when the variant of getTable() which calls the other in ObjectStore is called through InjectableBehaviourObjectStore, it may result in a NullPointerException if the injected apply() is not careful about its input value. E.g., let's say the getTable variants are getTable(A) and getTable(A, B) in ObjectStore.java. InjectableBehaviourObjectStore also implements those two variants, each calling the corresponding variant of the super class ObjectStore (using super.getTable()). With the inheritance, the call stack looks like: InjectableBehaviourObjectStore.getTable(A) calls ObjectStore.getTable(A), which calls InjectableBehaviourObjectStore.getTable(A, B), which calls ObjectStore.getTable(A, B). If the getTable() variants in InjectableBehaviourObjectStore return NULL, as most of them do, the apply() method will end up with a NullPointerException if it's not careful about its input. And not many implementations of apply() are careful. I think this should be fixed in InjectableBehaviourObjectStore, by avoiding applying the apply() method (i.e. the injection) twice: track whether apply() has already been applied for the current call. Does that sound good? -- Best Wishes, Ashutosh Bapat
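The proposed guard can be sketched as follows: track whether we are already inside an externally initiated call, and apply the injected modifier only at the outermost level (illustrative Python; the real classes are Java, and the method names are simplified):

```python
class ObjectStore:
    """Stand-in for the Java ObjectStore: the narrow getTable variant
    delegates to the wide one, as described above."""
    def get_table(self, db, name):
        return self.get_table_ext(db, name, meta=None)

    def get_table_ext(self, db, name, meta):
        return {"db": db, "name": name, "meta": meta}

class InjectableObjectStore(ObjectStore):
    """Applies the injected modifier exactly once per external call, even
    though ObjectStore.get_table re-enters the overridden wide variant."""
    def __init__(self, modifier):
        self.modifier = modifier
        self._in_call = False

    def _wrap(self, fn, *args):
        if self._in_call:            # re-entrant call: modifier runs above us
            return fn(*args)
        self._in_call = True
        try:
            return self.modifier(fn(*args))
        finally:
            self._in_call = False

    def get_table(self, db, name):
        return self._wrap(ObjectStore.get_table, self, db, name)

    def get_table_ext(self, db, name, meta):
        return self._wrap(ObjectStore.get_table_ext, self, db, name, meta)

calls = []
def none_modifier(table):
    calls.append(table)
    return None   # a careless modifier, like the NPE-prone ones described

store = InjectableObjectStore(none_modifier)
result = store.get_table("db1", "t1")
```

Without the guard, the modifier would run in both variants and the inner NULL would propagate into the outer super.getTable() call, which is the NullPointerException scenario above.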
[jira] [Created] (HIVE-21110) Stats replication for materialized views
Ashutosh Bapat created HIVE-21110: - Summary: Stats replication for materialized views Key: HIVE-21110 URL: https://issues.apache.org/jira/browse/HIVE-21110 Project: Hive Issue Type: Sub-task Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Check if materialized views have stats associated with them. If so, support replicating those statistics. Most of this should be testing whether the code for table level stats replication works for materialized views as well. But since materialized views are handled as views, they have a slightly different code path than normal tables, e.g. when creating a materialized view. Those paths will need fixes along the lines of normal tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21108) Assign writeId for stats update for a converted transactional table
Ashutosh Bapat created HIVE-21108: - Summary: Assign writeId for stats update for a converted transactional table Key: HIVE-21108 URL: https://issues.apache.org/jira/browse/HIVE-21108 Project: Hive Issue Type: Sub-task Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat When a non-ACID table on the source is converted to an ACID table on the target, the subsequent statistics update (column as well as table level) dumped on the source won't have writeId and snapshot associated with those. When loading those updates on the target we need to associate an appropriate writeId with them. This applies to both a bootstrap and an incremental dump and load. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] hive pull request #511: HIVE-21078: Replicate column and table level statist...
GitHub user ashutosh-bapat opened a pull request: https://github.com/apache/hive/pull/511 HIVE-21078: Replicate column and table level statistics for unpartitioned Hive tables @maheshk114, @sankarh can you please review? You can merge this pull request into a Git repository by running: $ git pull https://github.com/ashutosh-bapat/hive hive21078 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/511.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #511 commit db98502a44f69f255924231b03e2145248c9be0f Author: Ashutosh Bapat Date: 2018-12-19T04:49:29Z HIVE-21078: Replicate column and table level statistics for unpartitioned Hive tables The column statistics are included as part of the Table object during bootstrap dump and loaded when the corresponding table is created on the replica. During incremental dump and load, the UpdateTableColStats event is used to replicate the statistics. In both cases, the statistics are replicated only when the data is replicated. Ashutosh Bapat ---
[jira] [Created] (HIVE-21079) Replicate column statistics for partitions of partitioned Hive table.
Ashutosh Bapat created HIVE-21079: - Summary: Replicate column statistics for partitions of partitioned Hive table. Key: HIVE-21079 URL: https://issues.apache.org/jira/browse/HIVE-21079 Project: Hive Issue Type: Sub-task Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat This task is for replicating statistics for partitions of a partitioned Hive table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21078) Replicate table level column statistics for Hive tables
Ashutosh Bapat created HIVE-21078: - Summary: Replicate table level column statistics for Hive tables Key: HIVE-21078 URL: https://issues.apache.org/jira/browse/HIVE-21078 Project: Hive Issue Type: Sub-task Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat This task is for replicating table level statistics. Partition level statistics will be worked upon in a separate sub-task. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21037) Replicate column statistics for Hive tables
Ashutosh Bapat created HIVE-21037: - Summary: Replicate column statistics for Hive tables Key: HIVE-21037 URL: https://issues.apache.org/jira/browse/HIVE-21037 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Statistics are important for query optimization, and thus keeping them up-to-date on the replica is important from a query performance perspective. The statistics are collected by scanning a table entirely. Thus when the data is replicated, we could either a. update the statistics by scanning the data on the replica, or b. replicate the statistics as well. We prefer the second approach over the first for the following reasons: # Scanning the data on the replica isn't a good option since it wastes CPU cycles and puts load on the replica during replication, which can be significant. # Storages like S3 may not have compute capabilities, so when replicating from on-prem to cloud we cannot rely on the target to gather statistics. # For ACID tables, the statistics should be associated with the snapshot. This means the statistics collection on the target should sync with the write-id on the source, since the target doesn't generate write-ids of its own. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21022) Fix remote metastore tests which use ZooKeeper
Ashutosh Bapat created HIVE-21022: - Summary: Fix remote metastore tests which use ZooKeeper Key: HIVE-21022 URL: https://issues.apache.org/jira/browse/HIVE-21022 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 4.0.0 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Fix For: 4.0.0 Per [~vgarg]'s comment on HIVE-20794 at https://issues.apache.org/jira/browse/HIVE-20794?focusedCommentId=16714093=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16714093, the remote metastore tests using ZooKeeper are flaky. They are failing with the error "Got exception: org.apache.zookeeper.KeeperException$NoNodeException KeeperErrorCode = NoNode for /hs2mszktest". Both of these tests use the same root namespace, hence the reason for this failure could be that the root namespace becomes unavailable to one test when the other drops it. The drop seems to happen automatically through the TestingServer code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] hive pull request #491: HIVE-20953: Fix testcase TestReplicationScenariosAcr...
Github user ashutosh-bapat closed the pull request at: https://github.com/apache/hive/pull/491 ---
[GitHub] hive pull request #487: Hive20794
Github user ashutosh-bapat closed the pull request at: https://github.com/apache/hive/pull/487 ---
[GitHub] hive pull request #445: HIVE-20542
Github user ashutosh-bapat closed the pull request at: https://github.com/apache/hive/pull/445 ---
[GitHub] hive pull request #439: HIVE-20644: Avoid exposing sensitive information thro...
Github user ashutosh-bapat closed the pull request at: https://github.com/apache/hive/pull/439 ---
[GitHub] hive pull request #491: HIVE-20953: Fix testcase TestReplicationScenariosAcr...
GitHub user ashutosh-bapat opened a pull request: https://github.com/apache/hive/pull/491 HIVE-20953: Fix testcase TestReplicationScenariosAcrossInstances#test… Fix testcase TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions to not depend upon the order in which objects get loaded. @anishek or @maheshk114 can you please review the change? You can merge this pull request into a Git repository by running: $ git pull https://github.com/ashutosh-bapat/hive hive20953 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/491.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #491 commit 5c9a72bd5f772b7cefdc86c397a7771c2059043a Author: Ashutosh Bapat Date: 2018-11-21T08:25:38Z HIVE-20953: Fix testcase TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions to not depend upon the order in which objects get loaded Ashutosh Bapat ---
[jira] [Created] (HIVE-20953) Fix testcase TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions to not depend upon the order in which objects get loaded
Ashutosh Bapat created HIVE-20953: - Summary: Fix testcase TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions to not depend upon the order in which objects get loaded Key: HIVE-20953 URL: https://issues.apache.org/jira/browse/HIVE-20953 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 4.0.0 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Fix For: 4.0.0 The testcase is intended to test REPL LOAD with retry. The test creates a partitioned table and a function in the source database and loads those to the replica. The first attempt to load a dump is intended to fail while loading one of the partitions. Depending on the order in which the objects get loaded, if the function is queued after the table, it will not be available in the replica after the load failure. But if it's queued before the table, it will be available in the replica even after the load failure. The test assumes the latter case, which may not always be true. Hence, fix the testcase to order the objects by a fixed ordering. By setting hive.in.repl.test.files.sorted to true, the objects are ordered by directory names. This ordering is available with minimal changes for testing, hence we use it. With this ordering, a function gets loaded before a table, so the test is changed to not expect the function to be available after the failed load, but to be available after the retry. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New committer: Mahesh Behera
Congratulations Mahesh. On Sat, Nov 17, 2018 at 7:30 PM Peter Vary wrote: > Congratulations Mahesh! > > > On Nov 17, 2018, at 04:36, Sankar Hariappan > wrote: > > > > Congrats Mahesh! > > > > Best regards > > Sankar > > > > > > > > > > > > > > > > > > > > On 17/11/18, 6:54 AM, "Ashutosh Chauhan" wrote: > > > >> Apache Hive's Project Management Committee (PMC) has invited Mahesh > >> Behera to become a committer, and we are pleased to announce that he has > >> accepted. > >> Mahesh, welcome, thank you for your contributions, and we look forward > to > >> your further interactions with the community! > >> > >> Thanks, > >> Ashutosh Chauhan (on behalf of the Apache Hive PMC) > > -- -- Best Wishes, Ashutosh Bapat
[GitHub] hive pull request #487: Hive20794
GitHub user ashutosh-bapat opened a pull request: https://github.com/apache/hive/pull/487 Hive20794 Find more details about the changes in HIVE-20794. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ashutosh-bapat/hive hive20794 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/487.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #487 commit 383e8be33934d078bad2e8fe1233cc0f3c6119ed Author: Ashutosh Bapat Date: 2018-10-26T08:22:04Z HIVE-20794: Use ZooKeeper for dynamic service discovery of metastore. The patch also adds new ZooKeeper configurations for the metastore. We reuse THRIFT_URIs to specify the ZooKeeper quorum and add another configuration, THRIFT_SERVICE_DISCOVERY_MODE, to specify which method to use for dynamic service discovery. Ashutosh Bapat commit a38e2e8c9fdc85cd809a1aac9d16ed1d204117bb Author: Ashutosh Bapat Date: 2018-11-13T09:05:03Z HIVE-20794: Refactor existing code for supporting metastore dynamic discovery using ZooKeeper Extract the code in HiveServer2 dealing with ZooKeeper into a ZooKeeperHiveHelper class so that it can be used by the MetaStore server as well. This also moves ZooKeeperHiveHelper.java into a location common to both HiveServer2 and MetaStore code. Ashutosh Bapat ---
[jira] [Created] (HIVE-20794) Use Zookeeper for metastore service discovery
Ashutosh Bapat created HIVE-20794: - Summary: Use Zookeeper for metastore service discovery Key: HIVE-20794 URL: https://issues.apache.org/jira/browse/HIVE-20794 Project: Hive Issue Type: Improvement Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat Right now, multiple metastore services can be specified in hive.metastore.uris configuration, but that list is static and can not be modified dynamically. Use Zookeeper for dynamic service discovery of metastore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] hive pull request #447: HIVE-20708: Load an external table as an external ta...
GitHub user ashutosh-bapat opened a pull request: https://github.com/apache/hive/pull/447 HIVE-20708: Load an external table as an external table on target with the same location as on the source Dump an external table as an external table. When loading an external table, set the location of the target table to be the same as the location on the source, but relative to the file system of the target location. IOW, the scheme and authority of the target location are those of the target file system, but the path relative to the file system is the same as on the source. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ashutosh-bapat/hive hive20708 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/447.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #447 commit c076bbbd2b0fd1b193ac51a1595911a80324b923 Author: Ashutosh Bapat Date: 2018-10-15T05:09:05Z HIVE-20708: Load an external table as an external table on target with the same location as on the source Dump an external table as an external table. When loading an external table, set the location of the target table to be the same as the location on the source, but relative to the file system of the target location. IOW, the scheme and authority of the target location are those of the target file system, but the path relative to the file system is the same as on the source. ---
[GitHub] hive pull request #445: HIVE-20542
GitHub user ashutosh-bapat opened a pull request: https://github.com/apache/hive/pull/445 HIVE-20542 Changes for HIVE-20542. There are three separate commits, with each commit message explaining the purpose of that commit. They should all be pulled together as a single commit into the Apache Hive repository. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ashutosh-bapat/hive hive20542 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/445.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #445 commit e76dd0d51662d5e1bc1890010b95d305f2505ea8 Author: Ashutosh Bapat Date: 2018-10-01T05:17:06Z HIVE-20542: Insert NULL value for columns of NOTIFICATION_LOG for which values are not available When no database is associated with an event, we insert 'null' as the database name in the metastore. With this commit, we insert NULL as the database name. When no table name is associated with an event, we insert an empty string as the table name in the metastore. With this commit, we insert NULL as the table name. Even if a catalog name is associated with an event, addNotificationLog() doesn't insert the catalog in the metastore. With this commit we take care of that as well. Ashutosh Bapat. commit 31aee469baffb95641fb68f70a6bcaa0ca725d28 Author: Ashutosh Bapat Date: 2018-10-01T06:11:15Z HIVE-20542: Modify query used to count the number of events to be replicated incrementally The query used to count the events for a given incremental replication 1. does not count events with a NULL database, table or catalog name, and 2. does not consider toEventId and Limit for the given incremental replication. Ashutosh Bapat. commit d3a319c5fd347018572c13b40e1e8f7cdbe72050 Author: Ashutosh Bapat Date: 2018-10-04T04:08:44Z HIVE-20644: Add tests for testing getNotificationEventsCount(). Ashutosh Bapat. ---
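The first commit's distinction between storing the string 'null' and a real SQL NULL matters for the counting query described in the second commit: an equality predicate never matches SQL NULL, so such rows need an explicit IS NULL check. A minimal illustration (Python with sqlite3; the table and column names are simplified stand-ins for the real NOTIFICATION_LOG schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notification_log (event_id INTEGER, db_name TEXT)")

# Old behaviour: the literal string 'null' stored as the database name.
conn.execute("INSERT INTO notification_log VALUES (1, 'null')")
# Fixed behaviour: a real SQL NULL stored when no database is associated.
conn.execute("INSERT INTO notification_log VALUES (2, NULL)")

# Equality matches the string row but can never match a real SQL NULL ...
eq_count = conn.execute(
    "SELECT COUNT(*) FROM notification_log WHERE db_name = 'null'"
).fetchone()[0]

# ... so a count that must include events without a database needs IS NULL.
null_count = conn.execute(
    "SELECT COUNT(*) FROM notification_log WHERE db_name IS NULL"
).fetchone()[0]
```

Here eq_count picks up only the row holding the literal string 'null', while null_count is what actually finds the genuinely database-less event — which is why the counting query had to change along with the insert.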
[jira] [Created] (HIVE-20708) Load (dumped) an external table as an external table on target with the same location as on the source
Ashutosh Bapat created HIVE-20708: - Summary: Load (dumped) an external table as an external table on target with the same location as on the source Key: HIVE-20708 URL: https://issues.apache.org/jira/browse/HIVE-20708 Project: Hive Issue Type: Improvement Components: repl Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat External tables are currently mapped to managed tables on the target. A lot of jobs in user environments depend upon the locations specified in external table definitions to run; hence, the path for external tables on the target is expected to be the same as on the source. An external table being loaded as a managed table makes failover (Controlled Failover) / failback difficult, since there is no option of moving data from a managed to an external table. So an external table replicated to the target cluster needs to be kept as an external table with the same location as on the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Error running checkstyle:checkstyle goal
Hi, I am trying to run "mvn checkstyle:checkstyle" to catch checkstyle errors before submitting a patch. But while running that command I get an error: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:checkstyle (default-cli) on project hive-standalone-metastore-common: An error has occurred in Checkstyle report generation. Failed during checkstyle execution: Unable to find configuration file at location: /work/bug110208/cr/standalone-metastore/metastore-common/checkstyle//checkstyle.xml: Could not find resource '/work/bug110208/cr/standalone-metastore/metastore-common/checkstyle//checkstyle.xml'. Looks like we are missing the checkstyle.xml file at the given location and need to fix that. I also wonder how Jenkins is able to run checkstyle. -- Best Wishes, Ashutosh Bapat
Confluence edit permission
Hi Lefty, I would like to get permissions to edit pages in Confluence. I am working with the Hive team at Hortonworks. I couldn't find a way to request edit permissions through Confluence, hence this mail. -- Best Wishes, Ashutosh Bapat
[GitHub] hive pull request #439: HIVE-20644: Avoid exposing sensitive information thro...
GitHub user ashutosh-bapat opened a pull request: https://github.com/apache/hive/pull/439 HIVE-20644: Avoid exposing sensitive information through a Hive Runtime exception (Ashutosh Bapat) You can merge this pull request into a Git repository by running: $ git pull https://github.com/ashutosh-bapat/hive hive20644 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/439.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #439 commit f5c4b22ebbfc4893b2aa436d9ea9b4241f04340b Author: Ashutosh Bapat Date: 2018-09-27T05:51:52Z HIVE-20644: Avoid exposing sensitive information through a Hive Runtime exception (Ashutosh Bapat) ---
[jira] [Created] (HIVE-20644) Avoid exposing sensitive information through an error message
Ashutosh Bapat created HIVE-20644: - Summary: Avoid exposing sensitive information through an error message Key: HIVE-20644 URL: https://issues.apache.org/jira/browse/HIVE-20644 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Ashutosh Bapat Assignee: Ashutosh Bapat The HiveException raised from the following methods exposes the data row that caused the runtime exception. # ReduceRecordSource::GroupIterator::next() - around line 372 # MapOperator::process() - around line 567 # ExecReducer::reduce() - around line 243 In all these cases, a string representation of the row is constructed on the fly and included in the error message. VectorMapOperator::process() - around line 973 raises the same exception, but it does not expose the row since the row contents are not included in the error message. While trying to reproduce the above error, I also found that the arguments to a UDF get exposed in log messages from FunctionRegistry::invoke() around line 1114. This too can cause sensitive information to be leaked through an error message. This way, sensitive information is leaked to a user through an exception message even though that information may not be available to the user otherwise. Hence it's a kind of security breach or violation of access control. The contents of the row or the arguments to a function may be useful for debugging, and hence are worth adding to the logs. The proposal here is therefore to log a separate message, with log level DEBUG or INFO, containing the string representation of the row. Users can configure their logging so that DEBUG/INFO messages do not go to the client but are still available in the Hive server logs for debugging. The actual exception message will not contain any sensitive data like row data or argument data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
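The proposed logging split can be sketched as follows. This is an illustrative Python snippet, not Hive's actual Java code (Hive's operators use Log4j and HiveException): the row contents go only to a DEBUG-level log line, while the exception that propagates to the client carries a generic message with no row data.

```python
import logging

logger = logging.getLogger("row.processor")

def process_row(row):
    """Process one row; on failure, keep row contents out of the exception."""
    try:
        return 100 / row["value"]  # stand-in for the real per-row computation
    except Exception as exc:
        # Row contents go only to the DEBUG log, which admins can route to
        # server-side log files that the querying user cannot read.
        logger.debug("Error while processing row: %r", row)
        # The client-facing exception deliberately carries no row data.
        raise RuntimeError("Runtime error while processing a row") from exc
```

The original cause is preserved via exception chaining (`from exc`), so server-side stack traces remain debuggable even though the client-visible message stays generic.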