[jira] [Created] (HIVE-22663) Quote all table and column names or do not quote any

2019-12-18 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22663:
-

 Summary: Quote all table and column names or do not quote any
 Key: HIVE-22663
 URL: https://issues.apache.org/jira/browse/HIVE-22663
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Standalone Metastore
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat


The change in HIVE-22546 is causing the following stack trace when I run Hive with 
PostgreSQL as the backend db for the metastore.

0: jdbc:hive2://localhost:1> create database dumpdb with ('repl.source.for'='1,2,3');
Error: Error while compiling statement: FAILED: ParseException line 1:28 missing KW_DBPROPERTIES at '(' near '' (state=42000,code=4)
0: jdbc:hive2://localhost:1> create database dumpdb with dbproperties ('repl.source.for'='1,2,3');
ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Error communicating with the metastore)
org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the metastore
 at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:541)
 at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:687)
 at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:653)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:969)

... stack trace clipped

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:Unable to update transaction database org.postgresql.util.PSQLException: ERROR: relation "materialization_rebuild_locks" does not exist  Position: 13
 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
 at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
 at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
 at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
 at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)

This happens because the queries in TxnHandler.java (including the one at line 
1312, which produces this stack trace) do not quote table names. All table and 
column names used there should be quoted; the change in HIVE-22546 alone won't 
suffice.
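The reason quoting matters here: PostgreSQL folds unquoted identifiers to lower case, so a query against a table created with a quoted, case-sensitive name fails to resolve. A minimal sketch of an identifier-quoting helper (hypothetical; not Hive's actual TxnHandler code, and the column name is invented for illustration):

```python
def quote_identifier(name: str, quote_char: str = '"') -> str:
    """Quote a table or column name, doubling any embedded quote characters."""
    return quote_char + name.replace(quote_char, quote_char * 2) + quote_char

def build_select(table: str, columns: list[str]) -> str:
    """Build a SELECT with every identifier quoted, so the name's case is
    preserved regardless of the backend database's identifier-folding rules."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    return f"SELECT {cols} FROM {quote_identifier(table)}"

print(build_select("MATERIALIZATION_REBUILD_LOCKS", ["MRL_TXN_ID"]))
# SELECT "MRL_TXN_ID" FROM "MATERIALIZATION_REBUILD_LOCKS"
```

Quoting every identifier uniformly (rather than only some) is what the issue title asks for: mixing quoted and unquoted references to the same table is exactly what breaks on case-folding databases.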



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22559) Maintain ownership of parent directories of an external table directory after replication

2019-11-28 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22559:
-

 Summary: Maintain ownership of parent directories of an external 
table directory after replication
 Key: HIVE-22559
 URL: https://issues.apache.org/jira/browse/HIVE-22559
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Bapat
Assignee: Anishek Agarwal


For replicating an external table we specify a base directory on the target 
(say /base_ext). The path of an external table directory on the source (say 
/xyz/abc/ext_t1) is prefixed with that base directory when the external table 
data is replicated, so the path of the external table on the target becomes 
/base_ext/xyz/abc/ext_t1. In this path only the ownership of the ext_t1 
directory is preserved; the ownership of the xyz and abc directories is set to 
the user executing REPL LOAD. Instead we should preserve the ownership of xyz 
and abc as well.
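The set of target directories whose ownership needs preserving can be computed from the source path alone. A rough sketch of just that path arithmetic (paths taken from the example above; actually reading and applying the HDFS owner/group of each source directory is not shown):

```python
from pathlib import PurePosixPath

def target_dirs(base: str, source_path: str) -> list[str]:
    """Return every directory on the target, outermost first, that mirrors a
    component of the source path and should inherit that component's ownership."""
    rel = PurePosixPath(source_path).relative_to("/")
    out, cur = [], PurePosixPath(base)
    for part in rel.parts:
        cur = cur / part
        out.append(str(cur))
    return out

print(target_dirs("/base_ext", "/xyz/abc/ext_t1"))
# ['/base_ext/xyz', '/base_ext/xyz/abc', '/base_ext/xyz/abc/ext_t1']
```

REPL LOAD would walk this list outermost-first and set each directory's owner/group to match the corresponding source directory, instead of leaving the intermediate ones owned by the loading user.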





[jira] [Created] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-19 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22512:
-

 Summary: Use direct SQL to fetch column privileges in 
refreshPrivileges
 Key: HIVE-22512
 URL: https://issues.apache.org/jira/browse/HIVE-22512
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


refreshPrivileges() calls listTableAllColumnGrants() to fetch the column level 
privileges. The latter function retrieves the individual column objects by 
firing one query per column privilege object, swamping the backend db with 
these queries when PrivilegeSynchronizer is run. PrivilegeSynchronizer 
synchronizes privileges of all databases, tables and columns, so the backend db 
can get swamped badly when there are thousands of tables with hundreds of 
columns.

Moreover, the output of listTableAllColumnGrants() is not used in full, so much 
of the work the persistence manager does to retrieve those column objects is 
wasted.

Fix this by using direct SQL to fetch column privileges.
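The N+1 query pattern described above can be illustrated with any SQL backend; a sketch using sqlite3 (the table and column names are invented, not the real metastore schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE col_privs (tbl TEXT, col TEXT, priv TEXT)")
conn.executemany("INSERT INTO col_privs VALUES (?, ?, ?)",
                 [("t1", f"c{i}", "SELECT") for i in range(100)])

# ORM-style retrieval: one query per column privilege -> 100 round trips.
slow = [conn.execute("SELECT priv FROM col_privs WHERE tbl=? AND col=?",
                     ("t1", f"c{i}")).fetchone()[0] for i in range(100)]

# Direct SQL: a single query fetches everything in one round trip.
fast = [row[2] for row in conn.execute(
    "SELECT tbl, col, priv FROM col_privs WHERE tbl=?", ("t1",))]

assert slow == fast  # same data, two orders of magnitude fewer queries
```

With thousands of tables and hundreds of columns each, collapsing the per-object queries into one direct SQL statement is what keeps PrivilegeSynchronizer from swamping the backend db.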





[jira] [Created] (HIVE-22313) Some of the HMS auth LDAP hive config names do not start with "hive."

2019-10-09 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22313:
-

 Summary: Some of the HMS auth LDAP hive config names do not start 
with "hive."
 Key: HIVE-22313
 URL: https://issues.apache.org/jira/browse/HIVE-22313
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat








[jira] [Created] (HIVE-22300) Deduplicate the authentication and LDAP code in HMS and HS2

2019-10-07 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22300:
-

 Summary: Deduplicate the authentication and LDAP code in HMS and 
HS2
 Key: HIVE-22300
 URL: https://issues.apache.org/jira/browse/HIVE-22300
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, Standalone Metastore
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


HIVE-22267 has duplicated code from hive-service/auth directory under 
standalone-metastore directory. Deduplicate this code.





[jira] [Created] (HIVE-22267) Support password based authentication in HMS

2019-09-28 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22267:
-

 Summary: Support password based authentication in HMS
 Key: HIVE-22267
 URL: https://issues.apache.org/jira/browse/HIVE-22267
 Project: Hive
  Issue Type: New Feature
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Similar to HS2, support password based authentication in HMS.

Right now we provide LDAP and CONFIG based options. The latter allows setting a 
user and password in the config and is used only for testing.





[jira] [Created] (HIVE-22110) Initialize ReplChangeManager before starting actual dump

2019-08-14 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-22110:
-

 Summary: Initialize ReplChangeManager before starting actual dump
 Key: HIVE-22110
 URL: https://issues.apache.org/jira/browse/HIVE-22110
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


REPL DUMP calls ReplChangeManager.encodeFileUri() to add the cmroot and 
checksum to the file URI. This requires ReplChangeManager to be initialized, 
so initialize it before starting the actual dump.
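The dependency is easy to see if the encoding is sketched out. The format below is purely illustrative (the separator and field order of the real ReplChangeManager.encodeFileUri may differ), but it shows why encoding cannot happen before the cmroot is known:

```python
cm_root = None  # populated when ReplChangeManager is initialized

def encode_file_uri(path: str, checksum: str) -> str:
    """Append checksum and cmroot to a file URI (illustrative format only)."""
    if cm_root is None:
        raise RuntimeError("ReplChangeManager not initialized: cmroot is unset")
    return f"{path}#{checksum}#{cm_root}"

# A dump must initialize the change manager first, then encode file URIs:
cm_root = "hdfs://ns/cmroot"
print(encode_file_uri("hdfs://ns/warehouse/t1/f1", "abc123"))
# hdfs://ns/warehouse/t1/f1#abc123#hdfs://ns/cmroot
```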



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22068) Add more logging to notification cleaner and replication to track events

2019-08-01 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-22068:
-

 Summary: Add more logging to notification cleaner and replication 
to track events
 Key: HIVE-22068
 URL: https://issues.apache.org/jira/browse/HIVE-22068
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


* Add more logging to the DB notification listener cleaner thread:
 ** the time when cleaning was considered, the interval and the time before which 
events were cleared, and the min and max event ids at that time
 ** how many events were cleared
 ** the min and max event ids after the cleaning.
 * In REPL::START, document the starting event, the end event if specified, and 
the maximum number of events, if specified.





[jira] [Created] (HIVE-22036) HMS should identify events corresponding to replicated database for Atlas HMS hook

2019-07-24 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-22036:
-

 Summary: HMS should identify events corresponding to replicated 
database for Atlas HMS hook
 Key: HIVE-22036
 URL: https://issues.apache.org/jira/browse/HIVE-22036
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


An HMS Atlas hook allows Atlas to create/update/delete its metadata based on 
the corresponding events in HMS. But Atlas replication happens outside of, and 
before, Hive replication. Thus any events generated during Hive replication may 
change Atlas data that has already been replicated, interfering with Atlas 
replication. Hence, provide an HMS interface which the hook can use to identify 
the events caused by the Hive replication flow.
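One shape such an interface could take is a predicate the Atlas hook consults before acting on an event. Everything below is hypothetical: the event representation and the field name are invented for illustration, not the actual HMS event schema:

```python
def is_replication_event(event: dict) -> bool:
    """True when the event was generated by Hive's replication flow
    (the 'isReplicated' field name is invented for illustration)."""
    return bool(event.get("isReplicated", False))

events = [
    {"id": 1, "type": "CREATE_TABLE", "isReplicated": False},
    {"id": 2, "type": "CREATE_TABLE", "isReplicated": True},
]
# The Atlas hook would act only on events NOT produced by replication,
# so already-replicated Atlas metadata is left untouched.
to_process = [e for e in events if not is_replication_event(e)]
print([e["id"] for e in to_process])  # [1]
```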





[jira] [Created] (HIVE-21960) HMS tasks on replica

2019-07-05 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21960:
-

 Summary: HMS tasks on replica
 Key: HIVE-21960
 URL: https://issues.apache.org/jira/browse/HIVE-21960
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


An HMS performs a number of housekeeping tasks. Assess whether
 # they need to be performed on the replicated data, and
 # performing them on replicated data causes any issues, and how to fix those.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-06 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21841:
-

 Summary: Leader election in HMS to run housekeeping tasks.
 Key: HIVE-21841
 URL: https://issues.apache.org/jira/browse/HIVE-21841
 Project: Hive
  Issue Type: New Feature
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


HMS performs housekeeping tasks. When there are multiple HMS instances we need 
a leader HMS elected to carry out those housekeeping tasks. 
These tasks include execution of compaction tasks, auto-discovery of partitions 
for external tables, generation of compaction tasks, the repl thread, etc.

Note that, though the code for compaction tasks, auto-discovery of partitions 
etc. is in Hive, the actual tasks are initiated by an HMS configured to do so. 
So, leader election is required only for HMS and not for HS2.
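A deterministic rule such as "the lowest instance id among live HMS instances is the leader" is one minimal election scheme. This is a sketch only: a real deployment would coordinate through something like ZooKeeper ephemeral nodes rather than a shared membership list, and the instance ids are invented:

```python
from typing import Optional

def elect_leader(live_instances: list[str]) -> Optional[str]:
    """Pick the lexicographically smallest live instance id as leader.
    Every instance applies the same rule to the same membership view,
    so they all agree without exchanging messages."""
    return min(live_instances) if live_instances else None

def should_run_housekeeping(my_id: str, live_instances: list[str]) -> bool:
    """An instance runs compaction, partition discovery, etc. only if leader."""
    return elect_leader(live_instances) == my_id

live = ["hms-2", "hms-0", "hms-1"]
print(elect_leader(live))                      # hms-0
print(should_run_housekeeping("hms-1", live))  # False
```

If the current leader disappears from the membership view, the next-smallest id automatically becomes leader on the next evaluation, so housekeeping resumes without manual failover.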





[jira] [Created] (HIVE-21801) Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary transport

2019-05-29 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21801:
-

 Summary: Tests using miniHS2 with HTTP as transport are creating 
miniHS2 with binary transport
 Key: HIVE-21801
 URL: https://issues.apache.org/jira/browse/HIVE-21801
 Project: Hive
  Issue Type: Bug
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Even though tests using miniHS2 set the config hive.server2.transport.mode to 
http, miniHS2 is created with binary transport.





[jira] [Created] (HIVE-21783) Avoid authentication for connection from the same domain

2019-05-23 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21783:
-

 Summary: Avoid authentication for connection from the same domain
 Key: HIVE-21783
 URL: https://issues.apache.org/jira/browse/HIVE-21783
 Project: Hive
  Issue Type: New Feature
  Components: HiveServer2
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


When a connection comes from the same domain, do not authenticate the user. 
This is similar to NONE authentication, but only for connections from the same 
domain.
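A same-domain check could be as simple as comparing the client host's DNS suffix against the server's. This sketch assumes "same domain" means an identical suffix after the first label; a real implementation would also have to consider reverse-DNS trust and spoofing, which are ignored here:

```python
def same_domain(client_host: str, server_host: str) -> bool:
    """True when both FQDNs share the same domain (everything after the
    first label). Hosts without a domain part never match."""
    _, _, client_dom = client_host.partition(".")
    _, _, server_dom = server_host.partition(".")
    return bool(client_dom) and client_dom == server_dom

print(same_domain("node1.example.com", "hs2.example.com"))  # True
print(same_domain("node1.other.org", "hs2.example.com"))    # False
```

HS2 would run this check on the resolved client address before deciding whether to skip the authentication handshake, falling back to normal authentication for any host outside the domain.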





[jira] [Created] (HIVE-21776) Add test for incremental replication of a UDF with jar on HDFS

2019-05-22 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21776:
-

 Summary: Add test for incremental replication of a UDF with jar on 
HDFS
 Key: HIVE-21776
 URL: https://issues.apache.org/jira/browse/HIVE-21776
 Project: Hive
  Issue Type: Test
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


TestReplicationScenariosAcrossInstances has a test for bootstrap replication of 
a UDF with its jar on HDFS, but none for incremental replication. Add one.





[jira] [Created] (HIVE-21679) Replicating a CTAS event creating an MM partitioned table fails

2019-05-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21679:
-

 Summary: Replicating a CTAS event creating an MM partitioned table 
fails
 Key: HIVE-21679
 URL: https://issues.apache.org/jira/browse/HIVE-21679
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat


use dumpdb;
create table t1 (a int, b int);
insert into t1 values (1, 2), (3, 4);
create table t6_mm_part partitioned by (a) stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
create table t6_mm stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
repl dump dumpdb;
create table t6_mm_part_2 partitioned by (a) stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
create table t6_mm_2 partitioned by (a) stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
repl dump dumpdb from 
repl load loaddb from '/tmp/dump/next';
ERROR : failed replication
org.apache.hadoop.hive.ql.parse.SemanticException: Invalid table name loaddb.dumpdb.t6_mm_part_2
 at org.apache.hadoop.hive.ql.exec.Utilities.getDbTableName(Utilities.java:2253) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Utilities.getDbTableName(Utilities.java:2239) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.plan.AlterTableDesc.setOldName(AlterTableDesc.java:419) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.tableUpdateReplStateTask(IncrementalLoadTasksBuilder.java:286) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.addUpdateReplStateTasks(IncrementalLoadTasksBuilder.java:371) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.analyzeEventLoad(IncrementalLoadTasksBuilder.java:244) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(IncrementalLoadTasksBuilder.java:139) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.repl.ReplLoadTask.executeIncrementalLoad(ReplLoadTask.java:488) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.repl.ReplLoadTask.execute(ReplLoadTask.java:102) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:233) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:88) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:332) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191]
 at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_191]
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) ~[hadoop-common-3.1.0.3.0.0.0-1634.jar:?]
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:350) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_191]
 at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_191]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_191]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_191]
 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
ERROR : FAILED: Execution Error, return 

[jira] [Created] (HIVE-21678) CTAS creating a partitioned table fails because of no writeId

2019-05-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21678:
-

 Summary: CTAS creating a partitioned table fails because of no 
writeId
 Key: HIVE-21678
 URL: https://issues.apache.org/jira/browse/HIVE-21678
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat


create table t1(a int, b int);
insert into t1 values (1, 2), (3, 4);
create table t6_part partitioned by (a) stored as orc tblproperties 
("transactional"="true") as select * from t1;
ERROR : FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in the 
config by open txn task for migration
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in 
the config by open txn task for migration (state=08S01,code=1)





[jira] [Created] (HIVE-21677) Using strict managed tables for ACID table testing (Replication tests)

2019-05-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21677:
-

 Summary: Using strict managed tables for ACID table testing 
(Replication tests)
 Key: HIVE-21677
 URL: https://issues.apache.org/jira/browse/HIVE-21677
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat


The replication tests which exclusively test ACID table replication add 
transactional properties to the create table/alter table statements when 
creating tables. Instead they should set hive.strict.managed.tables = true 
in those tests. Tests derived from BaseReplicationScenariosAcidTables, and 
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosIncrementalLoadAcidTables,
 are examples of those. Change all such tests to use hive.strict.managed.tables 
= true. Some of these tests create non-ACID tables for testing, which will then 
require 'transactional'='false' to be set explicitly when creating those tables.

With this change we might see some test failures (see subtasks). Please create 
subtasks for those so that they can be tracked within this JIRA.





[jira] [Created] (HIVE-21598) CTAS on ACID table during incremental does not replicate data

2019-04-10 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21598:
-

 Summary: CTAS on ACID table during incremental does not replicate 
data
 Key: HIVE-21598
 URL: https://issues.apache.org/jira/browse/HIVE-21598
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, repl
Reporter: Ashutosh Bapat


Scenario

create database dumpdb with dbproperties('repl.source.for'='1,2,3');

use dumpdb;

create table t1 (id int) clustered by(id) into 3 buckets stored as orc 
tblproperties ("transactional"="true");

insert into t1 values(1);

insert into t1 values(2);

repl dump dumpdb;

repl load loaddb from ;

use loaddb;

select * from t1;

+--------+
| t6.id  |
+--------+
| 1      |
| 2      |
+--------+

use dumpdb;

create table t6 stored as orc tblproperties ("transactional"="true") as select 
* from t1;

select * from t6;

+--------+
| t6.id  |
+--------+
| 1      |
| 2      |
+--------+

repl dump dumpdb from 

repl load loaddb from ;

use loaddb;

select * from t6;

+--------+
| t6.id  |
+--------+
+--------+

t6 gets created but there's no data.

 

On further investigation, I see that the CommitTxnEvent's dump directory 
contains a _files file, but it is empty. It looks like we do not log the names 
of the files created as part of CTAS.





[jira] [Created] (HIVE-21476) Wrap metastore backing db upgrade scripts into transaction

2019-03-19 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21476:
-

 Summary: Wrap metastore backing db upgrade scripts into transaction
 Key: HIVE-21476
 URL: https://issues.apache.org/jira/browse/HIVE-21476
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Bapat


The metastore backing db upgrade scripts (the upgrade* scripts in the 
standalone-metastore/metastore-server/src/main/sql/* directories) do not use 
transactions. So if a command in one of those scripts fails, the metastore db 
is left in an inconsistent state. Instead we should wrap each of those scripts 
in a transaction so that all or none of the commands take effect. Some RDBMSes 
(Derby, I think) do not support DDL in a transaction, so we should make this 
change only for the databases which do.
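The all-or-nothing behaviour can be sketched with sqlite3, which, like PostgreSQL, supports DDL inside a transaction. The script contents are invented, not real metastore upgrade steps:

```python
import sqlite3

def run_upgrade(conn: sqlite3.Connection, statements: list[str]) -> None:
    """Execute an upgrade script inside a single transaction: either every
    statement takes effect, or none do."""
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        raise

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
try:
    run_upgrade(conn, [
        "CREATE TABLE new_meta (k TEXT)",   # succeeds...
        "CREATE TABLE new_meta (k TEXT)",   # ...then fails: duplicate table
    ])
except sqlite3.Error:
    pass

tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE name='new_meta'").fetchall()
print(tables)  # [] -- the first CREATE was rolled back along with the failure
```

On databases that auto-commit DDL (so a transaction wrapper is ineffective), the scripts would have to run as-is, which is why the issue proposes wrapping only where transactional DDL is supported.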





[jira] [Created] (HIVE-21462) Upgrading SQL server backed metastore when changing data type of a column with constraints

2019-03-17 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21462:
-

 Summary: Upgrading SQL server backed metastore when changing data 
type of a column with constraints
 Key: HIVE-21462
 URL: https://issues.apache.org/jira/browse/HIVE-21462
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


SQL server does not allow changing data type of a column which has a constraint 
or an index on it. The constraint or the index needs to be dropped before 
changing the data type and needs to be recreated after that. Metastore upgrade 
scripts aren't doing this and thus upgrade fails.





[jira] [Created] (HIVE-21430) INSERT into a dynamically partitioned table with hive.stats.autogather = false throws a MetaException

2019-03-12 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21430:
-

 Summary: INSERT into a dynamically partitioned table with 
hive.stats.autogather = false throws a MetaException
 Key: HIVE-21430
 URL: https://issues.apache.org/jira/browse/HIVE-21430
 Project: Hive
  Issue Type: Bug
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Attachments: metaexception_repro.patch, 
org.apache.hadoop.hive.ql.stats.TestStatsUpdaterThread-output.txt

When the test TestStatsUpdaterThread#testTxnDynamicPartitions added in the 
attached patch is run, it throws the following exception (full logs attached):

org.apache.hadoop.hive.metastore.api.MetaException: Cannot change stats state 
for a transactional table default.simple_stats without providing the 
transactional write state for verification (new write ID 5, valid write IDs 
null; current state {"BASIC_STATS":"true","COLUMN_STATS":{"s":"true"}}; new 
state null
 at org.apache.hadoop.hive.metastore.ObjectStore.alterPartitionNoTxn(ObjectStore.java:4328)





[jira] [Created] (HIVE-21306) Upgrade HttpComponents to the latest versions similar to what Hadoop has done.

2019-02-21 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21306:
-

 Summary: Upgrade HttpComponents to the latest versions similar to 
what Hadoop has done.
 Key: HIVE-21306
 URL: https://issues.apache.org/jira/browse/HIVE-21306
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


The use of HttpClient 4.5.2 breaks the use of SPNEGO over TLS: it mistakenly 
added HTTPS instead of HTTP to the principal when over SSL, and thus breaks the 
authentication.

This was upgraded recently in Hadoop and needs to be done for Hive as well.

See: HADOOP-16076, where the versions were upgraded from 4.5.2 and 4.4.4 to 
4.5.6 and 4.4.10:

- httpclient: 4.5.2 -> 4.5.6
- httpcore: 4.4.4 -> 4.4.10





[jira] [Created] (HIVE-21110) Stats replication for materialized views

2019-01-08 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21110:
-

 Summary: Stats replication for materialized views
 Key: HIVE-21110
 URL: https://issues.apache.org/jira/browse/HIVE-21110
 Project: Hive
  Issue Type: Sub-task
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Check if materialized views have stats associated with them. If so, support 
replicating those statistics. Most of this should be testing whether the code 
for table level stats replication works for materialized views as well. But 
since materialized views are handled as views, they have a slightly different 
code path from normal tables, e.g. when creating a materialized view. Those 
paths will need fixes along the lines of those for normal tables.





[jira] [Created] (HIVE-21108) Assign writeId for stats update for a converted transactional table

2019-01-08 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21108:
-

 Summary: Assign writeId for stats update for a converted 
transactional table
 Key: HIVE-21108
 URL: https://issues.apache.org/jira/browse/HIVE-21108
 Project: Hive
  Issue Type: Sub-task
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


When a non-ACID table on the source is converted to an ACID table on the 
target, the subsequent statistics update (column as well as table level) dumped 
on the source won't have writeId and snapshot associated with those. When 
loading those updates on the target we need to associate an appropriate writeId 
with them. This applies to both a bootstrap and an incremental dump and load.





[jira] [Created] (HIVE-21079) Replicate column statistics for partitions of partitioned Hive table.

2019-01-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21079:
-

 Summary: Replicate column statistics for partitions of partitioned 
Hive table.
 Key: HIVE-21079
 URL: https://issues.apache.org/jira/browse/HIVE-21079
 Project: Hive
  Issue Type: Sub-task
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


This task is for replicating statistics for partitions of a partitioned Hive 
table.





[jira] [Created] (HIVE-21078) Replicate table level column statistics for Hive tables

2019-01-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21078:
-

 Summary: Replicate table level column statistics for Hive tables
 Key: HIVE-21078
 URL: https://issues.apache.org/jira/browse/HIVE-21078
 Project: Hive
  Issue Type: Sub-task
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


This task is for replicating table level statistics. Partition level statistics 
will be worked upon in a separate sub-task.





[jira] [Created] (HIVE-21037) Replicate column statistics for Hive tables

2018-12-13 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21037:
-

 Summary: Replicate column statistics for Hive tables
 Key: HIVE-21037
 URL: https://issues.apache.org/jira/browse/HIVE-21037
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Statistics are important for query optimization, so keeping them up-to-date on 
the replica matters for query performance. Statistics are collected by scanning 
a table entirely. Thus when the data is replicated we could either (a) update 
the statistics by scanning the data on the replica, or (b) replicate the 
statistics themselves. We prefer the second approach for the following reasons:
 # Scanning the data on the replica isn't a good option since it wastes CPU 
cycles and puts load on the replica during replication, which can be significant.
 # Storages like S3 may not have compute capabilities, so when replicating from 
on-prem to cloud we cannot rely on the target to gather statistics.
 # For ACID tables, the statistics should be associated with the snapshot. This 
means statistics collection on the target would have to sync with the write id 
on the source, since the target doesn't generate write ids of its own.





[jira] [Created] (HIVE-21022) Fix remote metastore tests which use ZooKeeper

2018-12-09 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21022:
-

 Summary: Fix remote metastore tests which use ZooKeeper
 Key: HIVE-21022
 URL: https://issues.apache.org/jira/browse/HIVE-21022
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


Per [~vgarg]'s comment on HIVE-20794 at 
https://issues.apache.org/jira/browse/HIVE-20794?focusedCommentId=16714093&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16714093,
 the remote metastore tests using ZooKeeper are flaky. They are failing with 
the error "Got exception: org.apache.zookeeper.KeeperException$NoNodeException 
KeeperErrorCode = NoNode for /hs2mszktest".

Both of these tests use the same root namespace, so the reason for this failure 
could be that the root namespace becomes unavailable to one test when the other 
drops it. The drop seems to happen automatically through the TestingServer code.





[jira] [Created] (HIVE-20953) Fix testcase TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions to not depend upon the order in which objects get loaded

2018-11-20 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-20953:
-

 Summary: Fix testcase 
TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions
 to not depend upon the order in which objects get loaded
 Key: HIVE-20953
 URL: https://issues.apache.org/jira/browse/HIVE-20953
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


The testcase is intended to test REPL LOAD with retry. The test creates a 
partitioned table and a function in the source database and loads those to the 
replica. The first attempt to load a dump is intended to fail while loading one 
of the partitions. Based on the order in which the objects get loaded, if the 
function is queued after the table, it will not be available in replica after 
the load failure. But if it's queued before the table, it will be available in 
replica even after the load failure. The test assumes the latter case, which 
does not always hold.
 
Hence, fix the testcase to load the objects in a fixed order. Setting 
hive.in.repl.test.files.sorted to true orders the objects by their directory 
names. This ordering is available with minimal changes and is used only for 
testing. With this ordering, a function gets loaded before a table, so the 
test is changed to not expect the function to be available after the failed 
load, but to expect it to be available after the retry.
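The effect of the sorted ordering can be sketched as follows (the directory names are hypothetical; the real dump layout may differ):

```python
# Hypothetical dump directory entries: one function dump dir, two table dirs.
dump_entries = ["tbl1/part=1", "_functions/fn1", "tbl1/part=2"]

# With hive.in.repl.test.files.sorted=true the loader walks entries in
# lexicographic directory order, so the order is deterministic across runs.
load_order = sorted(dump_entries)

# '_' (0x5F) sorts before lowercase letters, so the function loads first,
# and a failure while loading a partition happens after the function load.
assert load_order[0] == "_functions/fn1"
```

With a deterministic order the test can assert a single, fixed state after the failed load instead of depending on which object happened to be queued first.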





[jira] [Created] (HIVE-20794) Use Zookeeper for metastore service discovery

2018-10-24 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-20794:
-

 Summary: Use Zookeeper for metastore service discovery
 Key: HIVE-20794
 URL: https://issues.apache.org/jira/browse/HIVE-20794
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Right now, multiple metastore services can be specified in the 
hive.metastore.uris configuration, but that list is static and cannot be 
modified dynamically. Use ZooKeeper for dynamic service discovery of 
metastore instances.
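The difference from a static URI list can be sketched with a toy registry (class and URIs are hypothetical; the real implementation registers ephemeral znodes in ZooKeeper, which vanish when an instance dies):

```python
class MetastoreRegistry:
    """Toy stand-in for a ZooKeeper namespace holding live metastore URIs.

    A real implementation creates an ephemeral znode per metastore
    instance; the node disappears automatically when the instance's
    session ends, so clients always see the current set of instances.
    """
    def __init__(self):
        self._live = set()

    def register(self, uri):      # called at metastore startup
        self._live.add(uri)

    def deregister(self, uri):    # shutdown / session expiry
        self._live.discard(uri)

    def discover(self):           # client-side lookup at connect time
        return sorted(self._live)

reg = MetastoreRegistry()
reg.register("thrift://ms1:9083")
reg.register("thrift://ms2:9083")
reg.deregister("thrift://ms1:9083")   # ms1 goes down; no config edit needed
assert reg.discover() == ["thrift://ms2:9083"]
```

With a static hive.metastore.uris list, the same event would require editing configuration and restarting clients; with discovery, clients simply re-read the live set.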





[jira] [Created] (HIVE-20708) Load (dumped) an external table as an external table on target with the same location as on the source

2018-10-08 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-20708:
-

 Summary: Load (dumped) an external table as an external table on 
target with the same location as on the source
 Key: HIVE-20708
 URL: https://issues.apache.org/jira/browse/HIVE-20708
 Project: Hive
  Issue Type: Improvement
  Components: repl
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


External tables are currently loaded as managed tables on the target. Many 
jobs in user environments depend on the locations specified in external table 
definitions, so the path for an external table on the target is expected to 
be the same as on the source. Loading an external table as a managed table 
also makes failover (controlled failover) / failback difficult, since there 
is no option to move data from a managed table back to an external one. So an 
external table replicated to the target cluster needs to be kept as an 
external table with the same location as on the source.
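The desired load-time mapping can be sketched as follows (field names and paths are hypothetical; the real logic lives in Hive's replication load path):

```python
def load_table_descriptor(dumped):
    """Build the target-side table descriptor from a dumped one.

    An EXTERNAL table stays EXTERNAL and keeps its source location,
    instead of being converted into a MANAGED table under the target
    warehouse directory.
    """
    if dumped["tableType"] == "EXTERNAL_TABLE":
        return {"tableType": "EXTERNAL_TABLE",
                "location": dumped["location"]}
    # Managed tables get a fresh location under the target warehouse.
    return {"tableType": "MANAGED_TABLE",
            "location": "/warehouse/" + dumped["name"]}

src = {"name": "t1", "tableType": "EXTERNAL_TABLE",
       "location": "/data/ext/t1"}
tgt = load_table_descriptor(src)
assert tgt["tableType"] == "EXTERNAL_TABLE"
assert tgt["location"] == "/data/ext/t1"   # same path as on the source
```

Keeping the type and location identical is what lets location-dependent jobs run unchanged after a controlled failover.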





[jira] [Created] (HIVE-20644) Avoid exposing sensitive information through an error message

2018-09-26 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-20644:
-

 Summary: Avoid exposing sensitive information through an error 
message
 Key: HIVE-20644
 URL: https://issues.apache.org/jira/browse/HIVE-20644
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


The HiveException raised from the following methods exposes the data row that 
caused the runtime exception.
 # ReduceRecordSource::GroupIterator::next() - around line 372
 # MapOperator::process() - around line 567
 # ExecReducer::reduce() - around line 243

In all these cases, a string representation of the row is constructed on the 
fly and included in the error message.

VectorMapOperator::process() - around line 973 raises the same exception, but 
it does not expose the row, since the row contents are not included in the 
error message.

While trying to reproduce the above error, I also found that the arguments to 
a UDF get exposed in log messages from FunctionRegistry::invoke() around line 
1114. This too can leak sensitive information through error messages.

In this way, sensitive information is leaked to a user through an exception 
message, even though that information may not be available to the user 
otherwise. Hence it is a kind of security breach or violation of access 
control.

The contents of the row or the arguments to a function may be useful for 
debugging, and hence it is worth adding them to the logs. The proposal here 
is therefore to log a separate message, with log level DEBUG or INFO, 
containing the string representation of the row. Users can configure their 
logging so that DEBUG/INFO messages do not go to the client but are still 
available in the Hive server logs for debugging. The actual exception message 
will not contain any sensitive data, such as row data or argument values.
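The split between the client-visible message and the server-side debug log can be sketched as follows (a minimal sketch in Python, not the actual Java change; the logger name and function are hypothetical):

```python
import logging

logger = logging.getLogger("hive.exec")

def fail_row(row, cause):
    """Raise with a sanitized message; row contents go only to DEBUG logs.

    The exception a client sees never embeds the row, while the server
    log retains it for debugging when DEBUG logging is enabled.
    """
    logger.debug("Error processing row: %r", row)   # server-side only
    raise RuntimeError("Error while processing row") from cause

try:
    fail_row({"ssn": "123-45-6789"}, ValueError("bad value"))
except RuntimeError as e:
    # The sensitive field never appears in the client-visible message.
    assert "123-45-6789" not in str(e)
```

Operators can then route the DEBUG record to server log files only, so the row is recoverable for debugging without being returned to the client.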


