[jira] [Created] (HIVE-22663) Quote all table and column names or do not quote any

2019-12-18 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22663:
-

 Summary: Quote all table and column names or do not quote any
 Key: HIVE-22663
 URL: https://issues.apache.org/jira/browse/HIVE-22663
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Standalone Metastore
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat


The change in HIVE-22546 is causing the following stack trace when I run Hive with 
PostgreSQL as the backing db for the metastore.

0: jdbc:hive2://localhost:1> create database dumpdb with ('repl.source.for'='1,2,3');
Error: Error while compiling statement: FAILED: ParseException line 1:28 missing KW_DBPROPERTIES at '(' near '' (state=42000,code=4)
0: jdbc:hive2://localhost:1> create database dumpdb with dbproperties ('repl.source.for'='1,2,3');
ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Error communicating with the metastore)
org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the metastore
 at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:541)
 at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:687)
 at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:653)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:969)

... stack trace clipped

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:Unable to update transaction database org.postgresql.util.PSQLException: ERROR: relation "materialization_rebuild_locks" does not exist  Position: 13
 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
 at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
 at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
 at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
 at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)

This happens because the queries in TxnHandler.java (including the one at line 
1312, which causes this stack trace) do not quote the table names. All table 
names and column names used there should be quoted; the change in HIVE-22546 
alone won't suffice.
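For context, the failure above is the classic PostgreSQL identifier case-folding pattern: the metastore schema scripts create the table with a quoted, upper-case name (e.g. "MATERIALIZATION_REBUILD_LOCKS"), while an unquoted reference in a query is folded to lower case and so finds no matching relation. A minimal sketch of a quoting helper, purely illustrative and not the actual TxnHandler code:

```java
class IdentifierQuoting {
    // Double-quote an SQL identifier, doubling any embedded quotes, so the
    // database matches it case-sensitively instead of case-folding it.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String table = "MATERIALIZATION_REBUILD_LOCKS";
        // Unquoted: PostgreSQL folds this to materialization_rebuild_locks,
        // which does not match the quoted, upper-case table in the schema.
        System.out.println("select * from " + table);
        // Quoted: matches the table exactly as the schema scripts created it.
        System.out.println("select * from " + quote(table));
    }
}
```

Quoting every identifier this way (rather than only some, as after HIVE-22546) keeps the generated SQL consistent across backing databases with different case-folding rules.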



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22559) Maintain ownership of parent directories of an external table directory after replication

2019-11-28 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22559:
-

 Summary: Maintain ownership of parent directories of an external 
table directory after replication
 Key: HIVE-22559
 URL: https://issues.apache.org/jira/browse/HIVE-22559
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Bapat
Assignee: Anishek Agarwal


For replicating an external table we specify a base directory on the target 
(say /base_ext). The path of an external table directory on the 
source (say /xyz/abc/ext_t1) is prefixed with the base directory on the target 
(/base_ext in our example) when replicating the external table data. Thus the 
path of the external table on the target becomes /base_ext/xyz/abc/ext_t1. In 
this path only the ownership permissions of the ext_t1 directory are preserved; 
the ownership of the xyz and abc directories is set to the user executing REPL LOAD. 
Instead we should preserve the ownership of xyz and abc as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-19 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22512:
-

 Summary: Use direct SQL to fetch column privileges in 
refreshPrivileges
 Key: HIVE-22512
 URL: https://issues.apache.org/jira/browse/HIVE-22512
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


refreshPrivileges() calls listTableAllColumnGrants() to fetch the column level 
privileges. The latter function retrieves the individual column objects by 
firing one query per column privilege object, thus swamping the backend db with 
these queries when PrivilegeSynchronizer is run. PrivilegeSynchronizer 
synchronizes the privileges of all databases, tables and columns, so the 
backend db can get swamped really badly when there are thousands of tables with 
hundreds of columns.

Further, the output of listTableAllColumnGrants() is not used completely, so all 
the column objects the PM retrieves go to waste anyway.

Fix this by using direct SQL to fetch column privileges.
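To illustrate the difference, here is a hedged sketch contrasting the per-object pattern with a single direct-SQL fetch. The table and column names loosely follow the metastore schema ("TBL_COL_PRIVS" etc.), but this is illustrative, not the actual ObjectStore code:

```java
import java.util.List;
import java.util.stream.Collectors;

class DirectSqlSketch {
    // N+1 pattern: one round trip per column privilege object.
    static List<String> perColumnQueries(List<String> columns) {
        return columns.stream()
                .map(c -> "select * from \"TBL_COL_PRIVS\""
                        + " where \"COLUMN_NAME\" = '" + c + "'")
                .collect(Collectors.toList());
    }

    // Direct SQL: one query fetching every column privilege of the table.
    static String directSql(String tableId) {
        return "select \"COLUMN_NAME\", \"PRINCIPAL_NAME\", \"TBL_COL_PRIV\""
                + " from \"TBL_COL_PRIVS\" where \"TBL_ID\" = " + tableId;
    }

    public static void main(String[] args) {
        // A table with hundreds of columns costs hundreds of round trips the
        // first way, and exactly one the second way.
        System.out.println(perColumnQueries(List.of("c1", "c2")));
        System.out.println(directSql("42"));
    }
}
```

With thousands of tables the per-object pattern multiplies into millions of round trips, which is exactly the swamping described above.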



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22313) Some of the HMS auth LDAP hive config names do not start with "hive."

2019-10-09 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22313:
-

 Summary: Some of the HMS auth LDAP hive config names do not start 
with "hive."
 Key: HIVE-22313
 URL: https://issues.apache.org/jira/browse/HIVE-22313
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22300) Deduplicate the authentication and LDAP code in HMS and HS2

2019-10-07 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22300:
-

 Summary: Deduplicate the authentication and LDAP code in HMS and 
HS2
 Key: HIVE-22300
 URL: https://issues.apache.org/jira/browse/HIVE-22300
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, Standalone Metastore
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


HIVE-22267 has duplicated code from hive-service/auth directory under 
standalone-metastore directory. Deduplicate this code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22267) Support password based authentication in HMS

2019-09-28 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22267:
-

 Summary: Support password based authentication in HMS
 Key: HIVE-22267
 URL: https://issues.apache.org/jira/browse/HIVE-22267
 Project: Hive
  Issue Type: New Feature
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Similar to HS2, support password based authentication in HMS.

Right now we provide LDAP and CONFIG based options. The latter allows setting a 
user and password in the config and is used only for testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22110) Initialize ReplChangeManager before starting actual dump

2019-08-14 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-22110:
-

 Summary: Initialize ReplChangeManager before starting actual dump
 Key: HIVE-22110
 URL: https://issues.apache.org/jira/browse/HIVE-22110
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


REPL DUMP calls ReplChangeManager.encodeFileUri() to add the cmroot and checksum 
to the URL. This requires ReplChangeManager to be initialized. So, initialize 
ReplChangeManager before taking a dump.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22068) Add more logging to notification cleaner and replication to track events

2019-08-01 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-22068:
-

 Summary: Add more logging to notification cleaner and replication 
to track events
 Key: HIVE-22068
 URL: https://issues.apache.org/jira/browse/HIVE-22068
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


* Add more logging to the DB notification listener cleaner thread
 ** the time at which cleaning was considered, the interval and the time before 
which events were cleared, and the min and max event ids at that time
 ** how many events were cleared
 ** the min and max ids after the cleaning.
 * In REPL::START, document the starting event, the end event if specified, and 
the maximum number of events, if specified.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22036) HMS should identify events corresponding to replicated database for Atlas HMS hook

2019-07-24 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-22036:
-

 Summary: HMS should identify events corresponding to replicated 
database for Atlas HMS hook
 Key: HIVE-22036
 URL: https://issues.apache.org/jira/browse/HIVE-22036
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


The HMS Atlas hook allows Atlas to create/update/delete its metadata based on 
the corresponding events in HMS. But Atlas replication happens outside of, and 
before, Hive replication. Thus any events generated during Hive replication may 
change Atlas data that has already been replicated, interfering with Atlas 
replication. Hence, provide an HMS interface which the hook can use to identify 
the events caused by the Hive replication flow.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-21960) HMS tasks on replica

2019-07-05 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21960:
-

 Summary: HMS tasks on replica
 Key: HIVE-21960
 URL: https://issues.apache.org/jira/browse/HIVE-21960
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


An HMS performs a number of housekeeping tasks. Assess whether
 # they need to be performed on the replicated data
 # performing them on replicated data causes any issues, and how to fix those.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-06 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21841:
-

 Summary: Leader election in HMS to run housekeeping tasks.
 Key: HIVE-21841
 URL: https://issues.apache.org/jira/browse/HIVE-21841
 Project: Hive
  Issue Type: New Feature
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


HMS performs housekeeping tasks. When there are multiple HMSes we need to elect 
a leader HMS which will carry out those housekeeping tasks. 
These tasks include execution of compaction tasks, auto-discovering partitions 
for external tables, generation of compaction tasks, the repl thread etc.

Note that, though the code for compaction tasks, auto-discovery of partitions 
etc. is in Hive, the actual tasks are initiated by an HMS configured to do so. 
So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21801) Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary transport

2019-05-29 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21801:
-

 Summary: Tests using miniHS2 with HTTP as transport are creating 
miniHS2 with binary transport
 Key: HIVE-21801
 URL: https://issues.apache.org/jira/browse/HIVE-21801
 Project: Hive
  Issue Type: Bug
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Even though tests using miniHS2 set the config hive.server2.transport.mode to 
http, miniHS2 is created with binary transport.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21783) Avoid authentication for connection from the same domain

2019-05-23 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21783:
-

 Summary: Avoid authentication for connection from the same domain
 Key: HIVE-21783
 URL: https://issues.apache.org/jira/browse/HIVE-21783
 Project: Hive
  Issue Type: New Feature
  Components: HiveServer2
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


When a connection comes from the same domain, do not authenticate the user. This 
is similar to NONE authentication, but only for connections from the same 
domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21776) Add test for incremental replication of a UDF with jar on HDFS

2019-05-22 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21776:
-

 Summary: Add test for incremental replication of a UDF with jar on 
HDFS
 Key: HIVE-21776
 URL: https://issues.apache.org/jira/browse/HIVE-21776
 Project: Hive
  Issue Type: Test
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


TestReplicationScenariosAcrossInstances has a test for bootstrap replication of 
a UDF with its jar on HDFS, but no such test for incremental replication. Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21679) Replicating a CTAS event creating an MM partitioned table fails

2019-05-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21679:
-

 Summary: Replicating a CTAS event creating an MM partitioned table 
fails
 Key: HIVE-21679
 URL: https://issues.apache.org/jira/browse/HIVE-21679
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat


use dumpdb;
create table t1 (a int, b int);
insert into t1 values (1, 2), (3, 4);
create table t6_mm_part partitioned by (a) stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
create table t6_mm stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
repl dump dumpdb;
create table t6_mm_part_2 partitioned by (a) stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
create table t6_mm_2 partitioned by (a) stored as orc tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from t1;
repl dump dumpdb from 
repl load loaddb from '/tmp/dump/next';
ERROR : failed replication
org.apache.hadoop.hive.ql.parse.SemanticException: Invalid table name 
loaddb.dumpdb.t6_mm_part_2
 at 
org.apache.hadoop.hive.ql.exec.Utilities.getDbTableName(Utilities.java:2253) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.exec.Utilities.getDbTableName(Utilities.java:2239) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.plan.AlterTableDesc.setOldName(AlterTableDesc.java:419)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.tableUpdateReplStateTask(IncrementalLoadTasksBuilder.java:286)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.addUpdateReplStateTasks(IncrementalLoadTasksBuilder.java:371)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.analyzeEventLoad(IncrementalLoadTasksBuilder.java:244)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(IncrementalLoadTasksBuilder.java:139)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.exec.repl.ReplLoadTask.executeIncrementalLoad(ReplLoadTask.java:488)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.exec.repl.ReplLoadTask.execute(ReplLoadTask.java:102) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:233)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:88)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:332)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191]
 at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_191]
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
 ~[hadoop-common-3.1.0.3.0.0.0-1634.jar:?]
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:350)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_191]
 at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_191]
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_191]
 at 
java.util.concurrent.ThreadPool

[jira] [Created] (HIVE-21678) CTAS creating a partitioned table fails because of no writeId

2019-05-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21678:
-

 Summary: CTAS creating a partitioned table fails because of no 
writeId
 Key: HIVE-21678
 URL: https://issues.apache.org/jira/browse/HIVE-21678
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat


create table t1(a int, b int);
insert into t1 values (1, 2), (3, 4);
create table t6_part partitioned by (a) stored as orc tblproperties 
("transactional"="true") as select * from t1;
ERROR : FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in the 
config by open txn task for migration
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.MoveTask. MoveTask : Write id is not set in 
the config by open txn task for migration (state=08S01,code=1)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21677) Using strict managed tables for ACID table testing (Replication tests)

2019-05-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21677:
-

 Summary: Using strict managed tables for ACID table testing 
(Replication tests)
 Key: HIVE-21677
 URL: https://issues.apache.org/jira/browse/HIVE-21677
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat


The replication tests which exclusively test ACID table replication add 
transactional properties to the create table/alter table statements when 
creating the tables. Instead they should use hive.strict.managed.tables = true. 
Tests derived from BaseReplicationScenariosAcidTables, and 
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosIncrementalLoadAcidTables, 
are examples of those. Change all such tests to use hive.strict.managed.tables = 
true. Some of these tests create non-acid tables for testing, which will then 
require 'transactional'='false' to be set explicitly when creating those tables.

With this change we might see some test failures (see subtasks). Please create 
subtasks for those so that they can be tracked within this JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New committer: Laszlo Bodor

2019-04-14 Thread Ashutosh Bapat
Congratulations Laszlo.

On Mon, Apr 15, 2019 at 9:08 AM Ashutosh Chauhan wrote:

>  Apache Hive's Project Management Committee (PMC) has invited Laszlo
> Bodor to become a committer, and we are pleased to announce that he has
> accepted.
>
> Laszlo welcome, thank you for your contributions, and we look forward to your
> further interactions with the community!
>
> Ashutosh Chauhan (on behalf of the Apache Hive PMC)
>


-- 
--
Best Wishes,
Ashutosh Bapat


[jira] [Created] (HIVE-21598) CTAS on ACID table during incremental does not replicate data

2019-04-10 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21598:
-

 Summary: CTAS on ACID table during incremental does not replicate 
data
 Key: HIVE-21598
 URL: https://issues.apache.org/jira/browse/HIVE-21598
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, repl
Reporter: Ashutosh Bapat


Scenario

create database dumpdb with dbproperties('repl.source.for'='1,2,3');

use dumpdb;

create table t1 (id int) clustered by(id) into 3 buckets stored as orc 
tblproperties ("transactional"="true");

insert into t1 values(1);

insert into t1 values(2);

repl dump dumpdb;

repl load loaddb from ;

use loaddb;

select * from t1;

+--------+
| t1.id  |
+--------+
| 1      |
| 2      |
+--------+

use dumpdb;

create table t6 stored as orc tblproperties ("transactional"="true") as select 
* from t1;

select * from t6;

+--------+
| t6.id  |
+--------+
| 1      |
| 2      |
+--------+

repl dump dumpdb from 

repl load loaddb from ;

use loaddb;

select * from t6;

+--------+
| t6.id  |
+--------+
+--------+

t6 gets created but there's no data.

 

On further investigation, I see that the CommitTxnEvent's dump directory has a 
_files file, but it is empty. It looks like we do not log the names of the files 
created as part of CTAS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21476) Wrap metastore backing db upgrade scripts into transaction

2019-03-19 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21476:
-

 Summary: Wrap metastore backing db upgrade scripts into transaction
 Key: HIVE-21476
 URL: https://issues.apache.org/jira/browse/HIVE-21476
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Bapat


The metastore backing db upgrade scripts, like the upgrade* scripts in the 
standalone-metastore/metastore-server/src/main/sql/* directories, do not use 
transactions. So if a command in those scripts fails, the metastore db is left 
in an inconsistent state. Instead we should wrap each of those scripts in a 
transaction so that all or none of the commands take effect. Some RDBMSes 
(Derby, I think) do not support DDL in a transaction, so we should make this 
change only for the databases which do support DDL in a transaction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21462) Upgrading SQL server backed metastore when changing data type of a column with constraints

2019-03-17 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21462:
-

 Summary: Upgrading SQL server backed metastore when changing data 
type of a column with constraints
 Key: HIVE-21462
 URL: https://issues.apache.org/jira/browse/HIVE-21462
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


SQL server does not allow changing the data type of a column which has a 
constraint or an index on it. The constraint or the index needs to be dropped 
before changing the data type and recreated after that. The metastore upgrade 
scripts aren't doing this and thus the upgrade fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21430) INSERT into a dynamically partitioned table with hive.stats.autogather = false throws a MetaException

2019-03-12 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21430:
-

 Summary: INSERT into a dynamically partitioned table with 
hive.stats.autogather = false throws a MetaException
 Key: HIVE-21430
 URL: https://issues.apache.org/jira/browse/HIVE-21430
 Project: Hive
  Issue Type: Bug
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Attachments: metaexception_repro.patch, 
org.apache.hadoop.hive.ql.stats.TestStatsUpdaterThread-output.txt

When the test TestStatsUpdaterThread#testTxnDynamicPartitions added in the 
attached patch is run, it throws the following exception (full logs attached):

org.apache.hadoop.hive.metastore.api.MetaException: Cannot change stats state 
for a transactional table default.simple_stats without providing the 
transactional write state for verification (new write ID 5, valid write IDs 
null; current state {"BASIC_STATS":"true","COLUMN_STATS":{"s":"true"}}; new 
state null
 at 
org.apache.hadoop.hive.metastore.ObjectStore.alterPartitionNoTxn(ObjectStore.java:4328)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21306) Upgrade HttpComponents to the latest versions similar to what Hadoop has done.

2019-02-21 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21306:
-

 Summary: Upgrade HttpComponents to the latest versions similar to 
what Hadoop has done.
 Key: HIVE-21306
 URL: https://issues.apache.org/jira/browse/HIVE-21306
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


The use of HTTPClient 4.5.2 breaks the use of SPNEGO over TLS.
It mistakenly added HTTPS instead of HTTP to the principal when over SSL and 
thus breaks the authentication.

This was upgraded recently in Hadoop and needs to be done for Hive as well.

See: HADOOP-16076

Where we upgraded from 4.5.2 and 4.4.4 to 4.5.6 and 4.4.10.






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #511: HIVE-21078: Replicate column and table level statist...

2019-01-24 Thread ashutosh-bapat
Github user ashutosh-bapat closed the pull request at:

https://github.com/apache/hive/pull/511


---


[GitHub] hive pull request #522: HIVE-21079: Stats replication for partitioned table

2019-01-24 Thread ashutosh-bapat
GitHub user ashutosh-bapat reopened a pull request:

https://github.com/apache/hive/pull/522

HIVE-21079: Stats replication for partitioned table

The first commit is for stats replication for partitioned table. The other 
two commits are fixing bugs in existing code, AFAIU.

@sankarh can you please review?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashutosh-bapat/hive hive21079

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/522.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #522


commit a8b729ab1f120cd50c0ad0e096bb0f724a178838
Author: Ashutosh Bapat 
Date:   2019-01-15T12:06:29Z

HIVE-21079: Replicate statistics for partitioned, non-transactional tables.

Ashutosh Bapat

commit c8aea7b85a06ab53873ce60eb08fcf0514806787
Author: Ashutosh Bapat 
Date:   2019-01-18T05:31:44Z

HIVE-21079: ALTER PARTITION events not applied during incremental 
replication

In AlterPartitionHandler, we set 
withinContext.replicationSpec.setIsMetadataOnly(true);
In ImportSemanticAnalyzer.createReplImportTasks(), per code around line 
1197, we do not add new
PartitionSpecs and corresponding tasks. This means that we never apply an 
ALTER_PARTITION event
during incremental load. That looks like a serious bug.

Either we should check PartitionDescs irrespective of 
replicationSpec.setIsMetadataOnly() OR we
shouldn’t set replicationSpec.setIsMetadataOnly() to true while dumping 
an ALTER_PARTITION event. We
set replicationSpec.setIsMetadataOnly(true) for ALTER TABLE events as well, 
so doing that for ALTER
PARTITION event looks fine.

Ashutosh Bapat.

commit 536492395cd5c280738c2ec1038c39036b477209
Author: Ashutosh Bapat 
Date:   2019-01-18T06:07:37Z

HIVE-21079: Do not dump partition related events during a metadata only 
dump.

During bootstrap metadata-only dump we do not dump partitions (See 
TableExport.getPartitions(). For
bootstrap dump we always pass TableSpec with TABLE_ONLY set.). So don't 
dump partition related
events for a metadata-only dump.

Ashutosh Bapat.




---


[GitHub] hive pull request #522: HIVE-21079: Stats replication for partitioned table

2019-01-24 Thread ashutosh-bapat
Github user ashutosh-bapat closed the pull request at:

https://github.com/apache/hive/pull/522


---


[GitHub] hive pull request #522: Hive21079: Stats replication for partitioned table

2019-01-23 Thread ashutosh-bapat
GitHub user ashutosh-bapat opened a pull request:

https://github.com/apache/hive/pull/522

Hive21079: Stats replication for partitioned table

The first commit is for stats replication for partitioned table. The other 
two commits are fixing bugs in existing code, AFAIU.

@sankarh can you please review?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashutosh-bapat/hive hive21079

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/522.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #522


commit a8b729ab1f120cd50c0ad0e096bb0f724a178838
Author: Ashutosh Bapat 
Date:   2019-01-15T12:06:29Z

HIVE-21079: Replicate statistics for partitioned, non-transactional tables.

Ashutosh Bapat

commit c8aea7b85a06ab53873ce60eb08fcf0514806787
Author: Ashutosh Bapat 
Date:   2019-01-18T05:31:44Z

HIVE-21079: ALTER PARTITION events not applied during incremental 
replication

In AlterPartitionHandler, we set 
withinContext.replicationSpec.setIsMetadataOnly(true);
In ImportSemanticAnalyzer.createReplImportTasks(), per code around line 
1197, we do not add new
PartitionSpecs and corresponding tasks. This means that we never apply an 
ALTER_PARTITION event
during incremental load. That looks like a serious bug.

Either we should check PartitionDescs irrespective of 
replicationSpec.setIsMetadataOnly() OR we
shouldn’t set replicationSpec.setIsMetadataOnly() to true while dumping 
an ALTER_PARTITION event. We
set replicationSpec.setIsMetadataOnly(true) for ALTER TABLE events as well, 
so doing that for ALTER
PARTITION event looks fine.

Ashutosh Bapat.

commit 536492395cd5c280738c2ec1038c39036b477209
Author: Ashutosh Bapat 
Date:   2019-01-18T06:07:37Z

HIVE-21079: Do not dump partition related events during a metadata only 
dump.

During bootstrap metadata-only dump we do not dump partitions (See 
TableExport.getPartitions(). For
bootstrap dump we always pass TableSpec with TABLE_ONLY set.). So don't 
dump partition related
events for a metadata-only dump.

Ashutosh Bapat.




---


getTable variants in InjectableBehaviourObjectStore

2019-01-10 Thread Ashutosh Bapat
Hi,
There are two getTable variants in InjectableBehaviourObjectStore, each
calling the corresponding super.getTable() and passing the returned value
to getTableModifier.apply(). ObjectStore, the super class here, itself
has the same getTable variants. As a result, when the variant of
getTable() which calls the other in ObjectStore is called from
InjectableBehaviourObjectStore, it may result in a NullPointerException
if the injected apply is not careful about the input value.

E.g., let's say the getTable variants are getTable(A) and getTable(A, B) in
ObjectStore.java. InjectableBehaviourObjectStore also implements those two
variants, calling the corresponding variants of the super class ObjectStore
(using super.getTable()). With the inheritance, the call stack looks like:
InjectableBehaviourObjectStore.getTable(A) calls ObjectStore.getTable(A),
which calls InjectableBehaviourObjectStore.getTable(A, B), which calls
ObjectStore.getTable(A, B). If the getTable() variants in
InjectableBehaviourObjectStore return NULL, as most of them do, the apply()
method will end up with a NullPointerException if it's not careful about
its input. And not many implementations of apply are careful.

I think this should be fixed in InjectableBehaviourObjectStore, by
avoiding applying the apply() method (i.e. the injection) twice, by
tracking whether apply() has already been applied.
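A minimal sketch of the proposed guard, with illustrative names (ObjectStoreSketch / InjectableSketch stand in for ObjectStore / InjectableBehaviourObjectStore; the real classes and signatures differ). The injection fires only at the outermost frame of the getTable() call chain, so a one-argument call that internally delegates to the two-argument variant applies the modifier exactly once:

```java
import java.util.function.Function;

// Stand-in for ObjectStore: the one-argument variant delegates to the
// two-argument one, which is the source of the double-injection problem.
class ObjectStoreSketch {
    Object getTable(String name) {
        return getTable(name, /* writeIds */ null);
    }

    Object getTable(String name, String writeIds) {
        return "table:" + name;
    }
}

class InjectableSketch extends ObjectStoreSketch {
    // The injected behaviour; tests typically install something here.
    static Function<Object, Object> modifier = t -> t;

    // True while we are inside a getTable() call chain on this thread.
    private static final ThreadLocal<Boolean> IN_CALL =
            ThreadLocal.withInitial(() -> false);

    @Override
    Object getTable(String name) {
        boolean outermost = !IN_CALL.get();
        IN_CALL.set(true);
        try {
            // super delegates to getTable(name, writeIds), which dynamic
            // dispatch routes back through the override below.
            Object t = super.getTable(name);
            return outermost ? modifier.apply(t) : t;
        } finally {
            if (outermost) IN_CALL.set(false);
        }
    }

    @Override
    Object getTable(String name, String writeIds) {
        boolean outermost = !IN_CALL.get();
        IN_CALL.set(true);
        try {
            Object t = super.getTable(name, writeIds);
            // Skip the injection if an outer getTable() frame will apply it.
            return outermost ? modifier.apply(t) : t;
        } finally {
            if (outermost) IN_CALL.set(false);
        }
    }
}
```

With this guard, a modifier that returns NULL is applied once at the outermost frame instead of receiving its own NULL output a second time.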

Does that sound good?

--
Best Wishes,
Ashutosh Bapat


[jira] [Created] (HIVE-21110) Stats replication for materialized views

2019-01-08 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21110:
-

 Summary: Stats replication for materialized views
 Key: HIVE-21110
 URL: https://issues.apache.org/jira/browse/HIVE-21110
 Project: Hive
  Issue Type: Sub-task
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Check if materialized views have stats associated with them. If so, support 
replicating those statistics. Most of this should be testing whether the code 
for table level stats replication works for materialized views as well. 
But since materialized views are handled as views, they have a slightly 
different code path from normal tables, e.g. when creating a materialized 
view. Those paths will need fixes along the lines of normal tables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21108) Assign writeId for stats update for a converted transactional table

2019-01-08 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21108:
-

 Summary: Assign writeId for stats update for a converted 
transactional table
 Key: HIVE-21108
 URL: https://issues.apache.org/jira/browse/HIVE-21108
 Project: Hive
  Issue Type: Sub-task
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


When a non-ACID table on the source is converted to an ACID table on the 
target, the subsequent statistics updates (column as well as table level) 
dumped on the source won't have a writeId and snapshot associated with them. 
When loading those updates on the target, we need to associate an appropriate 
writeId with them. This applies to both a bootstrap and an incremental dump 
and load.





[GitHub] hive pull request #511: HIVE-21078: Replicate column and table level statist...

2019-01-02 Thread ashutosh-bapat
GitHub user ashutosh-bapat opened a pull request:

https://github.com/apache/hive/pull/511

HIVE-21078: Replicate column and table level statistics for unpartitioned 
Hive tables

@maheshk114, @sankarh can you please review?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashutosh-bapat/hive hive21078

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/511.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #511


commit db98502a44f69f255924231b03e2145248c9be0f
Author: Ashutosh Bapat 
Date:   2018-12-19T04:49:29Z

HIVE-21078: Replicate column and table level statistics for unpartitioned 
Hive tables

The column statistics are included as part of the Table object during 
bootstrap dump and loaded when the
corresponding table is created on the replica.

During incremental dump and load, an UpdateTableColStats event is used to 
replicate the statistics.

In both cases, the statistics are replicated only when the data is 
replicated.

Ashutosh Bapat




---


[jira] [Created] (HIVE-21079) Replicate column statistics for partitions of partitioned Hive table.

2019-01-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21079:
-

 Summary: Replicate column statistics for partitions of partitioned 
Hive table.
 Key: HIVE-21079
 URL: https://issues.apache.org/jira/browse/HIVE-21079
 Project: Hive
  Issue Type: Sub-task
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


This task is for replicating statistics for partitions of a partitioned Hive 
table.





[jira] [Created] (HIVE-21078) Replicate table level column statistics for Hive tables

2019-01-02 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21078:
-

 Summary: Replicate table level column statistics for Hive tables
 Key: HIVE-21078
 URL: https://issues.apache.org/jira/browse/HIVE-21078
 Project: Hive
  Issue Type: Sub-task
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


This task is for replicating table level statistics. Partition level statistics 
will be worked upon in a separate sub-task.





[jira] [Created] (HIVE-21037) Replicate column statistics for Hive tables

2018-12-13 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21037:
-

 Summary: Replicate column statistics for Hive tables
 Key: HIVE-21037
 URL: https://issues.apache.org/jira/browse/HIVE-21037
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Statistics are important for query optimization, and thus keeping them 
up-to-date on the replica is important from a query performance perspective. 
The statistics are collected by scanning a table entirely. Thus when the data 
is replicated, a. we could update the statistics by scanning the data on the 
replica, or b. we could just replicate the statistics as well. For the 
following reasons we prefer the second approach over the first.
 # Scanning the data on the replica isn't a good option since it wastes CPU 
cycles and puts load on the replica during replication, which can be 
significant.
 # Storages like S3 may not have compute capabilities, and thus when we are 
replicating from on-prem to cloud, we cannot rely on the target to gather 
statistics.
 # For ACID tables, the statistics should be associated with the snapshot. 
This means the statistics collection on the target should sync with the 
write-id on the source, since the target doesn't generate write-ids of its 
own.





[jira] [Created] (HIVE-21022) Fix remote metastore tests which use ZooKeeper

2018-12-09 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-21022:
-

 Summary: Fix remote metastore tests which use ZooKeeper
 Key: HIVE-21022
 URL: https://issues.apache.org/jira/browse/HIVE-21022
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


Per [~vgarg]'s comment on HIVE-20794 at 
https://issues.apache.org/jira/browse/HIVE-20794?focusedCommentId=16714093=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16714093,
 the remote metastore tests using ZooKeeper are flaky. They are failing with 
the error "Got exception: org.apache.zookeeper.KeeperException$NoNodeException 
KeeperErrorCode = NoNode for /hs2mszktest".

Both of these tests are using the same root namespace and hence the reason for 
this failure could be that the root namespace becomes unavailable to one test 
when the other drops it. The drop seems to be happening automatically through 
TestingServer code.





[GitHub] hive pull request #491: HIVE-20953: Fix testcase TestReplicationScenariosAcr...

2018-12-03 Thread ashutosh-bapat
Github user ashutosh-bapat closed the pull request at:

https://github.com/apache/hive/pull/491


---


[GitHub] hive pull request #487: Hive20794

2018-11-28 Thread ashutosh-bapat
Github user ashutosh-bapat closed the pull request at:

https://github.com/apache/hive/pull/487


---


[GitHub] hive pull request #445: HIVE-20542

2018-11-28 Thread ashutosh-bapat
Github user ashutosh-bapat closed the pull request at:

https://github.com/apache/hive/pull/445


---


[GitHub] hive pull request #439: HIVE-20644: Avoid exposing sensitive information thro...

2018-11-28 Thread ashutosh-bapat
Github user ashutosh-bapat closed the pull request at:

https://github.com/apache/hive/pull/439


---


[GitHub] hive pull request #491: HIVE-20953: Fix testcase TestReplicationScenariosAcr...

2018-11-21 Thread ashutosh-bapat
GitHub user ashutosh-bapat opened a pull request:

https://github.com/apache/hive/pull/491

HIVE-20953: Fix testcase TestReplicationScenariosAcrossInstances#test…

Fix testcase 
TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions
 to not depend upon the order in which objects get loaded.

@anishek or @maheshk114 can you please review the change?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashutosh-bapat/hive hive20953

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/491.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #491


commit 5c9a72bd5f772b7cefdc86c397a7771c2059043a
Author: Ashutosh Bapat 
Date:   2018-11-21T08:25:38Z

HIVE-20953: Fix testcase 
TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions
 to not depend upon the order in which objects get loaded

Ashutosh Bapat




---


[jira] [Created] (HIVE-20953) Fix testcase TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions to not depend upon the order in which objects get loaded

2018-11-20 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-20953:
-

 Summary: Fix testcase 
TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions
 to not depend upon the order in which objects get loaded
 Key: HIVE-20953
 URL: https://issues.apache.org/jira/browse/HIVE-20953
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


The testcase is intended to test REPL LOAD with retry. The test creates a 
partitioned table and a function in the source database and loads those to the 
replica. The first attempt to load a dump is intended to fail while loading one 
of the partitions. Based on the order in which the objects get loaded, if the 
function is queued after the table, it will not be available in the replica 
after the load failure. But if it's queued before the table, it will be 
available in the replica even after the load failure. The test assumes the 
latter case, which may not always be true.
 
 Hence, fix the testcase to order the objects by a fixed ordering. By setting 
hive.in.repl.test.files.sorted to true, the objects are ordered by the 
directory names. This ordering is available with minimal changes for testing, 
hence we use it. With this ordering a function gets loaded before a table. So 
the test is changed to not expect the function to be available after the 
failed load, but to expect it to be available after the retry.





Re: [ANNOUNCE] New committer: Mahesh Behera

2018-11-18 Thread Ashutosh Bapat
Congratulations Mahesh.

On Sat, Nov 17, 2018 at 7:30 PM Peter Vary 
wrote:

> Congratulations Mahesh!
>
> > On Nov 17, 2018, at 04:36, Sankar Hariappan 
> wrote:
> >
> > Congrats Mahesh!
> >
> > Best regards
> > Sankar
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 17/11/18, 6:54 AM, "Ashutosh Chauhan"  wrote:
> >
> >> Apache Hive's Project Management Committee (PMC) has invited Mahesh
> >> Behera to become a committer, and we are pleased to announce that he has
> >> accepted.
> >> Mahesh, welcome, thank you for your contributions, and we look forward
> to
> >> your further interactions with the community!
> >>
> >> Thanks,
> >> Ashutosh Chauhan (on behalf of the Apache Hive PMC)
>
>

-- 
--
Best Wishes,
Ashutosh Bapat


[GitHub] hive pull request #487: Hive20794

2018-11-13 Thread ashutosh-bapat
GitHub user ashutosh-bapat opened a pull request:

https://github.com/apache/hive/pull/487

Hive20794

Find more details about the changes in HIVE-20794.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashutosh-bapat/hive hive20794

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/487.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #487


commit 383e8be33934d078bad2e8fe1233cc0f3c6119ed
Author: Ashutosh Bapat 
Date:   2018-10-26T08:22:04Z

HIVE-20794: Use Zookeeper for dynamic service discovery of metastore.

The patch also adds new ZooKeeper configurations for the metastore. We reuse 
THRIFT_URIs to specify the
ZooKeeper quorum and add another configuration, 
THRIFT_SERVICE_DISCOVERY_MODE, to specify
which method to use for dynamic service discovery.

Ashutosh Bapat

commit a38e2e8c9fdc85cd809a1aac9d16ed1d204117bb
Author: Ashutosh Bapat 
Date:   2018-11-13T09:05:03Z

HIVE-20794: Refactor existing code for supporting metastore dynamic 
discovery using Zookeeper

Extract the code in HiveServer2 dealing with ZooKeeper into a 
ZooKeeperHiveHelper class so that
it can be used by MetaStore server as well. This also moves the 
ZooKeeperHiveHelper.java into a
location common to both HiveServer2 and MetaStore code.

Ashutosh Bapat




---


[jira] [Created] (HIVE-20794) Use Zookeeper for metastore service discovery

2018-10-24 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-20794:
-

 Summary: Use Zookeeper for metastore service discovery
 Key: HIVE-20794
 URL: https://issues.apache.org/jira/browse/HIVE-20794
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


Right now, multiple metastore services can be specified in hive.metastore.uris 
configuration, but that list is static and can not be modified dynamically. Use 
Zookeeper for dynamic service discovery of metastore.





[GitHub] hive pull request #447: HIVE-20708: Load an external table as an external ta...

2018-10-16 Thread ashutosh-bapat
GitHub user ashutosh-bapat opened a pull request:

https://github.com/apache/hive/pull/447

HIVE-20708: Load an external table as an external table on target with the 
same location as  on the source

Dump an external table as an external table.

When loading an external table, set the location of the target table to the 
same location as on the source,
but relative to the target's file system. IOW, the scheme and
authority of the target location are those of the target file system, while 
the path relative to the
file system is the same as on the source.
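The remapping described above can be sketched as follows. This is a simplified illustration using java.net.URI; the names are hypothetical, and the actual implementation presumably works with Hadoop Path/FileSystem objects and handles more edge cases:

```java
import java.net.URI;

class ExternalTableLocationSketch {
    // Keep the path component from the source table location, but take the
    // scheme and authority from the target file system's root URI, so the
    // relative location matches on both clusters.
    static String remap(String sourceLocation, String targetFsRoot) {
        URI src = URI.create(sourceLocation);
        URI dst = URI.create(targetFsRoot);
        return dst.getScheme() + "://" + dst.getAuthority() + src.getPath();
    }
}
```

For example, a source location `hdfs://source-nn:8020/warehouse/ext/t1` combined with a target file system rooted at `hdfs://target-nn:8020/` yields `hdfs://target-nn:8020/warehouse/ext/t1`.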

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashutosh-bapat/hive hive20708

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/447.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #447


commit c076bbbd2b0fd1b193ac51a1595911a80324b923
Author: Ashutosh Bapat 
Date:   2018-10-15T05:09:05Z

HIVE-20708: Load an external table as an external table on target with the 
same location as
on the source

Dump an external table as an external table.

When loading an external table, set the location of the target table to the 
same location as on the source,
but relative to the target's file system. IOW, the scheme and
authority of the target location are those of the target file system, while 
the path relative to the
file system is the same as on the source.




---


[GitHub] hive pull request #445: HIVE-20542

2018-10-10 Thread ashutosh-bapat
GitHub user ashutosh-bapat opened a pull request:

https://github.com/apache/hive/pull/445

HIVE-20542

Changes for HIVE-20542. There are three separate commits, with each commit 
message explaining the purpose of that commit. They should all be pulled 
together as a single commit into the Apache Hive repository. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashutosh-bapat/hive hive20542

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/445.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #445


commit e76dd0d51662d5e1bc1890010b95d305f2505ea8
Author: Ashutosh Bapat 
Date:   2018-10-01T05:17:06Z

HIVE-20542: Insert NULL value for columns of NOTIFICATION_LOG for which 
values are not available

When no database is associated with an event we insert 'null' as database 
name in the metastore.
With this commit, we insert NULL as database name.

When no tablename is associated with an event we insert an empty string as 
table name in the
metastore. With this commit, we insert NULL as table name.

Even if a catalog name is associated with an event, addNotificationLog() 
doesn't insert the catalog into
the metastore. With this commit we take care of that as well.

Ashutosh Bapat.

commit 31aee469baffb95641fb68f70a6bcaa0ca725d28
Author: Ashutosh Bapat 
Date:   2018-10-01T06:11:15Z

HIVE-20542: Modify query used to count the number of events to be 
replicated incrementally

The query used to count the events for a given incremental replication:
1. does not count events with a NULL database, table or catalog name;
2. does not consider toEventId and Limit for the given incremental 
replication.

Ashutosh Bapat.

commit d3a319c5fd347018572c13b40e1e8f7cdbe72050
Author: Ashutosh Bapat 
Date:   2018-10-04T04:08:44Z

HIVE-20644: Add tests for testing getNotificationEventsCount().

Ashutosh Bapat.




---


[jira] [Created] (HIVE-20708) Load (dumped) an external table as an external table on target with the same location as on the source

2018-10-08 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-20708:
-

 Summary: Load (dumped) an external table as an external table on 
target with the same location as on the source
 Key: HIVE-20708
 URL: https://issues.apache.org/jira/browse/HIVE-20708
 Project: Hive
  Issue Type: Improvement
  Components: repl
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


External tables are currently mapped to managed tables on the target. A lot of 
jobs in user environments depend upon the locations specified in external 
table definitions to run; hence, the path for external tables on the target 
and on the source is expected to be the same. An external table being loaded 
as a managed table makes failover (Controlled Failover) / failback difficult, 
since there is no option of moving data from a managed to an external table. 
So the external table replicated to the target cluster needs to be kept as an 
external table with the same location as on the source.





Error running checkstyle:checkstyle goal

2018-10-04 Thread Ashutosh Bapat
Hi,
I am trying to run "mvn checkstyle:checkstyle" to catch checkstyle errors
before submitting a patch. But while running that command I get an error
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-checkstyle-plugin:2.17:checkstyle
(default-cli) on project hive-standalone-metastore-common: An error has
occurred in Checkstyle report generation. Failed during checkstyle
execution: Unable to find configuration file at location:
/work/bug110208/cr/standalone-metastore/metastore-common/checkstyle//checkstyle.xml:
Could not find resource
'/work/bug110208/cr/standalone-metastore/metastore-common/checkstyle//checkstyle.xml'.

Looks like the checkstyle.xml file is missing from the given location, and we
need to fix that.

I also wonder how Jenkins is able to run checkstyle.

--
Best Wishes,
Ashutosh Bapat


Confluence edit permission

2018-10-02 Thread Ashutosh Bapat
Hi Lefty,
I would like to get permissions to edit pages in Confluence. I am working
with the Hive team at Hortonworks. I couldn't find a way to request edit
permissions through Confluence, hence this mail.


--
Best Wishes,
Ashutosh Bapat


[GitHub] hive pull request #439: HIVE-20644: Avoid exposing sensitive information thro...

2018-09-27 Thread ashutosh-bapat
GitHub user ashutosh-bapat opened a pull request:

https://github.com/apache/hive/pull/439

HIVE-20644: Avoid exposing sensitive information through a Hive Runtime 
exception (Ashutosh Bapat)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashutosh-bapat/hive hive20644

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #439


commit f5c4b22ebbfc4893b2aa436d9ea9b4241f04340b
Author: Ashutosh Bapat 
Date:   2018-09-27T05:51:52Z

HIVE-20644: Avoid exposing sensitive information through a Hive Runtime 
exception (Ashutosh Bapat)




---


[jira] [Created] (HIVE-20644) Avoid exposing sensitive information through an error message

2018-09-26 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-20644:
-

 Summary: Avoid exposing sensitive information through an error 
message
 Key: HIVE-20644
 URL: https://issues.apache.org/jira/browse/HIVE-20644
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


The HiveException raised from the following methods exposes the data row that 
caused the runtime exception.
 # ReduceRecordSource::GroupIterator::next() - around line 372
 # MapOperator::process() - around line 567
 # ExecReducer::reduce() - around line 243

In all the cases, a string representation of the row is constructed on the fly 
and is included in
the error message.

VectorMapOperator::process() - around line 973 - raises the same exception, 
but it does not expose the row since the row contents are not included in the 
error message.

While trying to reproduce the above error, I also found that the arguments to 
a UDF get exposed in log messages from FunctionRegistry::invoke() around line 
1114. This too can cause sensitive information to be leaked through error 
messages.

This way some sensitive information is leaked to a user through the exception 
message. That information may not be available to the user otherwise. Hence 
it's a kind of security breach or violation of access control.

The contents of the row or the arguments to a function may be useful for 
debugging and hence are worth adding to the logs. Hence the proposal here is 
to log a separate message with log level DEBUG or INFO containing the string 
representation of the row. Users can configure their logging so that 
DEBUG/INFO messages do not go to the client but at the same time are available 
in the Hive server logs for debugging. The actual exception message will not 
contain any sensitive data like row data or argument data.
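A minimal sketch of the proposed split between the client-facing exception and the server-side log. Names and structure are hypothetical (java.util.logging stands in for Hive's logging, with FINE playing the role of DEBUG; the real operators are structured differently):

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

class SafeErrorSketch {
    private static final Logger LOG =
            Logger.getLogger(SafeErrorSketch.class.getName());

    // Hypothetical stand-in for an operator's process() method: the row
    // contents go only to a DEBUG/FINE-level server-side log, while the
    // exception surfaced to the client carries no row data.
    static void processRow(Object[] row) throws Exception {
        try {
            throw new IllegalStateException("simulated operator failure");
        } catch (IllegalStateException e) {
            // Row contents stay in the server logs for debugging...
            LOG.log(Level.FINE,
                    "Error processing row: " + Arrays.toString(row), e);
            // ...and the client-facing message omits them entirely.
            throw new Exception("Hive Runtime Error while processing row", e);
        }
    }
}
```

With logging configured so that FINE/DEBUG messages go only to the server log files, the row contents remain available to administrators without reaching the client.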


