[jira] [Created] (HIVE-21520) Query "Submit plan" time reported is incorrect

2019-03-26 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HIVE-21520:
---

 Summary: Query "Submit plan" time reported is incorrect
 Key: HIVE-21520
 URL: https://issues.apache.org/jira/browse/HIVE-21520
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan


Hive master branch + LLAP
{noformat}
Query Execution Summary
--
OPERATION    DURATION
--
Compile Query   0.00s
Prepare Plan    0.00s
Get Query Coordinator (AM)  0.00s
Submit Plan 1553658149.89s
Start DAG   0.53s
Run DAG 0.43s
--
{noformat}
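The reported value is suspicious in itself. 1553658149.89 s is roughly 49 years, and read as seconds since the Unix epoch it falls on 2019-03-27 03:42 UTC, within a day of this report. That is what one would expect if the duration were computed as `end - start` with a start timestamp that was never recorded and defaulted to 0. The report does not state the cause. The sketch below uses a hypothetical `PhaseTimer` class, not Hive's summary code, to illustrate that failure mode and one way to guard against it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical phase timer used only to illustrate the symptom;
// it is not Hive's actual query-summary implementation.
class PhaseTimer {
    private final Map<String, Long> startMs = new HashMap<>();
    private final Map<String, Long> endMs = new HashMap<>();

    void start(String phase, long nowMs) { startMs.put(phase, nowMs); }

    void end(String phase, long nowMs) { endMs.put(phase, nowMs); }

    // Naive variant: a missing start defaults to 0, so the "duration"
    // becomes the absolute end timestamp (milliseconds since the epoch).
    long naiveDurationMs(String phase) {
        return endMs.getOrDefault(phase, 0L) - startMs.getOrDefault(phase, 0L);
    }

    // Guarded variant: returns -1 (unknown) unless both timestamps were
    // recorded and the interval is non-negative.
    long safeDurationMs(String phase) {
        Long s = startMs.get(phase);
        Long e = endMs.get(phase);
        if (s == null || e == null || e < s) {
            return -1L;
        }
        return e - s;
    }
}
```

With only `end("Submit Plan", 1553658149890L)` recorded, `naiveDurationMs` returns 1553658149890 ms, which prints as 1553658149.89s, while `safeDurationMs` reports the duration as unknown.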



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21519) The meta data of table is not updated after setting a new data location

2019-03-26 Thread Boying Lu (JIRA)
Boying Lu created HIVE-21519:


 Summary: The meta data of table is not updated after setting a new data location
 Key: HIVE-21519
 URL: https://issues.apache.org/jira/browse/HIVE-21519
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 1.1.0
Reporter: Boying Lu


Steps to reproduce:
 # create a new hive table T1 (without any partition)
 # persist some data into T1
 # create a new hive table T2 (without any partition)
 # alter table T2 set location path-to-T1-HDFS-folder

expected result:

 The metadata of T2 is changed accordingly

 

actual result:

    The metadata of T2 is not changed

even after running the commands:

   analyze table T2 compute statistics (only the number of rows was updated)

   msck repair table T2 (no effect)





[jira] [Created] (HIVE-21518) GenericUDFOPNotEqualNS does not run in LLAP

2019-03-26 Thread Jason Dere (JIRA)
Jason Dere created HIVE-21518:
-

 Summary: GenericUDFOPNotEqualNS does not run in LLAP
 Key: HIVE-21518
 URL: https://issues.apache.org/jira/browse/HIVE-21518
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-21518.1.patch

GenericUDFOPNotEqualNS (Not equal nullsafe operator) does not run in LLAP mode, 
because it is not registered as a built-in function.
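For reference, here is the behavior the operator is expected to have once it runs in LLAP. A null-safe "not equal" is the negation of Hive's null-safe equality `<=>`: two NULLs compare as equal, NULL against a non-NULL value compares as not equal, and the result is never NULL. The helper below is a hypothetical model of those semantics only. It is not the GenericUDFOPNotEqualNS source and says nothing about how the attached patch registers the function.

```java
import java.util.Objects;

// Hypothetical model of null-safe comparison semantics (Java null stands in
// for SQL NULL). Not Hive's GenericUDFOPNotEqualNS implementation.
final class NullSafeCompare {
    private NullSafeCompare() {}

    // Null-safe equality (a <=> b): NULL equals NULL, NULL never equals a
    // non-NULL value, and the result is always TRUE or FALSE.
    static boolean nullSafeEquals(Object a, Object b) {
        return Objects.equals(a, b);
    }

    // Null-safe inequality: the negation of the above, also never NULL.
    static boolean nullSafeNotEquals(Object a, Object b) {
        return !nullSafeEquals(a, b);
    }
}
```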





[jira] [Created] (HIVE-21517) Fix AggregateStatsCache

2019-03-26 Thread Miklos Gergely (JIRA)
Miklos Gergely created HIVE-21517:
-

 Summary: Fix AggregateStatsCache
 Key: HIVE-21517
 URL: https://issues.apache.org/jira/browse/HIVE-21517
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.1
Reporter: Miklos Gergely
Assignee: Miklos Gergely
 Fix For: 4.0.0


Due to a bug, AggregateStatsCache does not return the best matching result.
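The report does not describe the bug itself. As a rough illustration of what "best matching" can mean for a cache of aggregate stats keyed by partition sets, the hypothetical sketch below picks, among cached candidates, the one with the fewest requested partitions missing, subject to a maximum-miss tolerance. It is a simplified model using exact `Set`s; it is not AggregateStatsCache's code, and the `maxMisses` parameter is an assumption.

```java
import java.util.List;
import java.util.Set;

// Hypothetical best-match lookup; not Hive's AggregateStatsCache.
final class BestMatch {
    private BestMatch() {}

    // Hypothetical cache entry: an aggregate computed over a set of partitions.
    static final class Candidate {
        final String id;
        final Set<String> partitions;

        Candidate(String id, Set<String> partitions) {
            this.id = id;
            this.partitions = partitions;
        }
    }

    // Returns the candidate that misses the fewest requested partitions,
    // provided it misses at most maxMisses; returns null if none qualifies.
    static Candidate find(List<Candidate> candidates, Set<String> requested, int maxMisses) {
        Candidate best = null;
        int bestMisses = Integer.MAX_VALUE;
        for (Candidate c : candidates) {
            int misses = 0;
            for (String p : requested) {
                if (!c.partitions.contains(p)) {
                    misses++;
                }
            }
            // Comparing against bestMisses as well as maxMisses returns the
            // best qualifying candidate rather than merely the first one.
            if (misses <= maxMisses && misses < bestMisses) {
                best = c;
                bestMisses = misses;
            }
        }
        return best;
    }
}
```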





[jira] [Created] (HIVE-21516) Fix spark downloading for q tests

2019-03-26 Thread Miklos Gergely (JIRA)
Miklos Gergely created HIVE-21516:
-

 Summary: Fix spark downloading for q tests
 Key: HIVE-21516
 URL: https://issues.apache.org/jira/browse/HIVE-21516
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.1
Reporter: Miklos Gergely
Assignee: Miklos Gergely
 Fix For: 4.0.0


Currently itests/pom.xml declares a command to generate the download script 
for spark, so it is re-generated every time any maven command is executed for 
any sub-project of itests. As a side effect, it leaves download.sh files 
everywhere. The download.sh file is almost entirely static; there is no need to 
recreate it every time, as it only requires $spark.version as a parameter.

Also, it only works properly under Linux, as it relies on the md5sum program, 
which is not present on OS X. This means that if the spark tarball is partially 
downloaded on OS X, it will never be re-downloaded. This should be fixed by 
making it also work with md5 on OS X.
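The underlying requirement is a checksum test that behaves the same on every platform, so that a truncated tarball is always detected. The issue proposes supporting `md5` alongside `md5sum` in the script; as a swapped-in illustration of the same check, the sketch below computes the MD5 with `java.security.MessageDigest`, which every Java platform is required to support. The `Md5Check` class is hypothetical and not part of the Hive build.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical platform-independent MD5 check; not the actual download.sh logic.
final class Md5Check {
    private Md5Check() {}

    static String md5Hex(byte[] data) {
        try {
            return toHex(MessageDigest.getInstance("MD5").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    // Streams the file so large tarballs are not loaded into memory.
    static String md5Hex(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            return toHex(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    // A missing or partially downloaded file fails this check, so the caller
    // can delete it and download again.
    static boolean matches(Path file, String expectedMd5) throws IOException {
        return Files.exists(file) && md5Hex(file).equalsIgnoreCase(expectedMd5.trim());
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```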





[jira] [Created] (HIVE-21515) Improvement to MoveTrash Facilities

2019-03-26 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21515:
-

 Summary: Improvement to MoveTrash Facilities
 Key: HIVE-21515
 URL: https://issues.apache.org/jira/browse/HIVE-21515
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-21515.1.patch







[jira] [Created] (HIVE-21514) Map data

2019-03-26 Thread Simon poortman (JIRA)
Simon poortman created HIVE-21514:
-

 Summary: Map data
 Key: HIVE-21514
 URL: https://issues.apache.org/jira/browse/HIVE-21514
 Project: Hive
  Issue Type: Bug
Reporter: Simon poortman
 Fix For: 0.10.1








[jira] [Created] (HIVE-21513) ACID: Running merge concurrently with minor compaction causes a later select * to throw exception

2019-03-26 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-21513:
---

 Summary: ACID: Running merge concurrently with minor compaction causes a later select * to throw exception
 Key: HIVE-21513
 URL: https://issues.apache.org/jira/browse/HIVE-21513
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.1.1
Reporter: Vaibhav Gumashta


Repro steps:

- Create table 
- Load some data 
- Run merge so that records get updated and delete_delta dirs are created
- Manually initiate minor compaction: ALTER TABLE ... COMPACT 'minor';
- While the compaction is running, keep executing the merge statement
- After some time, try a simple select *;





[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269262756
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   This replWriteId is just a placeholder for the writeId from the event 
message. It need not be in CreateTableDesc; it can be kept in local variables 
and passed around.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269220469
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
+addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
+  writeId = addPartitionDesc.getPartition(0).getWriteId();
+  validWriteIdList =
 
 Review comment:
   In the replication flow, it is fine to use a hardcoded ValidWriteIdList, as 
we want to forcefully set this writeId into the table or partition objects. 
Getting it from the current state might be wrong, as we don't update 
ValidTxnList in conf for repl-created txns. 
   ValidWriteIdList is only used to check whether the writeId in metastore 
objects was updated by any concurrent inserts. In the repl load flow that 
cannot happen: we replicate one event at a time, and in bootstrap no two threads 
write into the same table.
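To make the argument concrete: a snapshot whose high watermark is the event's writeId and whose exception list is empty treats every writeId up to and including that one as valid, regardless of which transactions happen to be open on the replica. The sketch below models this with a hypothetical `WriteIdSnapshot` class; it is a simplification for illustration, not Hive's `ValidReaderWriteIdList`.

```java
import java.util.Arrays;

// Hypothetical, simplified write-id snapshot (not Hive's ValidWriteIdList).
final class WriteIdSnapshot {
    final String table;
    final long highWatermark;
    final long[] exceptions; // sorted open/aborted write ids, all <= highWatermark

    WriteIdSnapshot(String table, long highWatermark, long[] exceptions) {
        this.table = table;
        this.highWatermark = highWatermark;
        this.exceptions = exceptions.clone();
        Arrays.sort(this.exceptions);
    }

    // Replication-style "hardcoded" snapshot: everything up to the event's
    // writeId is valid and there are no exceptions.
    static WriteIdSnapshot forReplicatedWriteId(String table, long writeId) {
        return new WriteIdSnapshot(table, writeId, new long[0]);
    }

    boolean isValid(long writeId) {
        return writeId <= highWatermark && Arrays.binarySearch(exceptions, writeId) < 0;
    }
}
```

By contrast, a snapshot derived from live state on the replica could list a writeId as an exception (for example `new WriteIdSnapshot("db.t", 7, new long[] {5})`), which is exactly the dependency on current txn state the hardcoded form avoids.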




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269136269
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1247,17 +1244,37 @@ private static void createReplImportTasks(
   } else if (!replicationSpec.isMetadataOnly()
   && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) {
 x.getLOG().debug("adding dependent CopyWork/MoveWork for table");
-t.addDependentTask(loadTable(fromURI, table, replicationSpec.isReplace(),
-new Path(tblDesc.getLocation()), replicationSpec, x, writeId, stmtId));
+dependentTasks = new ArrayList<>(1);
+dependentTasks.add(loadTable(fromURI, table, replicationSpec.isReplace(),
+  new Path(tblDesc.getLocation()), replicationSpec,
+  x, writeId, stmtId));
   }
 
-  if (dropTblTask != null) {
-// Drop first and then create
-dropTblTask.addDependentTask(t);
-x.getTasks().add(dropTblTask);
+  // During replication, by the time we reply a commit transaction event, the table should
+  // have been already created when replaying previous events. So no need to create table
+  // again. For some reason we need create table task for partitioned table though.
 
 Review comment:
   The comment says a create table task is needed for partitioned tables, but 
the code always skips it for the commit txn event. Which one is correct?




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269098036
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
+isTxn = false;
+  } else {
+isTxn = true;
+  }
+}
+// TODO: Somehow we have to signal alterPartitions that it's part of replication and
+//  should use replication's valid writeid list instead of creating one.
 
 Review comment:
   What do you mean by replication's valid writeid list in this comment? Even 
in repl flow, we get validWriteIdList from HMS based on incoming writeId in the 
event msg. Are you suggesting to cache this ValidWriteIdList somewhere and use 
it instead of invoking HMS API?




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269060256
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java
 ##
 @@ -118,7 +118,8 @@
   List notNullConstraints;
   List defaultConstraints;
   List checkConstraints;
-  private ColumnStatistics colStats;
+  private ColumnStatistics colStats;  // For the sake of replication
+  private long writeId = -1; // For the sake of replication
 
 Review comment:
   Can we re-use the replWriteId variable that we already have?




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269110947
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, List newParts,
   new ArrayList();
 try {
   AcidUtils.TableSnapshot tableSnapshot = null;
+  // TODO: In case of replication use the writeId and valid write id list constructed for
 
 Review comment:
   Is it done or still TODO?




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269247183
 
 

 ##
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean parallelBootstrap, boolean metad
   }
 
   @Test
-  public void testForNonAcidTables() throws Throwable {
+  public void testNonParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName());
 testStatsReplicationCommon(false, false);
   }
 
   @Test
-  public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable {
-testStatsReplicationCommon(true, false);
+  public void testForParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName());
+testStatsReplicationCommon(true, false );
   }
 
   @Test
-  public void testNonAcidMetadataOnlyDump() throws Throwable {
+  public void testMetadataOnlyDump() throws Throwable {
 
 Review comment:
   Add more tests for the following scenarios:
   1. REPL LOAD fails after replicating the table or partition objects with 
stats, but before setting the last replId. A retry then takes the alter 
table/partition replace flows, and the stats should be valid after successful 
replication. This is needed for the non-transactional, transactional and 
migration cases.
   2. Parallel inserts with autogather enabled. This produces events where 
multiple txns are open at the time of the update-stats event. Also try to 
simulate the case where one stats update succeeded and another invalidates it 
due to concurrent writes. 




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269223302
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
 
 Review comment:
   addPartitionDesc.getReplicationSpec() will never be null, so this check can 
be removed.




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269161871
 
 

 ##
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl,
 
   // If the table has column statistics, update it into the metastore. This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-// We do not replicate statistics for a transactional table right now and hence we do not
-// expect a transactional table to have column statistics here. So passing null
-// validWriteIds is fine for now.
-updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId());
+  if (colStats != null) {
+// On replica craft a valid snapshot out of the writeId in the table.
+long writeId = tbl.getWriteId();
+String validWriteIds = null;
+if (writeId > 0) {
+  ValidWriteIdList vwil =
 
 Review comment:
   Please use a meaningful name instead of "vwil".




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269257547
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
   tTbl.setPrivileges(principalPrivs);
 }
   }
-  // Set table snapshot to api.Table to make it persistent.
-  TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
-  if (tableSnapshot != null) {
-tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
+  // Set table snapshot to api.Table to make it persistent. A transactional table being
+  // replicated may have a valid write Id copied from the source. Use that instead of
+  // crafting one on the replica.
+  if (tTbl.getWriteId() <= 0) {
 
 Review comment:
   The DO_NOT_UPDATE_STATS flag should be set in the create table flow as well. 
Otherwise, in autogather mode on the target, the stats will be updated 
automatically.




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269172695
 
 

 ##
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) {
 }
 
 // Update partition column statistics if available
-for (Partition newPart : newParts) {
-  if (newPart.isSetColStats()) {
-updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId());
+int cnt = 0;
+for (ColumnStatistics partColStats: partsColStats) {
+  long writeId = partsWriteIds.get(cnt++);
+  // On replica craft a valid snapshot out of the writeId in the partition
+  String validWriteIds = null;
+  if (writeId > 0) {
+ValidWriteIdList vwil =
 
 Review comment:
   Same as above.




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269156935
 
 

 ##
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl,
List checkConstraints)
 throws AlreadyExistsException, MetaException,
 InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+  ColumnStatistics colStats = null;
+  // If the given table has column statistics, save it here. We will update it later.
+  // We don't want it to be part of the Table object being created, lest the create table
 
 Review comment:
   Please simplify the comment, e.g. "Column stats are not expected to be part 
of the Create table event and also shouldn't be persisted. So remove them from 
the Table object."
   




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269169210
 
 

 ##
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl,
 
   // If the table has column statistics, update it into the metastore. This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-// We do not replicate statistics for a transactional table right now and hence we do not
-// expect a transactional table to have column statistics here. So passing null
-// validWriteIds is fine for now.
-updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId());
+  if (colStats != null) {
+// On replica craft a valid snapshot out of the writeId in the table.
+long writeId = tbl.getWriteId();
+String validWriteIds = null;
+if (writeId > 0) {
+  ValidWriteIdList vwil =
+  new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(),
 
 Review comment:
   Please add a comment explaining why a hardcoded ValidWriteIdList is used in 
this flow instead of taking the current state of txns.




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269154738
 
 

 ##
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java
 ##
 @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
 return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
   }
 
+  /**
+   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
+   * read the files, and thus treats both open and aborted transactions as invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened.  If any of the
+   *transactions is greater than 0 it will be removed from the exceptions
+   *list so that the replication sees its own transaction as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,
 
 Review comment:
   The whole logic of treating all txns opened in a batch by the open txn event 
as current txns is incorrect. 
   The repl task opens multiple txns only when replicating the Hive Streaming 
case, where we allocate a batch of txns but use one at a time. Also, we don't 
update stats in that case. Even if we did, only one txn should be treated as the 
current txn, and the rest are left open. 
   Please remove the replTxnIds cache in TxnManager as well. All callers should 
create a hardcoded ValidWriteIdList using the writeId received from the event msg.
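The rule the reviewer describes can be modeled simply: when building the reader's transaction snapshot, every open txn is an exception (invalid) except the single txn the repl task is currently using; the other txns allocated in the same streaming batch stay open and therefore invalid. The helper below is hypothetical and only illustrates that rule; it is not TxnCommonUtils code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Hive's TxnCommonUtils.
final class ReplTxnSnapshot {
    private ReplTxnSnapshot() {}

    // Returns the txn ids to treat as invalid: all open txns except the one
    // txn the replication task is currently using.
    static List<Long> exceptions(List<Long> openTxns, long currentTxn) {
        List<Long> result = new ArrayList<>();
        for (long txn : openTxns) {
            if (txn != currentTxn) {
                result.add(txn);
            }
        }
        return result;
    }
}
```

For a batch of open txns 10, 11 and 12 with 11 in use, only 11 is treated as valid; 10 and 12 remain exceptions.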




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269081532
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 
 Review comment:
   ReplUtils.REPL_CHECKPOINT_KEY is another prop we set in the repl flow that is 
not transactional. This check doesn't seem clean, as in future we might add more 
such alters to the repl flow. Can we check replicationSpec.isInReplicationScope() 
instead, or add another flag in AlterTableDesc to skip this?
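The fragility is easy to see side by side. The sketch below contrasts the property-based check from the patch with a flag-based one, assuming a hypothetical `inReplicationScope` boolean supplied by the caller (for instance derived from `replicationSpec.isInReplicationScope()` or a new AlterTableDesc field; that wiring is not shown). `AlterTxnCheck` is illustrative only, not Hive code.

```java
import java.util.Map;

// Hypothetical comparison of the two decision rules; not Hive's DDLTask.
final class AlterTxnCheck {
    private AlterTxnCheck() {}

    // Property-based rule, mirroring the patch under review: ADDPROPS on a
    // partition is transactional unless the only property is the repl state key.
    static boolean isTxnByProps(boolean hasPartSpec, boolean isAddProps,
                                Map<String, String> props, String replStateKey) {
        if (!hasPartSpec || !isAddProps) {
            return false;
        }
        return !(props.size() <= 1 && props.get(replStateKey) != null);
    }

    // Flag-based rule: any ADDPROPS issued by the repl flow is non-transactional,
    // no matter which (or how many) properties it sets.
    static boolean isTxnByFlag(boolean hasPartSpec, boolean isAddProps,
                               boolean inReplicationScope) {
        return hasPartSpec && isAddProps && !inReplicationScope;
    }
}
```

With two repl-only properties (say `repl.last.id` plus a checkpoint key), `isTxnByProps` returns true and the alter is wrongly treated as transactional, whereas `isTxnByFlag` stays false for any repl-driven alter.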
   




[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269103325
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java
 ##
 @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath,
   // Right now, we do not have a way of associating a writeId with statistics for a table
   // converted to a transactional table if it was non-transactional on the source. So, do not
 
 Review comment:
   Comment needs to be corrected.




[jira] [Created] (HIVE-21512) Upgrade jms-api to 2.0.2

2019-03-26 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-21512:
---

 Summary: Upgrade jms-api to 2.0.2
 Key: HIVE-21512
 URL: https://issues.apache.org/jira/browse/HIVE-21512
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


For some time I've noticed occasional issues with the javax.jms:jms:1.1 
artifact: for some reason it doesn't seem to be available from Maven Central;
https://issues.sonatype.org/browse/MVNCENTRAL-4708

Alternatively, I think we could simply upgrade to version 2.0.2 of the jms-api.





[jira] [Created] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)
Bruno Pusztahazi created HIVE-21511:
---

 Summary: beeline -f report no such file if file is not on local fs
 Key: HIVE-21511
 URL: https://issues.apache.org/jira/browse/HIVE-21511
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 1.3.0, beeline-cli-branch
 Environment: java version: 1.8.0_112-b15
hadoop version: 2.7.2
hive version: 1.3.0
hive JDBC version: 1.3.0
beeline version: 1.3.0
Reporter: Bruno Pusztahazi
Assignee: Bruno Pusztahazi
 Fix For: 1.3.0


I tested it like this:

HQL=hdfs://hacluster/tmp/ff.hql
if hadoop fs -test -f ${HQL}
then
   beeline -f ${HQL}
fi

The hadoop fs -test check on ${HQL} succeeds, but beeline reports "no such file
or directory" for ${HQL}.
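beeline appears to treat the -f argument as a path on the local filesystem, so an hdfs:// URI is never found. As a hedged illustration of that distinction only, here is a minimal Java sketch that classifies the argument by its URI scheme using java.net.URI. The ScriptLocation class and isLocalPath method are hypothetical and are not beeline code; actually reading a remote script would still require Hadoop's org.apache.hadoop.fs.FileSystem.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ScriptLocation {
    // Hypothetical helper: true when the -f argument names a local file,
    // i.e. it has no URI scheme at all or the "file" scheme.
    static boolean isLocalPath(String arg) {
        try {
            String scheme = new URI(arg).getScheme();
            return scheme == null || scheme.equalsIgnoreCase("file");
        } catch (URISyntaxException e) {
            // Not a parseable URI (e.g. contains spaces or backslashes);
            // treat it as a plain local path.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLocalPath("/tmp/ff.hql"));                 // true
        System.out.println(isLocalPath("file:///tmp/ff.hql"));          // true
        System.out.println(isLocalPath("hdfs://hacluster/tmp/ff.hql")); // false
    }
}
```

A client could use such a check to either open remote scripts through the Hadoop FileSystem API or fail with a clearer message than "no such file or directory".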





[jira] [Created] (HIVE-21510) Vectorization: add support for and/or for (constant,column) cases

2019-03-26 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-21510:
---

 Summary: Vectorization: add support for and/or for (constant,column) cases
 Key: HIVE-21510
 URL: https://issues.apache.org/jira/browse/HIVE-21510
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


After HIVE-21001, some selectExpressions will start using VectorUDFAdaptor for
"null and x" expressions. Because there are currently 2-3 places where
expressions get rewritten into the "null and/or x" form, it would be better to
support these cases natively.

{code}
[...]
selectExpressions: VectorUDFAdaptor((null and dt1 is null))
[...]
usesVectorUDFAdaptor: true
[...]
{code}
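Under SQL three-valued logic, "null and x" is FALSE where x is FALSE and NULL otherwise, while "null or x" is TRUE where x is TRUE and NULL otherwise, so a dedicated (constant, column) kernel only has to inspect x. A hedged sketch of such kernels, using plain arrays (0/1 values in a long[] plus a parallel isNull[]) as a simplified stand-in for Hive's vectorized column vectors; the NullAndOr class and its methods are hypothetical and are not Hive's VectorExpression API.

```java
import java.util.Arrays;

class NullAndOr {
    // NULL AND x: FALSE where x is FALSE, otherwise NULL.
    static void nullAnd(long[] x, boolean[] xIsNull, long[] out, boolean[] outIsNull) {
        for (int i = 0; i < x.length; i++) {
            boolean xFalse = !xIsNull[i] && x[i] == 0;
            outIsNull[i] = !xFalse;
            out[i] = 0; // the value is FALSE when not null; ignored when null
        }
    }

    // NULL OR x: TRUE where x is TRUE, otherwise NULL.
    static void nullOr(long[] x, boolean[] xIsNull, long[] out, boolean[] outIsNull) {
        for (int i = 0; i < x.length; i++) {
            boolean xTrue = !xIsNull[i] && x[i] != 0;
            outIsNull[i] = !xTrue;
            out[i] = xTrue ? 1 : 0;
        }
    }

    public static void main(String[] args) {
        long[] x = {1, 0, 0};                       // TRUE, FALSE, (null slot)
        boolean[] xIsNull = {false, false, true};
        long[] out = new long[3];
        boolean[] outIsNull = new boolean[3];

        nullAnd(x, xIsNull, out, outIsNull);
        // prints [0, 0, 0] [true, false, true]
        System.out.println(Arrays.toString(out) + " " + Arrays.toString(outIsNull));

        nullOr(x, xIsNull, out, outIsNull);
        // prints [1, 0, 0] [false, true, true]
        System.out.println(Arrays.toString(out) + " " + Arrays.toString(outIsNull));
    }
}
```

A tight loop like this avoids the per-row object conversion that VectorUDFAdaptor has to perform.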





[jira] [Created] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)
Adam Szita created HIVE-21509:
-

 Summary: LLAP may cache corrupted column vectors and return wrong query result
 Key: HIVE-21509
 URL: https://issues.apache.org/jira/browse/HIVE-21509
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Adam Szita
Assignee: Adam Szita








[jira] [Created] (HIVE-21508) ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer

2019-03-26 Thread Adar Dembo (JIRA)
Adar Dembo created HIVE-21508:
-

 Summary: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
 Key: HIVE-21508
 URL: https://issues.apache.org/jira/browse/HIVE-21508
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 2.3.4, 3.2.0
Reporter: Adar Dembo


There's this block of code in {{HiveMetaStoreClient:resolveUris}} (called from 
the constructor) on master:
{noformat}
  private URI metastoreUris[];
  ...
  if (MetastoreConf.getVar(conf, ConfVars.THRIFT_URI_SELECTION).equalsIgnoreCase("RANDOM")) {
    List uriList = Arrays.asList(metastoreUris);
    Collections.shuffle(uriList);
    metastoreUris = (URI[]) uriList.toArray();
  }
{noformat}

The cast to {{URI[]}} throws a {{ClassCastException}} beginning with JDK 10, 
possibly with JDK 9 as well. Note that {{THRIFT_URI_SELECTION}} defaults to 
{{RANDOM}} so this should affect anyone who creates a {{HiveMetaStoreClient}}. 
On master this can be overridden with {{SEQUENTIAL}} to avoid the broken case; 
I'm working against 2.3.4 where there's no such workaround.

[Here's|https://stackoverflow.com/questions/51372788/array-cast-java-8-vs-java-9] a Stack Overflow post that explains the issue in more detail. Interestingly,
the author described the issue in the context of the HMS; I'm not sure why
there was no follow-up with a Hive bug report.
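On the affected JDKs the no-argument toArray() of the list returned by Arrays.asList yields a plain Object[], which is why the (URI[]) cast fails. The typed toArray(T[]) overload avoids this because it returns an array of the requested runtime type. A minimal sketch of that pattern; the UriShuffle class and the example thrift:// URIs are hypothetical, and this is not the committed Hive patch.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class UriShuffle {
    // Shuffles a copy of the URIs and returns a properly typed URI[].
    static URI[] shuffleUris(URI[] uris) {
        List<URI> uriList = Arrays.asList(uris.clone()); // clone: leave the caller's array untouched
        Collections.shuffle(uriList);
        // toArray(new URI[0]) allocates a URI[], unlike toArray(), whose result
        // cannot safely be cast to URI[].
        return uriList.toArray(new URI[0]);
    }

    public static void main(String[] args) {
        URI[] uris = {
            URI.create("thrift://hms1.example.com:9083"),
            URI.create("thrift://hms2.example.com:9083"),
            URI.create("thrift://hms3.example.com:9083")
        };
        URI[] shuffled = shuffleUris(uris);
        System.out.println(shuffled.getClass().getSimpleName() + " " + shuffled.length); // URI[] 3
    }
}
```

Because Arrays.asList returns a fixed-size view backed by the array, Collections.shuffle(Arrays.asList(metastoreUris)) would also shuffle metastoreUris in place with no toArray call at all.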





[jira] [Created] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)
Denes Bodo created HIVE-21507:
-

 Summary: Hive swallows NPE if no delegation token found
 Key: HIVE-21507
 URL: https://issues.apache.org/jira/browse/HIVE-21507
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 3.1.1
Reporter: Denes Bodo
Assignee: Denes Bodo


If no delegation token is put into the token file, this
[line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
causes a NullPointerException that is not handled, and the user is not
notified in any way.

The NPE can be reproduced with an Oozie Sqoop import to Hive on a Kerberized
cluster. Oozie puts the delegation token into the token file with the id
*HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so the lookup with the id
*hive* does not work. However, the fallback code does use the key that Oozie
provides, as
[this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
line shows.

I suggest showing the user a warning that the key with id *hive* cannot be
used and that the delegation token is instead being taken from the session.
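The suggested behaviour amounts to a null check with an explicit warning before falling back, instead of letting a null token reach code that dereferences it. A hedged sketch, assuming the token file can be modelled as a simple map from token id to encoded token; the TokenLookup class and findToken method are hypothetical and do not reflect HiveConnection's actual code or Hadoop's Credentials API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

class TokenLookup {
    private static final Logger LOG = Logger.getLogger(TokenLookup.class.getName());

    // Hypothetical lookup: returns the token stored under primaryId or, if absent,
    // the one under fallbackId, warning the user instead of failing with an NPE.
    static String findToken(Map<String, String> tokens, String primaryId, String fallbackId) {
        String token = tokens.get(primaryId);
        if (token != null) {
            return token;
        }
        LOG.warning("No delegation token found with id '" + primaryId
                + "'; falling back to id '" + fallbackId + "'");
        token = tokens.get(fallbackId);
        if (token == null) {
            // Fail with a descriptive message rather than a bare NullPointerException.
            throw new IllegalStateException("No delegation token found with id '"
                    + primaryId + "' or '" + fallbackId + "'");
        }
        return token;
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new HashMap<>();
        tokens.put("HIVE_DELEGATION_TOKEN_hiveserver2ClientToken", "token-from-oozie");
        // Logs a warning, then prints token-from-oozie
        System.out.println(findToken(tokens, "hive", "HIVE_DELEGATION_TOKEN_hiveserver2ClientToken"));
    }
}
```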

I am creating the patch.





[jira] [Created] (HIVE-21506) Memory based TxnHandler implementation

2019-03-26 Thread Peter Vary (JIRA)
Peter Vary created HIVE-21506:
-

 Summary: Memory based TxnHandler implementation
 Key: HIVE-21506
 URL: https://issues.apache.org/jira/browse/HIVE-21506
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Reporter: Peter Vary


The current TxnHandler implementations use the backend RDBMS to store all Hive
lock and transaction data, so multiple TxnHandler instances can run
simultaneously and serve requests. The continuous communication and locking on
the RDBMS side puts serious load on the backend database and also restricts
the achievable throughput.

If it is possible to have only a single active TxnHandler instance (with the
current design, a single HMS), then we can provide much better performance
using only Java-based locking. We would still have to store the committed
write transactions in the RDBMS (or later in some other persistent storage),
but other lock and transaction operations could remain in memory only.

The most important drawbacks of this solution are that we definitely lose
scalability once a single TxnHandler instance can no longer serve all requests
(see NameNode), and that we lose fault tolerance, in the sense that ongoing
transactions must be terminated when the TxnHandler fails. If these drawbacks
are acceptable in certain situations, then we can provide better throughput
for the users.
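To make "Java-based locking" concrete, here is a minimal sketch, assuming shared/exclusive locks per resource name and a single JVM holding all lock state. The InMemoryLockTable class and its methods are hypothetical, not a proposed Hive API; persisting committed write transactions to the RDBMS is deliberately left out, and since nothing here is persisted, a restart loses every lock.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory lock table for a single active TxnHandler: lock state
// lives only in this JVM and is guarded by Java monitors instead of RDBMS row locks.
class InMemoryLockTable {
    private static final class LockState {
        final Set<Long> sharedHolders = new HashSet<>();
        Long exclusiveHolder; // null while no exclusive lock is held
    }

    private final Map<String, LockState> locks = new HashMap<>();

    // Tries to lock a resource (e.g. "db.table") for txnId; returns false on conflict
    // instead of blocking, so the caller decides whether to wait or abort.
    synchronized boolean tryLock(long txnId, String resource, boolean exclusive) {
        LockState s = locks.computeIfAbsent(resource, r -> new LockState());
        if (s.exclusiveHolder != null && s.exclusiveHolder.longValue() != txnId) {
            return false; // another transaction holds an exclusive lock
        }
        if (exclusive) {
            int ownShared = s.sharedHolders.contains(txnId) ? 1 : 0;
            if (s.sharedHolders.size() > ownShared) {
                return false; // other transactions hold shared locks
            }
            s.exclusiveHolder = txnId;
        } else {
            s.sharedHolders.add(txnId);
        }
        return true;
    }

    // Releases every lock held by txnId (e.g. on commit or abort).
    synchronized void releaseAll(long txnId) {
        Iterator<LockState> it = locks.values().iterator();
        while (it.hasNext()) {
            LockState s = it.next();
            s.sharedHolders.remove(txnId);
            if (s.exclusiveHolder != null && s.exclusiveHolder.longValue() == txnId) {
                s.exclusiveHolder = null;
            }
            if (s.exclusiveHolder == null && s.sharedHolders.isEmpty()) {
                it.remove(); // drop empty entries so the map does not grow unbounded
            }
        }
    }

    public static void main(String[] args) {
        InMemoryLockTable table = new InMemoryLockTable();
        System.out.println(table.tryLock(1, "db.t", false)); // true
        System.out.println(table.tryLock(2, "db.t", false)); // true
        System.out.println(table.tryLock(3, "db.t", true));  // false: shared locks held
        table.releaseAll(1);
        table.releaseAll(2);
        System.out.println(table.tryLock(3, "db.t", true));  // true
    }
}
```

A single coarse monitor keeps the sketch obviously correct; a real implementation would likely need finer-grained locking and waiting queues to reach the throughput the proposal is after.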


