[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=522102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522102
 ]

ASF GitHub Bot logged work on HIVE-24497:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 07:53
Start Date: 09/Dec/20 07:53
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on a change in pull request #1755:
URL: https://github.com/apache/hive/pull/1755#discussion_r539081553



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java
##
@@ -541,6 +551,14 @@ boolean isDone() {
   return isDone.get();
 }
 
+void setIsExtCliRequest(boolean val) {
+  isExtCliRequest.set(val);

Review comment:
   Sure, I have changed the names to isExternalClientRequest in the recent 
commit.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522102)
Time Spent: 0.5h  (was: 20m)

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout.
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Node heartbeat contains info about all the tasks that were submitted to that 
> LLAP Daemon. In cloud deployments, the client is not able to match these 
> heartbeats due to differences in hostname and port.
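The mismatch described above can be illustrated with a small sketch. This is a hypothetical stand-in (the `NodeId` class and its fields are illustrative, not Hive's actual API): matching a daemon's heartbeat to a registered node by a stable unique identifier instead of by hostname:port, which can differ between the daemon's view and the client's view in cloud deployments.

```java
import java.util.Objects;

// Hypothetical sketch: node identity for heartbeat matching.
final class NodeId {
    final String uniqueId; // e.g. a daemon-generated id, stable across NAT/DNS
    final String host;
    final int port;

    NodeId(String uniqueId, String host, int port) {
        this.uniqueId = uniqueId;
        this.host = host;
        this.port = port;
    }

    // Fragile: breaks when the daemon reports an internal hostname/port
    // while the client registered the external one.
    boolean matchesByAddress(NodeId other) {
        return host.equals(other.host) && port == other.port;
    }

    // Robust: both sides agree on the daemon's unique identifier.
    boolean matchesById(NodeId other) {
        return Objects.equals(uniqueId, other.uniqueId);
    }
}
```

In this sketch the client-side registration (external DNS name) and the daemon's heartbeat (internal address) disagree on host:port but share the unique id, so id-based matching succeeds where address-based matching would time out.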



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=522089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522089
 ]

ASF GitHub Bot logged work on HIVE-24497:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 07:19
Start Date: 09/Dec/20 07:19
Worklog Time Spent: 10m 
  Work Description: prasanthj commented on a change in pull request #1755:
URL: https://github.com/apache/hive/pull/1755#discussion_r539064255



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java
##
@@ -541,6 +551,14 @@ boolean isDone() {
   return isDone.get();
 }
 
+void setIsExtCliRequest(boolean val) {
+  isExtCliRequest.set(val);

Review comment:
   nit: for better readability, rename the variable and method to 
isExternalClientRequest. 







Issue Time Tracking
---

Worklog Id: (was: 522089)
Time Spent: 20m  (was: 10m)

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout.
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Node heartbeat contains info about all the tasks that were submitted to that 
> LLAP Daemon. In cloud deployments, the client is not able to match these 
> heartbeats due to differences in hostname and port.





[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522074
 ]

ASF GitHub Bot logged work on HIVE-24432:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 06:24
Start Date: 09/Dec/20 06:24
Worklog Time Spent: 10m 
  Work Description: aasha edited a comment on pull request #1710:
URL: https://github.com/apache/hive/pull/1710#issuecomment-741558077


   Please add a test.





Issue Time Tracking
---

Worklog Id: (was: 522074)
Time Spent: 2h  (was: 1h 50m)

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Notification events are loaded in batches (which reduces memory pressure on 
> the HMS), but all of the deletes happen under a single transaction and, when 
> deleting many records, can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.
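The batching pattern described above can be sketched as follows. This is a minimal stand-in abstracted away from Hive's DataNucleus code (`cleanInBatches` and `deleteBatch` are illustrative names): delete up to a batch's worth of rows per round, each round in its own transaction, and stop once a round comes back short.

```java
import java.util.function.IntUnaryOperator;

// Minimal sketch of batched cleanup: one transaction per batch.
final class BatchedCleaner {
    // deleteBatch takes a batch size and returns how many rows it removed;
    // each call is assumed to commit its own transaction.
    static int cleanInBatches(IntUnaryOperator deleteBatch, int batchSize) {
        int total = 0;
        int deleted;
        do {
            deleted = deleteBatch.applyAsInt(batchSize); // one txn per call
            total += deleted;
        } while (deleted == batchSize); // a short batch means nothing is left
        return total;
    }
}
```

Committing per batch keeps each transaction small, so a backlog of millions of old events no longer holds one giant delete transaction open against the backend database.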





[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522073
 ]

ASF GitHub Bot logged work on HIVE-24432:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 06:23
Start Date: 09/Dec/20 06:23
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1710:
URL: https://github.com/apache/hive/pull/1710#discussion_r539040762



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -10800,53 +10801,89 @@ public void addNotificationEvent(NotificationEvent 
entry) throws MetaException {
 
   @Override
   public void cleanNotificationEvents(int olderThan) {
-boolean commited = false;
-Query query = null;
+final int eventBatchSize = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.EVENT_CLEAN_MAX_EVENTS);
+
+final long ageSec = olderThan;
+final Instant now = Instant.now();
+
+final int tooOld = Math.toIntExact(now.getEpochSecond() - ageSec);
+
+final Optional<Integer> batchSize = (eventBatchSize > 0) ? 
Optional.of(eventBatchSize) : Optional.empty();
+
+final long start = System.nanoTime();
+int deleteCount = doCleanNotificationEvents(tooOld, batchSize);
+
+if (deleteCount == 0) {
+  LOG.info("No Notification events found to be cleaned with eventTime < 
{}", tooOld);
+} else {
+  int batchCount = 0;
+  do {
+batchCount = doCleanNotificationEvents(tooOld, batchSize);
+deleteCount += batchCount;
+  } while (batchCount > 0);
+}
+
+final long finish = System.nanoTime();
+
+LOG.info("Deleted {} notification events older than epoch:{} in {}ms", 
deleteCount, tooOld,
+TimeUnit.NANOSECONDS.toMillis(finish - start));
+  }
+
+  private int doCleanNotificationEvents(final int ageSec, final 
Optional<Integer> batchSize) {
+final Transaction tx = pm.currentTransaction();
+int eventsCount = 0;
+
 try {
-  openTransaction();
-  long tmp = System.currentTimeMillis() / 1000 - olderThan;
-  int tooOld = (tmp > Integer.MAX_VALUE) ? 0 : (int) tmp;
-  query = pm.newQuery(MNotificationLog.class, "eventTime < tooOld");
-  query.declareParameters("java.lang.Integer tooOld");
+  tx.begin();
 
-  int max_events = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.EVENT_CLEAN_MAX_EVENTS);
-  max_events = max_events > 0 ? max_events : Integer.MAX_VALUE;
-  query.setRange(0, max_events);
-  query.setOrdering("eventId ascending");
+  try (Query query = pm.newQuery(MNotificationLog.class, "eventTime < 
tooOld")) {
+query.declareParameters("java.lang.Integer tooOld");
+query.setOrdering("eventId ascending");
+if (batchSize.isPresent()) {
+  query.setRange(0, batchSize.get());
+}
 
-  List<MNotificationLog> toBeRemoved = (List<MNotificationLog>) query.execute(tooOld);
-  int iteration = 0;
-  int eventCount = 0;
-  long minEventId = 0;
-  long minEventTime = 0;
-  long maxEventId = 0;
-  long maxEventTime = 0;
-  while (CollectionUtils.isNotEmpty(toBeRemoved)) {
-int listSize = toBeRemoved.size();
-if (iteration == 0) {
-  MNotificationLog firstNotification = toBeRemoved.get(0);
-  minEventId = firstNotification.getEventId();
-  minEventTime = firstNotification.getEventTime();
+List<MNotificationLog> events = (List<MNotificationLog>) query.execute(ageSec);
+if (CollectionUtils.isNotEmpty(events)) {
+  eventsCount = events.size();
+
+  if (LOG.isDebugEnabled()) {
+int minEventTime, maxEventTime;
+long minEventId, maxEventId;
+Iterator<MNotificationLog> iter = events.iterator();
+MNotificationLog firstNotification = iter.next();
+
+minEventTime = maxEventTime = firstNotification.getEventTime();
+minEventId = maxEventId = firstNotification.getEventId();
+
+while (iter.hasNext()) {
+  MNotificationLog notification = iter.next();

Review comment:
   Comparison is not required; events will always be in ascending order of 
event id.
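The reviewer's point can be shown with a tiny sketch (using a `List<Long>` of ids as a stand-in for the `MNotificationLog` list): when the query already orders by event id ascending, the min and max are simply the first and last elements, so no per-element comparison loop is needed.

```java
import java.util.List;

// Sketch: min/max of an already-sorted id list without comparisons.
final class EventIdRange {
    // Assumes a non-empty list sorted ascending by event id.
    static long[] minMax(List<Long> ascendingIds) {
        return new long[] { ascendingIds.get(0),
                            ascendingIds.get(ascendingIds.size() - 1) };
    }
}
```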







Issue Time Tracking
---

Worklog Id: (was: 522073)
Time Spent: 1h 50m  (was: 1h 40m)

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David 

[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522071
 ]

ASF GitHub Bot logged work on HIVE-24432:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 06:18
Start Date: 09/Dec/20 06:18
Worklog Time Spent: 10m 
  Work Description: aasha commented on pull request #1710:
URL: https://github.com/apache/hive/pull/1710#issuecomment-741558077


   Please add a test. The code looks good to me.





Issue Time Tracking
---

Worklog Id: (was: 522071)
Time Spent: 1h 40m  (was: 1.5h)

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Notification events are loaded in batches (which reduces memory pressure on 
> the HMS), but all of the deletes happen under a single transaction and, when 
> deleting many records, can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.





[jira] [Work started] (HIVE-24510) Vectorize compute_bit_vector

2020-12-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24510 started by Mustafa İman.
---
> Vectorize compute_bit_vector
> 
>
> Key: HIVE-24510
> URL: https://issues.apache.org/jira/browse/HIVE-24510
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>
> After https://issues.apache.org/jira/browse/HIVE-23530 , almost all compute 
> stats functions are vectorizable. The only function that is not vectorizable 
> is "compute_bit_vector" for NDV statistics computation. This causes "create 
> table as select" and "insert overwrite select" queries to run in 
> non-vectorized mode. 
> Even a very naive implementation of vectorized compute_bit_vector gives about 
> a 50% performance improvement on simple "insert overwrite select" queries. 
> That is because the entire mapper or reducer can run in vectorized mode.
>  
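What "vectorizing" an NDV bit-vector computation means can be sketched naively: instead of pushing one row at a time through the generic object-based path, the bit vector is updated in a tight loop over a whole column batch of primitive hashes. The FM-style update rule here (set the bit at the number of trailing zeros of the hash) is illustrative only, not Hive's exact sketch implementation.

```java
// Naive sketch: batch-at-a-time NDV bit-vector update over primitive longs.
final class NdvBitVector {
    static long updateBatch(long bits, long[] hashes, int batchSize) {
        for (int i = 0; i < batchSize; i++) {
            // FM-style rule: mark the position of the lowest set bit.
            bits |= 1L << Long.numberOfTrailingZeros(hashes[i]);
        }
        return bits;
    }
}
```

Because the loop touches only primitive arrays, the JIT can keep it branch-light, which is the kind of win the description attributes to letting the whole mapper or reducer stay vectorized.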





[jira] [Assigned] (HIVE-24510) Vectorize compute_bit_vector

2020-12-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman reassigned HIVE-24510:
---


> Vectorize compute_bit_vector
> 
>
> Key: HIVE-24510
> URL: https://issues.apache.org/jira/browse/HIVE-24510
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>
> After https://issues.apache.org/jira/browse/HIVE-23530 , almost all compute 
> stats functions are vectorizable. The only function that is not vectorizable 
> is "compute_bit_vector" for NDV statistics computation. This causes "create 
> table as select" and "insert overwrite select" queries to run in 
> non-vectorized mode. 
> Even a very naive implementation of vectorized compute_bit_vector gives about 
> a 50% performance improvement on simple "insert overwrite select" queries. 
> That is because the entire mapper or reducer can run in vectorized mode.
>  





[jira] [Work logged] (HIVE-24197) Check for write transactions for the db under replication at a frequent interval

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24197?focusedWorklogId=521968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521968
 ]

ASF GitHub Bot logged work on HIVE-24197:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 00:49
Start Date: 09/Dec/20 00:49
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1523:
URL: https://github.com/apache/hive/pull/1523#issuecomment-741342755


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 521968)
Time Spent: 20m  (was: 10m)

> Check for write transactions for the db under replication at a frequent 
> interval
> 
>
> Key: HIVE-24197
> URL: https://issues.apache.org/jira/browse/HIVE-24197
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24197.01.patch, HIVE-24197.02.patch, 
> HIVE-24197.03.patch, HIVE-24197.04.patch, HIVE-24197.05.patch, 
> HIVE-24197.06.patch, HIVE-24197.07.patch, HIVE-24197.08.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24244) NPE during Atlas metadata replication

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?focusedWorklogId=521967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521967
 ]

ASF GitHub Bot logged work on HIVE-24244:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 00:49
Start Date: 09/Dec/20 00:49
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1563:
URL: https://github.com/apache/hive/pull/1563#issuecomment-741342691


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 521967)
Time Spent: 1h 10m  (was: 1h)

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24244.01.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24208) LLAP: query job stuck due to race conditions

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24208?focusedWorklogId=521966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521966
 ]

ASF GitHub Bot logged work on HIVE-24208:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 00:49
Start Date: 09/Dec/20 00:49
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1534:
URL: https://github.com/apache/hive/pull/1534#issuecomment-741342741


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 521966)
Time Spent: 40m  (was: 0.5h)

> LLAP: query job stuck due to race conditions
> 
>
> Key: HIVE-24208
> URL: https://issues.apache.org/jira/browse/HIVE-24208
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: Yuriy Baltovskyy
>Assignee: Yuriy Baltovskyy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When issuing an LLAP query, sometimes the Tez job on the LLAP server never 
> ends and never returns the data reader.





[jira] [Work logged] (HIVE-24503) Optimize vector row serde by avoiding type check at run time

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24503?focusedWorklogId=521963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521963
 ]

ASF GitHub Bot logged work on HIVE-24503:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 00:45
Start Date: 09/Dec/20 00:45
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1753:
URL: https://github.com/apache/hive/pull/1753#discussion_r538916121



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSerializeRow.java
##
@@ -61,27 +61,16 @@
   private Field root;
 
   private static class Field {
-Field[] children;
-
-boolean isPrimitive;
-Category category;
-PrimitiveCategory primitiveCategory;
-TypeInfo typeInfo;
-
-int count;
-
-ObjectInspector objectInspector;
-int outputColumnNum;
-
+Field[] children = null;
+boolean isPrimitive = false;
+Category category = null;
+PrimitiveCategory primitiveCategory = null;
+TypeInfo typeInfo = null;
+int count = 0;
+ObjectInspector objectInspector = null;
+int outputColumnNum = -1;
+VectorSerializeWriter writer = null;
 Field() {

Review comment:
   Can be removed.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorDeserializeRow.java
##
@@ -933,12 +1207,20 @@ private void storeUnionRowColumn(ColumnVector colVector,
 unionColVector.isNull[batchIndex] = false;
 unionColVector.tags[batchIndex] = tag;
 
-storeComplexFieldRowColumn(
+deserializer.storeComplexFieldRowColumn(
 colVectorFields[tag],
 unionHelper.getFields()[tag],
 batchIndex,
 canRetainByteRef);
-deserializeRead.finishComplexVariableFieldsType();
+deserializer.deserializeRead.finishComplexVariableFieldsType();
+  }
+
+  abstract static class VectorBatchDeserializer {
+abstract void store(ColumnVector colVector, Field field, int batchIndex, 
boolean canRetainByteRef,
+VectorDeserializeRow deserializer) throws 
IOException;

Review comment:
   Why VectorDeserializerRow needs be passed here again? ("this" references 
in other places as well). If you remove "static" class declaration in 
VectorBatchDeserializer children, you may not need to pass this.  And the patch 
would become lot lesser changes?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSerializeRow.java
##
@@ -274,44 +315,25 @@ private void serializeWrite(
   return;
 }
 isAllNulls = false;
+field.writer.serialize(colVector, field, adjustedBatchIndex, this);
+  }
 
-if (field.isPrimitive) {
-  serializePrimitiveWrite(colVector, field, adjustedBatchIndex);
-  return;
-}
-final Category category = field.category;
-switch (category) {
-case LIST:
-  serializeListWrite(
-  (ListColumnVector) colVector,
-  field,
-  adjustedBatchIndex);
-  break;
-case MAP:
-  serializeMapWrite(
-  (MapColumnVector) colVector,
-  field,
-  adjustedBatchIndex);
-  break;
-case STRUCT:
-  serializeStructWrite(
-  (StructColumnVector) colVector,
-  field,
-  adjustedBatchIndex);
-  break;
-case UNION:
-  serializeUnionWrite(
-  (UnionColumnVector) colVector,
-  field,
-  adjustedBatchIndex);
-  break;
-default:
-  throw new RuntimeException("Unexpected category " + category);
+  abstract static class VectorSerializeWriter {
+abstract void serialize(Object colVector, Field field, int 
adjustedBatchIndex,
+VectorSerializeRow serializeRow) throws 
IOException;

Review comment:
   Same as earlier. VectorSerializeRow need not be passed here. Patch may 
need lesser changes if you remove static declaration on children.
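The dispatch change under review can be sketched with a simplified stand-in (the `CategoryWriter` and `SerField` names are hypothetical, not Hive's real classes): the per-row switch on the field's category is replaced by a writer object resolved once when the schema is known, so each row costs only a virtual call instead of a runtime type check.

```java
// Hypothetical sketch: resolve the writer once per field, not once per row.
abstract class CategoryWriter {
    abstract String serialize(Object value);
}

final class SerField {
    final CategoryWriter writer; // resolved once, at field setup time
    SerField(String category) { this.writer = resolve(category); }

    static CategoryWriter resolve(String category) {
        switch (category) { // this switch runs once per field only
            case "LIST":
                return new CategoryWriter() {
                    String serialize(Object v) { return "list:" + v; }
                };
            default:
                return new CategoryWriter() {
                    String serialize(Object v) { return "prim:" + v; }
                };
        }
    }
}
```

Per row, the hot path is just `field.writer.serialize(value)`, mirroring the `field.writer.serialize(...)` call introduced by the diff above.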







Issue Time Tracking
---

Worklog Id: (was: 521963)
Time Spent: 20m  (was: 10m)

> Optimize vector row serde by avoiding type check at run time 
> -
>
> Key: HIVE-24503
> URL: https://issues.apache.org/jira/browse/HIVE-24503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 

[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-08 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246213#comment-17246213
 ] 

Kishen Das commented on HIVE-24482:
---

https://github.com/apache/hive/pull/1737

> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint related DDL tasks, although we might be advancing 
> the write ID, it looks like it's not updated correctly during the Analyzer 
> phase. 





[jira] [Work logged] (HIVE-24509) Move show specific codes under DDL and cut MetaDataFormatter classes to pieces

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24509?focusedWorklogId=521932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521932
 ]

ASF GitHub Bot logged work on HIVE-24509:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 23:23
Start Date: 08/Dec/20 23:23
Worklog Time Spent: 10m 
  Work Description: miklosgergely opened a new pull request #1756:
URL: https://github.com/apache/hive/pull/1756


   ### What changes were proposed in this pull request?
   Move the code used only by show commands next to the classes processing 
those commands.
   
   ### Why are the changes needed?
   Move the code from org.apache.hadoop.hive.ql.metadata.formatting to the 
show command related directories, cutting it to pieces for the specific 
commands, or into utility classes if used by multiple commands.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   All the unit tests and q tests still pass.





Issue Time Tracking
---

Worklog Id: (was: 521932)
Remaining Estimate: 0h
Time Spent: 10m

> Move show specific codes under DDL and cut MetaDataFormatter classes to pieces
> --
>
> Key: HIVE-24509
> URL: https://issues.apache.org/jira/browse/HIVE-24509
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A lot of show ... specific code is under the 
> org.apache.hadoop.hive.ql.metadata.formatting package and is used only by 
> these commands. Also, the two MetaDataFormatters (JsonMetaDataFormatter, 
> TextMetaDataFormatter) try to do everything, while containing a lot of code 
> duplication. Their functionality should be put under the directories of the 
> appropriate show commands.





[jira] [Updated] (HIVE-24509) Move show specific codes under DDL and cut MetaDataFormatter classes to pieces

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24509:
--
Labels: pull-request-available  (was: )

> Move show specific codes under DDL and cut MetaDataFormatter classes to pieces
> --
>
> Key: HIVE-24509
> URL: https://issues.apache.org/jira/browse/HIVE-24509
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A lot of show ... specific code is under the 
> org.apache.hadoop.hive.ql.metadata.formatting package and is used only by 
> these commands. Also, the two MetaDataFormatters (JsonMetaDataFormatter, 
> TextMetaDataFormatter) try to do everything, while containing a lot of code 
> duplication. Their functionality should be put under the directories of the 
> appropriate show commands.





[jira] [Assigned] (HIVE-24509) Move show specific codes under DDL and cut MetaDataFormatter classes to pieces

2020-12-08 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely reassigned HIVE-24509:
-


> Move show specific codes under DDL and cut MetaDataFormatter classes to pieces
> --
>
> Key: HIVE-24509
> URL: https://issues.apache.org/jira/browse/HIVE-24509
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>
> A lot of show ... specific code is under the 
> org.apache.hadoop.hive.ql.metadata.formatting package and is used only by 
> these commands. Also, the two MetaDataFormatters (JsonMetaDataFormatter, 
> TextMetaDataFormatter) try to do everything, while containing a lot of code 
> duplication. Their functionality should be put under the directories of the 
> appropriate show commands.





[jira] [Comment Edited] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread Marta Kuczora (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246181#comment-17246181
 ] 

Marta Kuczora edited comment on HIVE-23410 at 12/8/20, 11:09 PM:
-

Pushed to master.
Thanks a lot [~pvary] for the review!!


was (Author: kuczoram):
Pushed to master.
Thanks a lot @pvary for the review!!

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.





[jira] [Updated] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-23410:
-
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.





[jira] [Commented] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread Marta Kuczora (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246181#comment-17246181
 ] 

Marta Kuczora commented on HIVE-23410:
--

Pushed to master.
Thanks a lot @pvary for the review!!

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.





[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521927
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 23:08
Start Date: 08/Dec/20 23:08
Worklog Time Spent: 10m 
  Work Description: kuczoram merged pull request #1660:
URL: https://github.com/apache/hive/pull/1660


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521927)
Time Spent: 4h  (was: 3h 50m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.





[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=521865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521865
 ]

ASF GitHub Bot logged work on HIVE-24497:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 20:40
Start Date: 08/Dec/20 20:40
Worklog Time Spent: 10m 
  Work Description: simhadri-g opened a new pull request #1755:
URL: https://github.com/apache/hive/pull/1755


   …tching leading to timeout in cloud deployment
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   Node heartbeat contains info about all the tasks that were submitted to that 
LLAP Daemon. In a cloud deployment, the client is not able to match these 
heartbeats due to differences in hostname and port, resulting in a timeout, 
as seen in the log below.
   
   20/07/24 03:27:03 INFO ext.LlapTaskUmbilicalExternalClient: No tasks found for heartbeat from hostname executor-host-0.executor-host-0.cluster.local, port 25000
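The mismatch can be pictured with a small, self-contained sketch. All names here are illustrative, not the actual Hive/LLAP API: the client registers pending tasks under the host:port it used at submission time, while the daemon heartbeats with its own (fully qualified) hostname, so a plain string lookup misses. Normalizing both sides to one canonical key is one way to make them match:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only -- class and method names are hypothetical.
public class HeartbeatKey {

  // Canonical key: lower-case the host and keep only its first DNS label, so
  // "executor-host-0" and "executor-host-0.cluster.local" collapse together.
  static String canonicalKey(String host, int port) {
    String h = host.toLowerCase();
    int dot = h.indexOf('.');
    if (dot > 0) {
      h = h.substring(0, dot);
    }
    return h + ":" + port;
  }

  public static void main(String[] args) {
    Map<String, String> pendingTasks = new HashMap<>();
    // Registered by the external client at submit time.
    pendingTasks.put(canonicalKey("executor-host-0", 25000), "attempt_0");

    // The heartbeat arrives with the daemon's fully qualified pod name;
    // without normalization this lookup would miss and the task would
    // eventually time out.
    String hb = canonicalKey("executor-host-0.executor-host-0.cluster.local", 25000);
    System.out.println(pendingTasks.containsKey(hb)); // true
  }
}
```

Whether the real fix normalizes the key or carries the submission-time identity in the heartbeat is a design choice of the patch; the sketch only shows why the raw hostnames cannot be compared directly.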
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 521865)
Remaining Estimate: 0h
Time Spent: 10m

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout.
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Node heartbeat contains info about all the tasks that were submitted to that 
> LLAP Daemon. In a cloud deployment, the client is not able to match these 
> heartbeats due to differences in hostname and port.





[jira] [Updated] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24497:
--
Labels: pull-request-available  (was: )

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout.
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Node heartbeat contains info about all the tasks that were submitted to that 
> LLAP Daemon. In a cloud deployment, the client is not able to match these 
> heartbeats due to differences in hostname and port.





[jira] [Assigned] (HIVE-24508) hive.parquet.timestamp.skip.conversion doesn't work

2020-12-08 Thread wenjun ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wenjun ma reassigned HIVE-24508:



> hive.parquet.timestamp.skip.conversion doesn't work
> ---
>
> Key: HIVE-24508
> URL: https://issues.apache.org/jira/browse/HIVE-24508
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet
>Reporter: wenjun ma
>Assignee: wenjun ma
>Priority: Major
> Fix For: All Versions
>
>
> Even if we set it to true or false, when we insert the current timestamp it 
> always uses the local time zone.





[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521823
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 19:21
Start Date: 08/Dec/20 19:21
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on pull request #1660:
URL: https://github.com/apache/hive/pull/1660#issuecomment-740890764


   Thanks a lot for the review @pvary and @pvargacl ! :)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521823)
Time Spent: 3h 50m  (was: 3h 40m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.





[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2020-12-08 Thread Tristan Stevens (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246083#comment-17246083
 ] 

Tristan Stevens commented on HIVE-11266:


[~findepi] this is true; however, with managed (i.e. non-external) tables, 
modifying the underlying data without performing a REFRESH is not supported, 
whereas with external tables it is expected behaviour. This is essentially the 
definition of MANAGED vs. EXTERNAL.

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-11266.01.patch, HIVE-11266.patch
>
>
> Hive returns a wrong count result on an external table with table statistics 
> if I change the table's data files.
> This is the scenario in detail:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because it is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on the data files.
> Obviously, setting "hive.compute.query.using.stats" to FALSE avoids this 
> problem, but the default value of this property is TRUE.
> I think that this post on stackoverflow, which shows another type of bug in 
> the case of multiple inserts, is also related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table
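As a minimal illustration of the workaround mentioned in the description (standard Hive settings, shown against the hypothetical `my_table` from the scenario above):

```sql
-- Force the count to run a real scan instead of answering from
-- (possibly stale) metastore statistics:
SET hive.compute.query.using.stats=false;
SELECT count(*) FROM my_table;

-- Alternatively, refresh the statistics after changing the files:
ANALYZE TABLE my_table COMPUTE STATISTICS;
```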





[jira] [Updated] (HIVE-24500) Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24500:
--
Labels: pull-request-available  (was: )

> Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488
> ---
>
> Key: HIVE-24500
> URL: https://issues.apache.org/jira/browse/HIVE-24500
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive is pulling in log4j 2.12.1 specifically to:
>  * ./usr/lib/hive/lib/log4j-core-2.12.1.jar
> CVE-2020-9488 affects this version and the fix is to upgrade to 2.13.2+. So, 
> upgrade this dependency.





[jira] [Work logged] (HIVE-24500) Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24500?focusedWorklogId=521784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521784
 ]

ASF GitHub Bot logged work on HIVE-24500:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 17:48
Start Date: 08/Dec/20 17:48
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #1754:
URL: https://github.com/apache/hive/pull/1754


   …CVE-2020-9488
   
   
   
   ### What changes were proposed in this pull request?
   Changing the log4j version in the pom to 2.13.2.
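A minimal sketch of the kind of pom change involved, assuming the log4j version is managed through a Maven property in the root pom (the property name shown here is an assumption for illustration, not verified against Hive's actual pom):

```xml
<!-- Hypothetical property name; Hive's root pom may differ. -->
<properties>
  <log4j2.version>2.13.2</log4j2.version>
</properties>
```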
   
   
   
   ### Why are the changes needed?
   To avoid CVE-2020-9488
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   
   ### How was this patch tested?
   Locally.
   
   





Issue Time Tracking
---

Worklog Id: (was: 521784)
Remaining Estimate: 0h
Time Spent: 10m

> Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488
> ---
>
> Key: HIVE-24500
> URL: https://issues.apache.org/jira/browse/HIVE-24500
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive is pulling in log4j 2.12.1 specifically to:
>  * ./usr/lib/hive/lib/log4j-core-2.12.1.jar
> CVE-2020-9488 affects this version and the fix is to upgrade to 2.13.2+. So, 
> upgrade this dependency.





[jira] [Updated] (HIVE-24507) "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directories

2020-12-08 Thread Arnaud Linz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Linz updated HIVE-24507:
---
Description: 
The purpose of hive.reloadable.aux.jars.path, introduced by 
https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a 
maintenance window for every jar change, but it is not enough.

On a large system, the lack of atomicity between the directory listing of jars 
contained in hive.reloadable.aux.jars.path and the actual use of the file when 
uploaded to the job's yarn resources may lead to query failures, even if no 
jar/UDF is used in the failing query (because it is a global parameter).

Stack trace sample:
{code:java}
 File file:/XXX.jar does not exist
   at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
   at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
   at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
   at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:378)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:703)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:315)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:207)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135)
   at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
   at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444)
   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
   at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
   at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
   at 
org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
   at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
{code}
 

It's probably not possible to achieve atomicity, but this lack of atomicity 
should be taken into account and this error should be a warning. Actually, if a 
jar is removed, it's probably because no queries are using it any longer. And if 
it was really used, it will trigger a ClassNotFound error later which, together 
with the warning log, should suffice.
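The list-then-copy race described above, and the warn-and-skip behaviour the report asks for, can be sketched in isolation (this is not the actual Hive code path; the class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch: between listing the aux-jar directory and copying a
// jar into the job's resources, the jar may be removed by a concurrent
// deployment. Treating that as a warning (skip and continue) instead of a
// hard failure keeps unrelated queries from failing.
public class ReloadableJarCopy {

  static int copyJars(Path auxDir, Path stagingDir) throws IOException {
    int copied = 0;
    try (DirectoryStream<Path> jars = Files.newDirectoryStream(auxDir, "*.jar")) {
      for (Path jar : jars) {
        try {
          Files.copy(jar, stagingDir.resolve(jar.getFileName()),
              StandardCopyOption.REPLACE_EXISTING);
          copied++;
        } catch (NoSuchFileException vanished) {
          // The jar disappeared after the listing; log and move on. If a
          // query really needed it, a later ClassNotFoundException will say so.
          System.err.println("WARN: " + jar + " disappeared before upload, skipping");
        }
      }
    }
    return copied;
  }

  public static void main(String[] args) throws IOException {
    Path aux = Files.createTempDirectory("aux-jars");
    Path staging = Files.createTempDirectory("staging");
    Files.createFile(aux.resolve("my-udf.jar"));
    System.out.println(copyJars(aux, staging)); // 1
  }
}
```

The window between `newDirectoryStream` and `Files.copy` is exactly the non-atomic gap the report describes; it cannot be closed, only tolerated.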

 

 

  was:
Purpose of hive.reloadable.aux.jars.path, introduced by 
https://issues.apache.org/jira/browse/HIVE-7553 was do avoid scheduling 
maintenance window for every jar change, but it is not enough.

On a large system, the lack of atomicity between the directory listing of 

[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521758
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 16:22
Start Date: 08/Dec/20 16:22
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-740734192


   since I'm not the PR creator, we'll need to wait with this for @Noremac201. 
Thanks for the tips!





Issue Time Tracking
---

Worklog Id: (was: 521758)
Time Spent: 1h 50m  (was: 1h 40m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java, the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should be 
> moved out into a separate file to clean up the file.





[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521755
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 16:19
Start Date: 08/Dec/20 16:19
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538556544



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -232,9 +236,25 @@ public void closeWriters(boolean abort) throws HiveException {
    for (int i = 0; i < updaters.length; i++) {
      if (updaters[i] != null) {
        SerDeStats stats = updaters[i].getStats();
-        // Ignore 0 row files except in case of insert overwrite
-        if (isDirectInsert && (stats.getRowCount() > 0 || isInsertOverwrite)) {
-          outPathsCommitted[i] = updaters[i].getUpdatedFilePath();
+        // Ignore 0 row files except in case of insert overwrite or delete or update
+        if (isDirectInsert
+            && (stats.getRowCount() > 0 || isInsertOverwrite || AcidUtils.Operation.DELETE.equals(acidOperation)
+            || AcidUtils.Operation.UPDATE.equals(acidOperation))) {
+          // In case of delete operation, the deleteFilePath has to be used, not the updatedFilePath
+          // In case of update operation, we need both paths. The updateFilePath will be added
+          // to the outPathsCommitted array and the deleteFilePath will be collected in a separate list.
+          OrcRecordUpdater recordUpdater = (OrcRecordUpdater) updaters[i];
+          outPathsCommitted[i] = recordUpdater.getUpdatedFilePath();

Review comment:
   Sure! Fixed







Issue Time Tracking
---

Worklog Id: (was: 521755)
Time Spent: 3h 40m  (was: 3.5h)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.





[jira] [Updated] (HIVE-24507) "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directories

2020-12-08 Thread Arnaud Linz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Linz updated HIVE-24507:
---
Summary: "File file:XXX.jar does not exist" when changing content of 
"hive.reloadable.aux.jars.path" directories  (was: "File file:XXX.jar does not 
exist" when changing content of "hive.reloadable.aux.jars.path" directory 
content)

> "File file:XXX.jar does not exist" when changing content of 
> "hive.reloadable.aux.jars.path" directories
> ---
>
> Key: HIVE-24507
> URL: https://issues.apache.org/jira/browse/HIVE-24507
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: Arnaud Linz
>Priority: Major
>
> The purpose of hive.reloadable.aux.jars.path, introduced by 
> https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a 
> maintenance window for every jar change, but it is not enough.
> On a large system, the lack of atomicity between the directory listing of 
> jars contained in hive.reloadable.aux.jars.path and the actual use of the 
> file when uploaded to the job's yarn resources may lead to query failures, even 
> if no jar/UDF is used in the failing query (because it is a global parameter).
> Stack trace sample:
> {code:java}
>  File file:/XXX.jar does not exist
>at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
>at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:378)
>at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
>at 
> org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:703)
>at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:315)
>at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:207)
>at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135)
>at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
>at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
>at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
>at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
>at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
>at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
>at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
>at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444)
>at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
>at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
>at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
>at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
>at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
>at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
>at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
>at 
> org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
>at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>at 

[jira] [Updated] (HIVE-24507) "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directory content

2020-12-08 Thread Arnaud Linz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Linz updated HIVE-24507:
---
Description: 
The purpose of hive.reloadable.aux.jars.path, introduced by 
https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a 
maintenance window for every jar change, but it is not enough.

On a large system, the lack of atomicity between the directory listing of jars 
contained in hive.reloadable.aux.jars.path and the actual use of the file when 
uploaded to the job's yarn resources may lead to query failures, even if no 
jar/UDF is used in the failing query (because it is a global parameter).

Stack trace sample:
{code:java}
 File file:/XXX.jar does not exist
   at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
   at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
   at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
   at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:378)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:703)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:315)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:207)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135)
   at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
   at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444)
   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
   at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
   at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
   at 
org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
   at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
{code}
 

It's probably not possible to achieve atomicity, but this lack of atomicity 
should be taken into account and this error should be a warning. Actually, if a 
jar is removed, it's probably because no queries are using it any longer. And if 
it was really used, it will trigger a ClassNotFound error later which, together 
with the warning log, should suffice.

 

 

  was:
Purpose of hive.reloadable.aux.jars.path, introduced by 
https://issues.apache.org/jira/browse/HIVE-7553 was do avoid scheduling 
maintenance window for every jar change, but it is not enough.

On a large system, the lack of atomicity between the directory listing of jars 

[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521751&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521751
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 15:53
Start Date: 08/Dec/20 15:53
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538526877



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -563,6 +564,21 @@ else if (filename.startsWith(BUCKET_PREFIX)) {
 return result;
   }
 
+  public static Map getDeltaToAttemptIdMap(

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521751)
Time Spent: 3.5h  (was: 3h 20m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.
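
For context, "avoiding the move step" means the final ACID delta directories are created straight under the table location, so no MoveTask has to rename them out of a staging directory afterwards. The sketch below only illustrates the delta-directory naming involved; the zero-padded delta_<min>_<max>_<stmtId> pattern is assumed to mirror AcidUtils, and the class and method names here are made up for illustration.

```java
public class DeltaDirSketch {
    // Direct insert writes final delta directories straight under the table
    // location instead of a staging dir that a MoveTask would later rename.
    // The zero-padded pattern below is an assumption about AcidUtils naming.
    static String deltaSubdir(long minWriteId, long maxWriteId, int statementId) {
        return String.format("delta_%07d_%07d_%04d", minWriteId, maxWriteId, statementId);
    }

    // An ACID update under direct insert produces both an insert delta and a
    // delete delta for the same write, which is why delete deltas must be
    // tracked separately at commit time.
    static String deleteDeltaSubdir(long minWriteId, long maxWriteId, int statementId) {
        return "delete_" + deltaSubdir(minWriteId, maxWriteId, statementId);
    }

    public static void main(String[] args) {
        System.out.println(deltaSubdir(1, 1, 0));       // delta_0000001_0000001_0000
        System.out.println(deleteDeltaSubdir(1, 1, 0)); // delete_delta_0000001_0000001_0000
    }
}
```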



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521750
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 15:48
Start Date: 08/Dec/20 15:48
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538520768



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
##
@@ -1895,7 +1901,20 @@ public static boolean isSkewedStoredAsDirs(FileSinkDesc fsInputDesc) {
   }
 
   if ((srcDir != null) && srcDir.equals(fsopFinalDir)) {
-    return mvTsk;
+    if (isDirectInsert || isMmFsop) {
+      if (moveTaskId != null && fsoMoveTaskId != null && moveTaskId.equals(fsoMoveTaskId)) {
+        // If the ACID direct insert is on, the MoveTasks cannot be identified by the srcDir as
+        // in this case the srcDir is always the root directory of the table.
+        // We need to consider the ACID write type to identify the MoveTasks.
+        return mvTsk;
+      }
+      if ((moveTaskId == null || fsoMoveTaskId == null) && moveTaskWriteType != null

Review comment:
   There was a test that failed without this check, but I think I have since 
fixed the moveTaskId generation, so it cannot be null. I think this check is no 
longer needed; I will remove it and let's see what the tests say.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521750)
Time Spent: 3h 20m  (was: 3h 10m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521746
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 15:38
Start Date: 08/Dec/20 15:38
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538510542



##
File path: ql/src/test/queries/clientpositive/sort_acid.q
##
@@ -16,7 +16,7 @@ explain cbo
 update acidtlb set b=777;
 update acidtlb set b=777;
 
-select * from acidtlb;
+select * from acidtlb order by a;

Review comment:
   Sure, fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521746)
Time Spent: 3h 10m  (was: 3h)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24506) Investigate the materialized_view_create_rewrite_4.q test with direct insert on

2020-12-08 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-24506:
-
Description: In the materialized_view_create_rewrite_4.q the direct insert 
got turned off, because if it was on, the totalSize of the table alternated 
between two values from run to run. In other test cases this issue was due to 
the order in which the FSOs got the statementIds. Since the direct insert is 
not necessary for materialized views, I turned it off for this test in 
HIVE-23410 and will investigate under this Jira.

> Investigate the materialized_view_create_rewrite_4.q test with direct insert 
> on
> ---
>
> Key: HIVE-24506
> URL: https://issues.apache.org/jira/browse/HIVE-24506
> Project: Hive
>  Issue Type: Task
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>
> In the materialized_view_create_rewrite_4.q the direct insert got turned off, 
> because if it was on, the totalSize of the table alternated between two 
> values from run to run. In other test cases this issue was due to the order 
> in which the FSOs got the statementIds. Since the direct insert is not 
> necessary for materialized views, I turned it off for this test in HIVE-23410 
> and will investigate under this Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521745
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 15:36
Start Date: 08/Dec/20 15:36
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538508071



##
File path: 
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_4.q
##
@@ -3,6 +3,7 @@ set hive.support.concurrency=true;
 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
 set hive.strict.checks.cartesian.product=false;
 set hive.materializedview.rewriting=true;
+set hive.acid.direct.insert.enabled=false;

Review comment:
   Done: https://issues.apache.org/jira/browse/HIVE-24506





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521745)
Time Spent: 3h  (was: 2h 50m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24506) Investigate the materialized_view_create_rewrite_4.q test with direct insert on

2020-12-08 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora reassigned HIVE-24506:


Assignee: Marta Kuczora

> Investigate the materialized_view_create_rewrite_4.q test with direct insert 
> on
> ---
>
> Key: HIVE-24506
> URL: https://issues.apache.org/jira/browse/HIVE-24506
> Project: Hive
>  Issue Type: Task
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24506) Investigate the materialized_view_create_rewrite_4.q test with direct insert on

2020-12-08 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-24506:
-
Affects Version/s: 4.0.0

> Investigate the materialized_view_create_rewrite_4.q test with direct insert 
> on
> ---
>
> Key: HIVE-24506
> URL: https://issues.apache.org/jira/browse/HIVE-24506
> Project: Hive
>  Issue Type: Task
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521744
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 15:32
Start Date: 08/Dec/20 15:32
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538502931



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -2694,7 +2699,7 @@ private void constructOneLBLocationMap(FileStatus fSta,
*/
   private Set getValidPartitionsInPath(
   int numDP, int numLB, Path loadPath, Long writeId, int stmtId,
-  boolean isMmTable, boolean isInsertOverwrite, boolean isDirectInsert) throws HiveException {
+  boolean isMmTable, boolean isInsertOverwrite, boolean isDirectInsert, AcidUtils.Operation operation, Set dynamiPartitionSpecs) throws HiveException {

Review comment:
   Fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521744)
Time Spent: 2h 50m  (was: 2h 40m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521743
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 15:31
Start Date: 08/Dec/20 15:31
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538496432



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -251,7 +271,7 @@ public void closeWriters(boolean abort) throws HiveException {
   }
 }
 
-private void commit(FileSystem fs, List commitPaths) throws HiveException {
+private void commit(FileSystem fs, List commitPaths, List deleteDeltas) throws HiveException {

Review comment:
   I know, but I don't really know a better solution unless we change the 
internal structures in FileSinkOperator, like using Lists instead of arrays. 
But that could have unexpected side effects. I am open to trying it, but I 
would do it under a separate Jira; I created one about investigating this 
refactoring.
   https://issues.apache.org/jira/browse/HIVE-24505





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521743)
Time Spent: 2h 40m  (was: 2.5h)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24505) Investigate if the arrays in the FileSinkOperator could be replaced by Lists

2020-12-08 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora reassigned HIVE-24505:



> Investigate if the arrays in the FileSinkOperator could be replaced by Lists
> 
>
> Key: HIVE-24505
> URL: https://issues.apache.org/jira/browse/HIVE-24505
> Project: Hive
>  Issue Type: Task
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>
> The FileSinkOperator uses some array variables, like
> Path[] outPaths;
> Path[] outPathsCommitted;
> Path[] finalPaths;
> RecordWriter[] outWriters;
> RecordUpdater[] updaters;
> Working with these is not always convenient, for example when they are 
> extended with new elements in the createDynamicBucket method, or in case of an 
> UPDATE operation with direct insert on. Then the delete deltas have to be 
> collected separately, because the outPaths array will contain only the 
> inserted deltas. These operations would be much easier with lists.
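
The inconvenience described above can be seen in miniature: growing a parallel array requires an explicit reallocate-and-copy, while a list grows with a single append. The `grow` helper below is a hypothetical stand-in for what createDynamicBucket has to do by hand; the path strings are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GrowableWriters {
    // Array style: adding one element per dynamic bucket means reallocating
    // the whole array and copying, for every parallel array (outPaths,
    // finalPaths, outWriters, ...).
    static String[] grow(String[] arr, String extra) {
        String[] bigger = Arrays.copyOf(arr, arr.length + 1);
        bigger[bigger.length - 1] = extra;
        return bigger;
    }

    public static void main(String[] args) {
        String[] outPaths = grow(new String[] {"delta/bucket_00000"}, "delta/bucket_00001");

        // List style: the same growth is a single append.
        List<String> outPathsList = new ArrayList<>(List.of("delta/bucket_00000"));
        outPathsList.add("delta/bucket_00001");

        if (!Arrays.asList(outPaths).equals(outPathsList)) {
            throw new AssertionError("both styles should hold the same paths");
        }
        System.out.println(outPathsList);
    }
}
```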



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521739
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 15:26
Start Date: 08/Dec/20 15:26
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538496432



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -251,7 +271,7 @@ public void closeWriters(boolean abort) throws HiveException {
   }
 }
 
-private void commit(FileSystem fs, List commitPaths) throws HiveException {
+private void commit(FileSystem fs, List commitPaths, List deleteDeltas) throws HiveException {

Review comment:
   I know, but I don't really know a better solution unless we change the 
internal structures in FileSinkOperator, like using Lists instead of arrays. 
But that could have unexpected side effects. I am open to trying it, but I 
would do it under a separate Jira; I created one about investigating this 
refactoring.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521739)
Time Spent: 2.5h  (was: 2h 20m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2020-12-08 Thread Piotr Findeisen (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245953#comment-17245953
 ] 

Piotr Findeisen commented on HIVE-11266:


{quote}This is not just external tables - any tables where users are directly 
modifying the underlying data can be impacted by this.
{quote}
 
{quote}Yes, I agree with you, external table is just my personal use 
case.{quote}
 
[~tmgstev] [~simobatt] was there a follow-up issue to this?
From the attached patch (same as 
[https://github.com/apache/hive/commit/a2dff9e13acc62ecc0388b3b2e221f26c9184dbb]) 
I see this was fixed for external tables only.
 

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-11266.01.patch, HIVE-11266.patch
>
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I think that this post on stackoverflow, which shows another type of bug in 
> case of multiple insert, is also related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table
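
The decision the description complains about can be reduced to a tiny model: with stats-based answering on, count(*) is served from the metastore's recorded row count and never reads the files, so files changed after ANALYZE go unnoticed. This is an illustrative simulation only; `answerCount` is a made-up name, not a Hive API.

```java
public class StatsCountSketch {
    // Toy model of the stats shortcut: when stats-based answering is enabled
    // (hive.compute.query.using.stats=true) and numRows is recorded, the
    // count is answered from metadata alone, even if it is stale.
    static long answerCount(boolean useStats, Long statsNumRows, long actualRowsOnDisk) {
        if (useStats && statsNumRows != null) {
            return statsNumRows;     // metadata-only answer (possibly stale)
        }
        return actualRowsOnDisk;     // full scan of the current files
    }

    public static void main(String[] args) {
        // 100 rows recorded at ANALYZE time, 150 after files were added externally.
        long stale = answerCount(true, 100L, 150L);
        long exact = answerCount(false, 100L, 150L); // stats answering disabled
        System.out.println(stale + " (from stats) vs " + exact + " (from scan)");
    }
}
```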



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521730
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 15:07
Start Date: 08/Dec/20 15:07
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538475478



##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidInputFormat.java
##
@@ -52,6 +52,8 @@
   @Mock
   private DataInput mockDataInput;
 
+  // IRJUNK IDE TESZTET!!! (Hungarian: "let's write a test here!!!")

Review comment:
   Removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521730)
Time Spent: 2h 20m  (was: 2h 10m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521729
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 15:06
Start Date: 08/Dec/20 15:06
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538474101



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
##
@@ -189,6 +191,49 @@ public WriteEntity getAcidAnalyzeTable() {
 return acidSinks;
   }
 
+  public Integer getStatementIdForAcidWriteType(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) {
+    FileSinkDesc result = null;
+    for (FileSinkDesc acidSink : acidSinks) {
+      if (acidOperation.equals(acidSink.getAcidOperation()) && path.equals(acidSink.getDestPath())
+          && acidSink.getTableWriteId() == writeId
+          && (moveTaskId == null || acidSink.getMoveTaskId() == null || moveTaskId.equals(acidSink.getMoveTaskId()))) {
+        // There is a problem with the union all optimisation. In this case, there will be multiple FileSinkOperators
+        // with the same operation, writeId and moveTaskId. But one of these FSOs doesn't write data and its statementId
+        // is not valid, so if this FSO is selected and its statementId is returned, the file listing will find nothing.
+        // So check the acidSinks and if two of them have the same writeId, path and moveTaskId, then return -1 as statementId.
+        // Like this, the file listing will find all partitions and files correctly.
+        if (result != null) {
+          return -1;
+        }
+        result = acidSink;
+      }
+    }
+    if (result != null) {
+      return result.getStatementId();
+    } else {
+      return -1;
+    }
+  }
+
+  public Set getDynamicPartitionSpecs(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) {

Review comment:
   Added it.
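
The union-all disambiguation in the snippet above boils down to: exactly one matching sink means its statementId can be trusted, while more than one means the statementId is ambiguous and -1 is returned so the file listing matches all statements. A stripped-down model of that logic, with a hypothetical Sink record standing in for FileSinkDesc:

```java
import java.util.List;

public class StatementIdLookup {
    // Hypothetical stand-in for FileSinkDesc: just the fields the lookup needs.
    record Sink(long writeId, int statementId) {}

    // One matching sink: return its statementId. More than one (the union-all
    // case, where an extra FSO carries an invalid statementId): return -1.
    static int statementIdFor(List<Sink> sinks, long writeId) {
        Sink match = null;
        for (Sink s : sinks) {
            if (s.writeId() == writeId) {
                if (match != null) {
                    return -1;   // ambiguous: two FSOs for the same write
                }
                match = s;
            }
        }
        return match != null ? match.statementId() : -1;
    }

    public static void main(String[] args) {
        System.out.println(statementIdFor(List.of(new Sink(5, 2)), 5));                  // 2
        System.out.println(statementIdFor(List.of(new Sink(5, 2), new Sink(5, 3)), 5)); // -1
    }
}
```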





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521729)
Time Spent: 2h 10m  (was: 2h)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521722&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521722
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 14:55
Start Date: 08/Dec/20 14:55
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538455465



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
##
@@ -189,6 +191,49 @@ public WriteEntity getAcidAnalyzeTable() {
 return acidSinks;
   }
 
+  public Integer getStatementIdForAcidWriteType(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) {
+    FileSinkDesc result = null;
+    for (FileSinkDesc acidSink : acidSinks) {
+      if (acidOperation.equals(acidSink.getAcidOperation()) && path.equals(acidSink.getDestPath())
+          && acidSink.getTableWriteId() == writeId
+          && (moveTaskId == null || acidSink.getMoveTaskId() == null || moveTaskId.equals(acidSink.getMoveTaskId()))) {
+        // There is a problem with the union all optimisation. In this case, there will be multiple FileSinkOperators

Review comment:
   Yeah, it would be better. Added Java doc.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521722)
Time Spent: 2h  (was: 1h 50m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521721
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 14:55
Start Date: 08/Dec/20 14:55
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538455465



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
##
@@ -189,6 +191,49 @@ public WriteEntity getAcidAnalyzeTable() {
 return acidSinks;
   }
 
+  public Integer getStatementIdForAcidWriteType(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) {
+    FileSinkDesc result = null;
+    for (FileSinkDesc acidSink : acidSinks) {
+      if (acidOperation.equals(acidSink.getAcidOperation()) && path.equals(acidSink.getDestPath())
+          && acidSink.getTableWriteId() == writeId
+          && (moveTaskId == null || acidSink.getMoveTaskId() == null || moveTaskId.equals(acidSink.getMoveTaskId()))) {
+        // There is a problem with the union all optimisation. In this case, there will be multiple FileSinkOperators

Review comment:
   I fixed it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521721)
Time Spent: 1h 50m  (was: 1h 40m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521716
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 14:49
Start Date: 08/Dec/20 14:49
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538455465



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
##
@@ -189,6 +191,49 @@ public WriteEntity getAcidAnalyzeTable() {
     return acidSinks;
   }
 
+  public Integer getStatementIdForAcidWriteType(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) {
+    FileSinkDesc result = null;
+    for (FileSinkDesc acidSink : acidSinks) {
+      if (acidOperation.equals(acidSink.getAcidOperation()) && path.equals(acidSink.getDestPath())
+          && acidSink.getTableWriteId() == writeId
+          && (moveTaskId == null || acidSink.getMoveTaskId() == null || moveTaskId.equals(acidSink.getMoveTaskId()))) {
+        // There is a problem with the union all optimisation. In this case, there will be multiple FileSinkOperators

Review comment:
   Actually this comment is to explain why the following check was needed, 
but the Java doc for the whole method is a good idea. I added it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521716)
Time Spent: 1h 40m  (was: 1.5h)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521714
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 14:46
Start Date: 08/Dec/20 14:46
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538452455



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Context.java
##
@@ -105,6 +105,7 @@
 
   private Configuration conf;
   protected int pathid = 1;
+  private int moveTaskId = 100;

Review comment:
   Sure! Fixed it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521714)
Time Spent: 1.5h  (was: 1h 20m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521711
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 14:40
Start Date: 08/Dec/20 14:40
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538445196



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -1747,15 +1747,15 @@ public void testMultiInsertOnDynamicallyPartitionedMmTable() throws Exception {
     final String completedTxnComponentsContents =
         TxnDbUtil.queryToString(conf, "select * from \"COMPLETED_TXN_COMPONENTS\"");
     Assert.assertEquals(completedTxnComponentsContents,
-        2, TxnDbUtil.countQueryAgent(conf, "select count(*) from \"COMPLETED_TXN_COMPONENTS\""));
+        4, TxnDbUtil.countQueryAgent(conf, "select count(*) from \"COMPLETED_TXN_COMPONENTS\""));

Review comment:
   Those records are duplicates. They are a "side-effect" of fixing the 
FileSinkOperator-MoveTask assignment.
   For ACID tables, an insert like the one in this test created 4 records even 
before direct insert was introduced, because back then the FSO-MoveTask 
assignment was based on the staging directories. For an insert like this there 
were 2 FSOs and 2 MoveTasks, and each MoveTask called the metastore method that 
creates an entry in the TXN_COMPONENTS table for each partition, so there were 
4 records at the end of the insert. But for MM tables (and later for direct 
insert) there is no staging directory, and all MoveTasks and all FSOs will 
contain the table directory. So every FSO will find the same MoveTask (the 
first one in the list) and only that one will be executed. This is not correct, 
but it didn't cause any issue, so it went undetected until direct delete and 
update came in. To make them work properly, I had to fix the FSO-MoveTask 
assignment, but then MM tables with direct insert will have duplicate records 
just like ACID tables without direct insert. The Javadoc of the 
TxnHandler.addDynamicPartitions method says that duplicates won't cause any 
trouble, but if you know of issues with that, please share them with me.
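   The assignment problem described above can be shown with a tiny sketch (hypothetical names, not Hive's real task wiring): when every MoveTask carries the same target directory, a lookup keyed only on the directory always returns the first task in the list, whereas a lookup keyed on an explicit move-task id finds the right one.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the FSO -> MoveTask assignment issue described above.
public class MoveTaskLookupSketch {
    record MoveTask(String id, String targetDir) {}

    // Directory-based matching (the old behaviour): with staging dirs each
    // FSO's dir was unique, but with direct insert every task shares the
    // table dir, so this always returns the first task in the list.
    static Optional<MoveTask> byDirectory(List<MoveTask> tasks, String fsoDir) {
        return tasks.stream().filter(t -> t.targetDir().equals(fsoDir)).findFirst();
    }

    // Id-based matching (the fix): each FSO remembers which MoveTask it belongs to.
    static Optional<MoveTask> byId(List<MoveTask> tasks, String moveTaskId) {
        return tasks.stream().filter(t -> t.id().equals(moveTaskId)).findFirst();
    }
}
```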





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521711)
Time Spent: 1h 20m  (was: 1h 10m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly

2020-12-08 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-24504:
-


> VectorFileSinkArrowOperator does not serialize complex types correctly
> --
>
> Key: HIVE-24504
> URL: https://issues.apache.org/jira/browse/HIVE-24504
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> When the table has complex types and the result has 0 records, the 
> VectorFileSinkArrowOperator serializes only the primitive types correctly. 
> For complex types only the main type is set, which causes issues for clients 
> trying to read the data.
> Got the following HWC exception:
> {code:java}
> Previous exception in task: Unsupported data type: Null
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala)
>   
> org.apache.spark.sql.vectorized.ArrowColumnVector.<init>(ArrowColumnVector.java:135)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29)
>   
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
>   
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown
>  Source)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>   
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>   org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>   org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>   org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   org.apache.spark.scheduler.Task.run(Task.scala:109)
>   org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
>   at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117)
>   at org.apache.spark.scheduler.Task.run(Task.scala:119)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521702
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 14:28
Start Date: 08/Dec/20 14:28
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-740652364


   Also, please create a feature branch (HIVE-24470) on your local repository 
and PR from there.
   
   ```
   git checkout -b HIVE-24470
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521702)
Time Spent: 1h 40m  (was: 1.5h)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should 
> be moved out into a separate file to clean up the file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521701
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 14:27
Start Date: 08/Dec/20 14:27
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-740651806


   @mwalenia Go ahead and just close the PR manually for 30s and then re-open.  
That should trigger the tests again.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521701)
Time Spent: 1.5h  (was: 1h 20m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should 
> be moved out into a separate file to clean up the file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24388) Enhance swo optimizations to merge EventOperators

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24388?focusedWorklogId=521674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521674
 ]

ASF GitHub Bot logged work on HIVE-24388:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 13:15
Start Date: 08/Dec/20 13:15
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1750:
URL: https://github.com/apache/hive/pull/1750#issuecomment-740612558


   @jcamachor could you please take a look? 
   This patch also closes down 2 smaller bugs - one of them might have 
potentially caused parallel edge creation. I think the issue was not new; the 
new algorithm just stressed it a little bit more. I will ask Sungwoo to 
check it again after we have this patch.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521674)
Time Spent: 20m  (was: 10m)

> Enhance swo optimizations to merge EventOperators
> -
>
> Key: HIVE-24388
> URL: https://issues.apache.org/jira/browse/HIVE-24388
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> EVENT1->TS1
> EVENT2->TS2
> {code}
> are not merged because a TS may handle only the first event properly; 
> sending 2 events would cause one of them to be ignored



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24503) Optimize vector row serde by avoiding type check at run time

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24503:
--
Labels: pull-request-available  (was: )

> Optimize vector row serde by avoiding type check at run time 
> -
>
> Key: HIVE-24503
> URL: https://issues.apache.org/jira/browse/HIVE-24503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Serialization/deserialization of vectorized batches done at VectorSerializeRow 
> and VectorDeserializeRow performs a type check for each column of each row. 
> This becomes very costly when there are billions of rows to read/write. It can 
> be optimized if the type check is done at init time and specific reader/writer 
> classes are created. These classes can then be stored directly in a field 
> structure to avoid the run-time type check.
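A hedged sketch of the optimization in that description (simplified, not Hive's actual VectorSerializeRow API; `ColumnWriter` and `RowSerializerSketch` are invented names): resolve each column's type once at construction into a per-column writer object, so the per-row loop does only a virtual dispatch instead of a type switch.

```java
import java.util.List;

// Illustration of hoisting per-row type checks to init time; names are
// simplified stand-ins, not Hive's actual vector serde classes.
public class RowSerializerSketch {
    interface ColumnWriter { void write(StringBuilder out, Object value); }

    private final ColumnWriter[] writers; // one resolved writer per column

    RowSerializerSketch(List<String> columnTypes) {
        writers = new ColumnWriter[columnTypes.size()];
        for (int i = 0; i < writers.length; i++) {
            // The type switch happens exactly once per column, here at init.
            writers[i] = switch (columnTypes.get(i)) {
                case "bigint" -> (out, v) -> out.append((Long) v);
                case "string" -> (out, v) -> out.append('"').append(v).append('"');
                default -> throw new IllegalArgumentException("unsupported: " + columnTypes.get(i));
            };
        }
    }

    String serialize(Object[] row) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) out.append(',');
            writers[i].write(out, row[i]); // no per-row type check here
        }
        return out.toString();
    }
}
```

With billions of rows, moving the branch out of the hot loop also gives the JIT monomorphic call sites per column, which is the design intent the issue describes.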



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24503) Optimize vector row serde by avoiding type check at run time

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24503?focusedWorklogId=521643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521643
 ]

ASF GitHub Bot logged work on HIVE-24503:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 11:38
Start Date: 08/Dec/20 11:38
Worklog Time Spent: 10m 
  Work Description: maheshk114 opened a new pull request #1753:
URL: https://github.com/apache/hive/pull/1753


   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521643)
Remaining Estimate: 0h
Time Spent: 10m

> Optimize vector row serde by avoiding type check at run time 
> -
>
> Key: HIVE-24503
> URL: https://issues.apache.org/jira/browse/HIVE-24503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Serialization/deserialization of vectorized batches done at VectorSerializeRow 
> and VectorDeserializeRow performs a type check for each column of each row. 
> This becomes very costly when there are billions of rows to read/write. It can 
> be optimized if the type check is done at init time and specific reader/writer 
> classes are created. These classes can then be stored directly in a field 
> structure to avoid the run-time type check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24503) Optimize vector row serde by avoiding type check at run time

2020-12-08 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24503:
---
Description: Serialization/deserialization of vectorized batches done at 
VectorSerializeRow and VectorDeserializeRow performs a type check for each 
column of each row. This becomes very costly when there are billions of rows to 
read/write. It can be optimized if the type check is done at init time and 
specific reader/writer classes are created. These classes can then be stored 
directly in a field structure to avoid the run-time type check.  (was: 
Serialization/Deserialization of vectorized batch done at VectorSerializeRow 
and VectorDeserializeRow does a type checking for each column of each row. This 
becomes very costly when there are billions of rows to read/write. This can be 
optimized if the type check is done during init time and specific reader/writer 
classes are created. )

> Optimize vector row serde by avoiding type check at run time 
> -
>
> Key: HIVE-24503
> URL: https://issues.apache.org/jira/browse/HIVE-24503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Serialization/deserialization of vectorized batches done at VectorSerializeRow 
> and VectorDeserializeRow performs a type check for each column of each row. 
> This becomes very costly when there are billions of rows to read/write. It can 
> be optimized if the type check is done at init time and specific reader/writer 
> classes are created. These classes can then be stored directly in a field 
> structure to avoid the run-time type check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24503) Optimize vector row serde to avoid type check at run time

2020-12-08 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24503:
--


> Optimize vector row serde to avoid type check at run time 
> --
>
> Key: HIVE-24503
> URL: https://issues.apache.org/jira/browse/HIVE-24503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Serialization/deserialization of vectorized batches done at VectorSerializeRow 
> and VectorDeserializeRow performs a type check for each column of each row. 
> This becomes very costly when there are billions of rows to read/write. It can 
> be optimized if the type check is done at init time and specific reader/writer 
> classes are created. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24503) Optimize vector row serde by avoiding type check at run time

2020-12-08 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24503:
---
Summary: Optimize vector row serde by avoiding type check at run time   
(was: Optimize vector row serde to avoid type check at run time )

> Optimize vector row serde by avoiding type check at run time 
> -
>
> Key: HIVE-24503
> URL: https://issues.apache.org/jira/browse/HIVE-24503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Serialization/deserialization of vectorized batches done at VectorSerializeRow 
> and VectorDeserializeRow performs a type check for each column of each row. 
> This becomes very costly when there are billions of rows to read/write. It can 
> be optimized if the type check is done at init time and specific reader/writer 
> classes are created. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521635
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 11:23
Start Date: 08/Dec/20 11:23
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-740561137


   @mwalenia please push your changes again to your remote branch, with a new 
commit having the same content.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521635)
Time Spent: 1h 20m  (was: 1h 10m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should 
> be moved out into a separate file to clean up the file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521634
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 11:22
Start Date: 08/Dec/20 11:22
Worklog Time Spent: 10m 
  Work Description: miklosgergely removed a comment on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-740560239


   recheck



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521634)
Time Spent: 1h 10m  (was: 1h)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should 
> be moved out into a separate file to clean up the file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521633
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 11:22
Start Date: 08/Dec/20 11:22
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-740560239


   recheck



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521633)
Time Spent: 1h  (was: 50m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should 
> be moved out into a separate file to clean up the file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521632
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 11:21
Start Date: 08/Dec/20 11:21
Worklog Time Spent: 10m 
  Work Description: miklosgergely removed a comment on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-740559929


   retest



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521632)
Time Spent: 50m  (was: 40m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should 
> be moved out into a separate file to clean up the file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521631
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 11:21
Start Date: 08/Dec/20 11:21
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-740559929


   retest



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521631)
Time Spent: 40m  (was: 0.5h)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In HiveMetastore.java, the majority of the code is the Thrift interface 
> rather than the actual logic behind starting the Hive metastore; the 
> interface code should be moved out into a separate file to clean it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24502) Store table level regular expression used during dump for table level replication

2020-12-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi reassigned HIVE-24502:
--


> Store table level regular expression used during dump for table level 
> replication
> -
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24501) UpdateInputAccessTimeHook should not update stats

2020-12-08 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245807#comment-17245807
 ] 

Prasanth Jayachandran commented on HIVE-24501:
--

[~ashutoshc] [~jcamachorodriguez] could someone please help with reviewing this 
small change? 

> UpdateInputAccessTimeHook should not update stats
> -
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with the 
> following exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>  at 
> org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
> at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at 
> org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at 
> org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: 
> MetaException(message:Cannot change stats state for a transactional table 
> default.test without providing the transactional write state for verification 
> (new write ID 0, valid write IDs 
> default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> 

[jira] [Updated] (HIVE-24501) UpdateInputAccessTimeHook should not update stats

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24501:
--
Labels: pull-request-available  (was: )

> UpdateInputAccessTimeHook should not update stats
> -
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with the 
> following exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>  at 
> org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
> at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at 
> org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at 
> org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: 
> MetaException(message:Cannot change stats state for a transactional table 
> default.test without providing the transactional write state for verification 
> (new write ID 0, valid write IDs 
> default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.java)
>  at 

[jira] [Updated] (HIVE-24501) UpdateInputAccessTimeHook should not update stats

2020-12-08 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24501:
-
Status: Patch Available  (was: Open)

> UpdateInputAccessTimeHook should not update stats
> -
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with the 
> following exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>  at 
> org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
> at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at 
> org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at 
> org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: 
> MetaException(message:Cannot change stats state for a transactional table 
> default.test without providing the transactional write state for verification 
> (new write ID 0, valid write IDs 
> default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> 

[jira] [Work logged] (HIVE-24501) UpdateInputAccessTimeHook should not update stats

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24501?focusedWorklogId=521620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521620
 ]

ASF GitHub Bot logged work on HIVE-24501:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 10:35
Start Date: 08/Dec/20 10:35
Worklog Time Spent: 10m 
  Work Description: prasanthj opened a new pull request #1752:
URL: https://github.com/apache/hive/pull/1752


   ### What changes were proposed in this pull request?
   When UpdateInputAccessTimeHook is used as a pre-hook, any simple query on a 
transactional table throws an exception (full stack trace in 
https://issues.apache.org/jira/browse/HIVE-24501)
   ```
   org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
Cannot change stats state for a transactional table default.test without 
providing the transactional write state for verification
   ```
   To update only the access time, the stats of the table and its partitions do 
not have to be changed. This PR sets an environment context flag so that the 
metastore skips updating the stats.
   
   ### Why are the changes needed?
   Bug with exception trace in https://issues.apache.org/jira/browse/HIVE-24501
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Manually on dev cluster.
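   The idea behind the fix can be sketched as follows. This is a minimal, 
self-contained illustration, not the actual patch: the map-based context and 
the class `AccessTimeOnlyContext` are hypothetical stand-ins for Hive's 
`EnvironmentContext`, and the property key is assumed to mirror the 
stats-skipping flag Hive carries in `StatsSetupConst`.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Minimal model of the fix: before calling alterTable to bump the access
   // time, the hook attaches a context property telling the metastore to
   // leave table/partition stats untouched, so no transactional write state
   // is needed for verification.
   public class AccessTimeOnlyContext {

       // Assumed property key, modeled on Hive's StatsSetupConst.
       static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

       // Build the environment-context properties an access-time-only
       // alterTable call would carry.
       static Map<String, String> accessTimeOnly() {
           Map<String, String> props = new HashMap<>();
           props.put(DO_NOT_UPDATE_STATS, "true");
           return props;
       }

       public static void main(String[] args) {
           Map<String, String> ctx = accessTimeOnly();
           System.out.println(ctx.get(DO_NOT_UPDATE_STATS)); // prints "true"
       }
   }
   ```

   In the real hook the equivalent context would be passed to the alter-table 
call, so only the access time changes while the stats state is left as-is.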



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521620)
Remaining Estimate: 0h
Time Spent: 10m

> UpdateInputAccessTimeHook should not update stats
> -
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with the 
> following exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>  at 
> org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
> at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at 
> org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at 
> org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at 

[jira] [Assigned] (HIVE-24501) UpdateInputAccessTimeHook should not update stats

2020-12-08 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-24501:



> UpdateInputAccessTimeHook should not update stats
> -
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> UpdateInputAccessTimeHook can fail for transactional tables with the 
> following exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>  at 
> org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
> at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at 
> org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at 
> org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: 
> MetaException(message:Cannot change stats state for a transactional table 
> default.test without providing the transactional write state for verification 
> (new write ID 0, valid write IDs 
> default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.java)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) at 
> 

[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521611
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 08/Dec/20 09:46
Start Date: 08/Dec/20 09:46
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-740510182


   @miklosgergely can you run the tests again?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 521611)
Time Spent: 0.5h  (was: 20m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In HiveMetastore.java, the majority of the code is the Thrift interface 
> rather than the actual logic behind starting the Hive metastore; the 
> interface code should be moved out into a separate file to clean it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24337) Cache delete delta files in LLAP cache

2020-12-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24337?focusedWorklogId=521609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521609
 ]

Ádám Szita logged work on HIVE-24337:
-

Author: Ádám Szita
Created on: 08/Dec/20 09:39
Start Date: 08/Dec/20 09:39
Worklog Time Spent: 4h 

Issue Time Tracking
---

Worklog Id: (was: 521609)
Remaining Estimate: 0h
Time Spent: 4h

> Cache delete delta files in LLAP cache
> --
>
> Key: HIVE-24337
> URL: https://issues.apache.org/jira/browse/HIVE-24337
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> HIVE-23824 added the functionality of caching the metadata part of ORC files 
> in the LLAP cache, so that ACID reads can be faster. However, the content 
> itself still needs to be read in every single time. If this could be cached 
> too, additional time could be saved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)