[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.
[ https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=522102=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522102 ] ASF GitHub Bot logged work on HIVE-24497: - Author: ASF GitHub Bot Created on: 09/Dec/20 07:53 Start Date: 09/Dec/20 07:53 Worklog Time Spent: 10m Work Description: simhadri-g commented on a change in pull request #1755: URL: https://github.com/apache/hive/pull/1755#discussion_r539081553 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java ## @@ -541,6 +551,14 @@ boolean isDone() { return isDone.get(); } +void setIsExtCliRequest(boolean val) { + isExtCliRequest.set(val); Review comment: Sure, I have changed the names to isExternalClientRequest in the recent commit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522102) Time Spent: 0.5h (was: 20m) > Node heartbeats from LLAP Daemon to the client are not matching leading to > timeout. > --- > > Key: HIVE-24497 > URL: https://issues.apache.org/jira/browse/HIVE-24497 > Project: Hive > Issue Type: Sub-task >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Node heartbeat contains info about all the tasks that were submitted to that > LLAP Daemon. In cloud deployment, the client is not able to match these > heartbeats due to differences in hostname and port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.
[ https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=522089=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522089 ] ASF GitHub Bot logged work on HIVE-24497: - Author: ASF GitHub Bot Created on: 09/Dec/20 07:19 Start Date: 09/Dec/20 07:19 Worklog Time Spent: 10m Work Description: prasanthj commented on a change in pull request #1755: URL: https://github.com/apache/hive/pull/1755#discussion_r539064255 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java ## @@ -541,6 +551,14 @@ boolean isDone() { return isDone.get(); } +void setIsExtCliRequest(boolean val) { + isExtCliRequest.set(val); Review comment: nit: for better readability, rename the variable and method to isExternalClientRequest. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522089) Time Spent: 20m (was: 10m) > Node heartbeats from LLAP Daemon to the client are not matching leading to > timeout. > --- > > Key: HIVE-24497 > URL: https://issues.apache.org/jira/browse/HIVE-24497 > Project: Hive > Issue Type: Sub-task >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Node heartbeat contains info about all the tasks that were submitted to that > LLAP Daemon. In cloud deployment, the client is not able to match these > heartbeats due to differences in hostname and port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
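For context, the change under review is a thin accessor pair around a java.util.concurrent AtomicBoolean. A minimal sketch of what the renamed flag might look like follows; the enclosing class name is hypothetical, not the actual AMReporter nested class:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the AMReporter task-tracking class; only the
// flag renamed in the review thread (isExtCliRequest -> isExternalClientRequest)
// is shown here.
public class TaskSketch {
    private final AtomicBoolean isExternalClientRequest = new AtomicBoolean(false);

    // Renamed from setIsExtCliRequest per the review comment.
    void setIsExternalClientRequest(boolean val) {
        isExternalClientRequest.set(val);
    }

    boolean isExternalClientRequest() {
        return isExternalClientRequest.get();
    }
}
```

An AtomicBoolean keeps the flag safe to read and write from different reporter threads without additional locking.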
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522074=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522074 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 09/Dec/20 06:24 Start Date: 09/Dec/20 06:24 Worklog Time Spent: 10m Work Description: aasha edited a comment on pull request #1710: URL: https://github.com/apache/hive/pull/1710#issuecomment-741558077 Please add a test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522074) Time Spent: 2h (was: 1h 50m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen under a single transaction and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522073 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 09/Dec/20 06:23 Start Date: 09/Dec/20 06:23 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1710: URL: https://github.com/apache/hive/pull/1710#discussion_r539040762 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ## @@ -10800,53 +10801,89 @@ public void addNotificationEvent(NotificationEvent entry) throws MetaException { @Override public void cleanNotificationEvents(int olderThan) { -boolean commited = false; -Query query = null; +final int eventBatchSize = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.EVENT_CLEAN_MAX_EVENTS); + +final long ageSec = olderThan; +final Instant now = Instant.now(); + +final int tooOld = Math.toIntExact(now.getEpochSecond() - ageSec); + +final Optional<Integer> batchSize = (eventBatchSize > 0) ? Optional.of(eventBatchSize) : Optional.empty(); + +final long start = System.nanoTime(); +int deleteCount = doCleanNotificationEvents(tooOld, batchSize); + +if (deleteCount == 0) { + LOG.info("No Notification events found to be cleaned with eventTime < {}", tooOld); +} else { + int batchCount = 0; + do { +batchCount = doCleanNotificationEvents(tooOld, batchSize); +deleteCount += batchCount; + } while (batchCount > 0); +} + +final long finish = System.nanoTime(); + +LOG.info("Deleted {} notification events older than epoch:{} in {}ms", deleteCount, tooOld, +TimeUnit.NANOSECONDS.toMillis(finish - start)); + } + + private int doCleanNotificationEvents(final int ageSec, final Optional<Integer> batchSize) { +final Transaction tx = pm.currentTransaction(); +int eventsCount = 0; + try { - openTransaction(); - long tmp = System.currentTimeMillis() / 1000 - olderThan; - int tooOld = (tmp > Integer.MAX_VALUE) ? 0 : (int) tmp; - query = pm.newQuery(MNotificationLog.class, "eventTime < tooOld"); - query.declareParameters("java.lang.Integer tooOld"); + tx.begin(); - int max_events = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.EVENT_CLEAN_MAX_EVENTS); - max_events = max_events > 0 ? max_events : Integer.MAX_VALUE; - query.setRange(0, max_events); - query.setOrdering("eventId ascending"); + try (Query query = pm.newQuery(MNotificationLog.class, "eventTime < tooOld")) { +query.declareParameters("java.lang.Integer tooOld"); +query.setOrdering("eventId ascending"); +if (batchSize.isPresent()) { + query.setRange(0, batchSize.get()); +} - List<MNotificationLog> toBeRemoved = (List<MNotificationLog>) query.execute(tooOld); - int iteration = 0; - int eventCount = 0; - long minEventId = 0; - long minEventTime = 0; - long maxEventId = 0; - long maxEventTime = 0; - while (CollectionUtils.isNotEmpty(toBeRemoved)) { -int listSize = toBeRemoved.size(); -if (iteration == 0) { - MNotificationLog firstNotification = toBeRemoved.get(0); - minEventId = firstNotification.getEventId(); - minEventTime = firstNotification.getEventTime(); +List<MNotificationLog> events = (List<MNotificationLog>) query.execute(ageSec); +if (CollectionUtils.isNotEmpty(events)) { + eventsCount = events.size(); + + if (LOG.isDebugEnabled()) { +int minEventTime, maxEventTime; +long minEventId, maxEventId; +Iterator<MNotificationLog> iter = events.iterator(); +MNotificationLog firstNotification = iter.next(); + +minEventTime = maxEventTime = firstNotification.getEventTime(); +minEventId = maxEventId = firstNotification.getEventId(); + +while (iter.hasNext()) { + MNotificationLog notification = iter.next(); Review comment: Comparison is not required; events will always be in ascending order of event id. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522073) Time Spent: 1h 50m (was: 1h 40m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522071 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 09/Dec/20 06:18 Start Date: 09/Dec/20 06:18 Worklog Time Spent: 10m Work Description: aasha commented on pull request #1710: URL: https://github.com/apache/hive/pull/1710#issuecomment-741558077 Please add a test. The code looks good to me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522071) Time Spent: 1h 40m (was: 1.5h) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen under a single transaction and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
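The batching scheme described in HIVE-24432 can be sketched independently of JDO: delete at most a fixed number of expired events per transaction, and loop until a batch comes back short. The sketch below only illustrates the control flow; the in-memory list stands in for the backing table, and all names are hypothetical:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of batched deletion: each call to deleteOneBatch
// models one small transaction that removes at most batchSize expired
// events, so no single transaction holds locks on the whole backlog.
public class BatchCleanerSketch {

    // events: eventTime values (epoch seconds); tooOld: cutoff; returns total deleted.
    static int cleanInBatches(List<Integer> events, int tooOld, int batchSize) {
        int total = 0;
        int removed;
        do {
            removed = deleteOneBatch(events, tooOld, batchSize); // one "transaction"
            total += removed;
        } while (removed == batchSize); // a short batch means nothing old remains
        return total;
    }

    private static int deleteOneBatch(List<Integer> events, int tooOld, int batchSize) {
        int removed = 0;
        Iterator<Integer> it = events.iterator();
        while (it.hasNext() && removed < batchSize) {
            if (it.next() < tooOld) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Looping until a batch returns fewer rows than the limit is the same stopping condition the patch's do/while in cleanNotificationEvents relies on.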
[jira] [Work started] (HIVE-24510) Vectorize compute_bit_vector
[ https://issues.apache.org/jira/browse/HIVE-24510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24510 started by Mustafa İman. --- > Vectorize compute_bit_vector > > > Key: HIVE-24510 > URL: https://issues.apache.org/jira/browse/HIVE-24510 > Project: Hive > Issue Type: Improvement >Reporter: Mustafa İman >Assignee: Mustafa İman >Priority: Major > > After https://issues.apache.org/jira/browse/HIVE-23530, almost all compute > stats functions are vectorizable. The only function that is not vectorizable is > "compute_bit_vector" for ndv statistics computation. This causes "create > table as select" and "insert overwrite select" queries to run in > non-vectorized mode. > Even a very naive implementation of vectorized compute_bit_vector gives about > 50% performance improvement on simple "insert overwrite select" queries. That > is because the entire mapper or reducer can run in vectorized mode. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24510) Vectorize compute_bit_vector
[ https://issues.apache.org/jira/browse/HIVE-24510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa İman reassigned HIVE-24510: --- > Vectorize compute_bit_vector > > > Key: HIVE-24510 > URL: https://issues.apache.org/jira/browse/HIVE-24510 > Project: Hive > Issue Type: Improvement >Reporter: Mustafa İman >Assignee: Mustafa İman >Priority: Major > > After https://issues.apache.org/jira/browse/HIVE-23530, almost all compute > stats functions are vectorizable. The only function that is not vectorizable is > "compute_bit_vector" for ndv statistics computation. This causes "create > table as select" and "insert overwrite select" queries to run in > non-vectorized mode. > Even a very naive implementation of vectorized compute_bit_vector gives about > 50% performance improvement on simple "insert overwrite select" queries. That > is because the entire mapper or reducer can run in vectorized mode. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
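As an entirely hypothetical illustration of why compute_bit_vector benefits from vectorization: an NDV-style bit vector can be filled in one tight loop over a whole column batch, rather than through one call per row. The hash mix and bucket count below are arbitrary choices for the sketch, not Hive's actual algorithm:

```java
// Hypothetical sketch: estimate distinct values in a column batch by hashing
// each value and setting one of 64 buckets, in a single pass over the batch.
public class NdvBitVectorSketch {

    static long computeBitVector(long[] batch, int size) {
        long bits = 0L;
        for (int i = 0; i < size; i++) {                          // one pass, no per-row dispatch
            int h = Long.hashCode(batch[i] * 0x9E3779B97F4A7C15L); // cheap multiplicative mix
            bits |= 1L << (h & 63);                                // set one of 64 buckets
        }
        return bits;
    }

    // Very rough distinct-count proxy: the number of buckets that were hit.
    static int roughNdv(long bits) {
        return Long.bitCount(bits);
    }
}
```

The point of the sketch is the shape of the loop: processing a whole batch per invocation is what lets the surrounding mapper or reducer stay in vectorized mode.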
[jira] [Work logged] (HIVE-24197) Check for write transactions for the db under replication at a frequent interval
[ https://issues.apache.org/jira/browse/HIVE-24197?focusedWorklogId=521968=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521968 ] ASF GitHub Bot logged work on HIVE-24197: - Author: ASF GitHub Bot Created on: 09/Dec/20 00:49 Start Date: 09/Dec/20 00:49 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1523: URL: https://github.com/apache/hive/pull/1523#issuecomment-741342755 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521968) Time Spent: 20m (was: 10m) > Check for write transactions for the db under replication at a frequent > interval > > > Key: HIVE-24197 > URL: https://issues.apache.org/jira/browse/HIVE-24197 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24197.01.patch, HIVE-24197.02.patch, > HIVE-24197.03.patch, HIVE-24197.04.patch, HIVE-24197.05.patch, > HIVE-24197.06.patch, HIVE-24197.07.patch, HIVE-24197.08.patch > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24244) NPE during Atlas metadata replication
[ https://issues.apache.org/jira/browse/HIVE-24244?focusedWorklogId=521967=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521967 ] ASF GitHub Bot logged work on HIVE-24244: - Author: ASF GitHub Bot Created on: 09/Dec/20 00:49 Start Date: 09/Dec/20 00:49 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1563: URL: https://github.com/apache/hive/pull/1563#issuecomment-741342691 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521967) Time Spent: 1h 10m (was: 1h) > NPE during Atlas metadata replication > - > > Key: HIVE-24244 > URL: https://issues.apache.org/jira/browse/HIVE-24244 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24244.01.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24208) LLAP: query job stuck due to race conditions
[ https://issues.apache.org/jira/browse/HIVE-24208?focusedWorklogId=521966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521966 ] ASF GitHub Bot logged work on HIVE-24208: - Author: ASF GitHub Bot Created on: 09/Dec/20 00:49 Start Date: 09/Dec/20 00:49 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1534: URL: https://github.com/apache/hive/pull/1534#issuecomment-741342741 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521966) Time Spent: 40m (was: 0.5h) > LLAP: query job stuck due to race conditions > > > Key: HIVE-24208 > URL: https://issues.apache.org/jira/browse/HIVE-24208 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.4 >Reporter: Yuriy Baltovskyy >Assignee: Yuriy Baltovskyy >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > When issuing an LLAP query, sometimes the TEZ job on LLAP server never ends > and it never returns the data reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24503) Optimize vector row serde by avoiding type check at run time
[ https://issues.apache.org/jira/browse/HIVE-24503?focusedWorklogId=521963=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521963 ] ASF GitHub Bot logged work on HIVE-24503: - Author: ASF GitHub Bot Created on: 09/Dec/20 00:45 Start Date: 09/Dec/20 00:45 Worklog Time Spent: 10m Work Description: rbalamohan commented on a change in pull request #1753: URL: https://github.com/apache/hive/pull/1753#discussion_r538916121 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSerializeRow.java ## @@ -61,27 +61,16 @@ private Field root; private static class Field { -Field[] children; - -boolean isPrimitive; -Category category; -PrimitiveCategory primitiveCategory; -TypeInfo typeInfo; - -int count; - -ObjectInspector objectInspector; -int outputColumnNum; - +Field[] children = null; +boolean isPrimitive = false; +Category category = null; +PrimitiveCategory primitiveCategory = null; +TypeInfo typeInfo = null; +int count = 0; +ObjectInspector objectInspector = null; +int outputColumnNum = -1; +VectorSerializeWriter writer = null; Field() { Review comment: Can be removed. ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorDeserializeRow.java ## @@ -933,12 +1207,20 @@ private void storeUnionRowColumn(ColumnVector colVector, unionColVector.isNull[batchIndex] = false; unionColVector.tags[batchIndex] = tag; -storeComplexFieldRowColumn( +deserializer.storeComplexFieldRowColumn( colVectorFields[tag], unionHelper.getFields()[tag], batchIndex, canRetainByteRef); -deserializeRead.finishComplexVariableFieldsType(); +deserializer.deserializeRead.finishComplexVariableFieldsType(); + } + + abstract static class VectorBatchDeserializer { +abstract void store(ColumnVector colVector, Field field, int batchIndex, boolean canRetainByteRef, +VectorDeserializeRow deserializer) throws IOException; Review comment: Why does VectorDeserializeRow need to be passed here again? ("this" is referenced in other places as well.) 
If you remove the "static" class declaration on the VectorBatchDeserializer children, you may not need to pass this, and the patch would need far fewer changes. ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSerializeRow.java ## @@ -274,44 +315,25 @@ private void serializeWrite( return; } isAllNulls = false; +field.writer.serialize(colVector, field, adjustedBatchIndex, this); + } -if (field.isPrimitive) { - serializePrimitiveWrite(colVector, field, adjustedBatchIndex); - return; -} -final Category category = field.category; -switch (category) { -case LIST: - serializeListWrite( - (ListColumnVector) colVector, - field, - adjustedBatchIndex); - break; -case MAP: - serializeMapWrite( - (MapColumnVector) colVector, - field, - adjustedBatchIndex); - break; -case STRUCT: - serializeStructWrite( - (StructColumnVector) colVector, - field, - adjustedBatchIndex); - break; -case UNION: - serializeUnionWrite( - (UnionColumnVector) colVector, - field, - adjustedBatchIndex); - break; -default: - throw new RuntimeException("Unexpected category " + category); + abstract static class VectorSerializeWriter { +abstract void serialize(Object colVector, Field field, int adjustedBatchIndex, +VectorSerializeRow serializeRow) throws IOException; Review comment: Same as earlier: VectorSerializeRow need not be passed here. The patch may need fewer changes if you remove the static declaration on the children. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521963) Time Spent: 20m (was: 10m) > Optimize vector row serde by avoiding type check at run time > - > > Key: HIVE-24503 > URL: https://issues.apache.org/jira/browse/HIVE-24503 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > >
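The direction of the patch, as discussed in the review above, is to resolve a per-field writer object once at initialization so the per-row hot path does a virtual call instead of re-running the category switch for every row. A minimal sketch of that pattern follows; the class names and string-based serialization are illustrative, not the actual Hive classes:

```java
// Hypothetical sketch: move the type-category switch out of the per-row path
// by resolving a writer object once per field, at setup time.
public class SerializeDispatchSketch {
    enum Category { PRIMITIVE, LIST }

    abstract static class Writer {
        abstract String serialize(Object value); // called once per row, no type check
    }

    static final class PrimitiveWriter extends Writer {
        String serialize(Object value) { return String.valueOf(value); }
    }

    static final class ListWriter extends Writer {
        String serialize(Object value) { return "[" + value + "]"; }
    }

    // Done once per field at initialization, not once per row.
    static Writer resolveWriter(Category category) {
        switch (category) {
            case PRIMITIVE: return new PrimitiveWriter();
            case LIST:      return new ListWriter();
            default: throw new IllegalArgumentException("Unexpected category " + category);
        }
    }
}
```

The reviewer's suggestion maps onto this sketch as: if Writer were a non-static inner class, each writer would capture the enclosing serializer instance and would not need it passed as an argument on every call.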
[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246213#comment-17246213 ] Kishen Das commented on HIVE-24482: --- https://github.com/apache/hive/pull/1737 > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24509) Move show specific codes under DDL and cut MetaDataFormatter classes to pieces
[ https://issues.apache.org/jira/browse/HIVE-24509?focusedWorklogId=521932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521932 ] ASF GitHub Bot logged work on HIVE-24509: - Author: ASF GitHub Bot Created on: 08/Dec/20 23:23 Start Date: 08/Dec/20 23:23 Worklog Time Spent: 10m Work Description: miklosgergely opened a new pull request #1756: URL: https://github.com/apache/hive/pull/1756 ### What changes were proposed in this pull request? Move the code used only by show commands next to the classes processing those commands. ### Why are the changes needed? Move the code from org.apache.hadoop.hive.ql.metadata.formatting to the show command related directories, cutting it into pieces for the specific commands, or into utility classes where it is used by multiple commands. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? All the unit tests and q tests still pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521932) Remaining Estimate: 0h Time Spent: 10m > Move show specific codes under DDL and cut MetaDataFormatter classes to pieces > -- > > Key: HIVE-24509 > URL: https://issues.apache.org/jira/browse/HIVE-24509 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > A lot of show ... specific code is under the > org.apache.hadoop.hive.ql.metadata.formatting package which is used only by > these commands. Also the two MetaDataFormatters (JsonMetaDataFormatter, > TextMetaDataFormatter) are trying to do everything, while they contain a lot > of code duplication. 
Their functionalities should be put under the > directories of the appropriate show commands. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24509) Move show specific codes under DDL and cut MetaDataFormatter classes to pieces
[ https://issues.apache.org/jira/browse/HIVE-24509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24509: -- Labels: pull-request-available (was: ) > Move show specific codes under DDL and cut MetaDataFormatter classes to pieces > -- > > Key: HIVE-24509 > URL: https://issues.apache.org/jira/browse/HIVE-24509 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > A lot of show ... specific code is under the > org.apache.hadoop.hive.ql.metadata.formatting package which is used only by > these commands. Also the two MetaDataFormatters (JsonMetaDataFormatter, > TextMetaDataFormatter) are trying to do everything, while they contain a lot > of code duplication. Their functionalities should be put under the > directories of the appropriate show commands. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24509) Move show specific codes under DDL and cut MetaDataFormatter classes to pieces
[ https://issues.apache.org/jira/browse/HIVE-24509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely reassigned HIVE-24509: - > Move show specific codes under DDL and cut MetaDataFormatter classes to pieces > -- > > Key: HIVE-24509 > URL: https://issues.apache.org/jira/browse/HIVE-24509 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > > A lot of show ... specific code is under the > org.apache.hadoop.hive.ql.metadata.formatting package which is used only by > these commands. Also the two MetaDataFormatters (JsonMetaDataFormatter, > TextMetaDataFormatter) are trying to do everything, while they contain a lot > of code duplication. Their functionalities should be put under the > directories of the appropriate show commands. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246181#comment-17246181 ] Marta Kuczora edited comment on HIVE-23410 at 12/8/20, 11:09 PM: - Pushed to master. Thanks a lot [~pvary] for the review!! was (Author: kuczoram): Pushed to master. Thanks a lot @pvary for the review!! > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 4h > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora updated HIVE-23410: - Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23410.1.patch > > Time Spent: 4h > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246181#comment-17246181 ] Marta Kuczora commented on HIVE-23410: -- Pushed to master. Thanks a lot @pvary for the review!! > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 4h > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521927=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521927 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 23:08 Start Date: 08/Dec/20 23:08 Worklog Time Spent: 10m Work Description: kuczoram merged pull request #1660: URL: https://github.com/apache/hive/pull/1660 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521927) Time Spent: 4h (was: 3h 50m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 4h > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.
[ https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=521865=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521865 ] ASF GitHub Bot logged work on HIVE-24497: - Author: ASF GitHub Bot Created on: 08/Dec/20 20:40 Start Date: 08/Dec/20 20:40 Worklog Time Spent: 10m Work Description: simhadri-g opened a new pull request #1755: URL: https://github.com/apache/hive/pull/1755 …tching leading to timeout in cloud deployment ### What changes were proposed in this pull request? ### Why are the changes needed? Node heartbeat contains info about all the tasks that were submitted to that LLAP Daemon. In cloud deployment, the client is not able to match these heartbeats due to differences in hostname and port, resulting in a timeout, as seen in the following log: 20/07/24 03:27:03 INFO ext.LlapTaskUmbilicalExternalClient: No tasks found for heartbeat from hostname executor-host-0.executor-host-0.cluster.local, port 25000 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521865) Remaining Estimate: 0h Time Spent: 10m > Node heartbeats from LLAP Daemon to the client are not matching leading to > timeout. > --- > > Key: HIVE-24497 > URL: https://issues.apache.org/jira/browse/HIVE-24497 > Project: Hive > Issue Type: Sub-task >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Node heartbeat contains info about all the tasks that were submitted to that > LLAP Daemon. In cloud deployment, the client is not able to match these > heartbeats due to differences in hostname and port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
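The failure mode described in the PR above can be sketched as a lookup-key mismatch. This is a hypothetical illustration, not the actual Hive patch: the class name, method names, and the host-only normalization strategy are assumptions made for the example; the real fix in AMReporter/LlapTaskUmbilicalExternalClient may match nodes differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not the actual Hive code): the client keeps pending
// tasks keyed by the node identity it saw at submit time. Behind cloud
// NAT/overlay networking, the daemon may report a different fully qualified
// hostname and port in its heartbeat, so a raw "host:port" lookup finds no
// tasks and the query eventually times out.
public class HeartbeatMatcher {
    private final Map<String, String> pendingTasksByNode = new ConcurrentHashMap<>();

    // One tolerant option: normalize both sides to a stable identifier.
    // Here we keep only the lower-cased short hostname and ignore the port;
    // this is a simplification and would collide for multiple daemons on
    // one host.
    static String nodeKey(String hostname, int port) {
        int dot = hostname.indexOf('.');
        String shortHost = dot > 0 ? hostname.substring(0, dot) : hostname;
        return shortHost.toLowerCase();
    }

    public void registerTask(String hostname, int port, String taskId) {
        pendingTasksByNode.put(nodeKey(hostname, port), taskId);
    }

    // Returns the pending task id for the heartbeating node, or null.
    public String matchHeartbeat(String hostname, int port) {
        return pendingTasksByNode.get(nodeKey(hostname, port));
    }
}
```

With this normalization, a task registered under `executor-host-0` is still found when the heartbeat arrives from `executor-host-0.executor-host-0.cluster.local` on a different port.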
[jira] [Updated] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.
[ https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24497: -- Labels: pull-request-available (was: ) > Node heartbeats from LLAP Daemon to the client are not matching leading to > timeout. > --- > > Key: HIVE-24497 > URL: https://issues.apache.org/jira/browse/HIVE-24497 > Project: Hive > Issue Type: Sub-task >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Node heartbeat contains info about all the tasks that were submitted to that > LLAP Daemon. In cloud deployment, the client is not able to match these > heartbeats due to differences in hostname and port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24508) hive.parquet.timestamp.skip.conversion doesn't work
[ https://issues.apache.org/jira/browse/HIVE-24508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wenjun ma reassigned HIVE-24508: > hive.parquet.timestamp.skip.conversion doesn't work > --- > > Key: HIVE-24508 > URL: https://issues.apache.org/jira/browse/HIVE-24508 > Project: Hive > Issue Type: Bug > Components: Parquet >Reporter: wenjun ma >Assignee: wenjun ma >Priority: Major > Fix For: All Versions > > > Whether we set it to true or false, when we insert the current timestamp it > always uses the local time zone. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521823 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 19:21 Start Date: 08/Dec/20 19:21 Worklog Time Spent: 10m Work Description: kuczoram commented on pull request #1660: URL: https://github.com/apache/hive/pull/1660#issuecomment-740890764 Thanks a lot for the review @pvary and @pvargacl ! :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521823) Time Spent: 3h 50m (was: 3h 40m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 3h 50m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables
[ https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246083#comment-17246083 ] Tristan Stevens commented on HIVE-11266: [~findepi] this is true; however, with managed (i.e. non-external) tables, modifying the underlying data without performing a REFRESH is not supported. With external tables, however, it is expected behaviour. This is essentially the definition of MANAGED vs. EXTERNAL. > count(*) wrong result based on table statistics for external tables > --- > > Key: HIVE-11266 > URL: https://issues.apache.org/jira/browse/HIVE-11266 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Simone Battaglia >Assignee: Jesus Camacho Rodriguez >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HIVE-11266.01.patch, HIVE-11266.patch > > > Hive returns a wrong count result on an external table with table statistics > if I change the table data files. > This is the scenario in detail: > 1) create external table my_table (...) location 'my_location'; > 2) analyze table my_table compute statistics; > 3) change/add/delete one or more files in 'my_location' directory; > 4) select count(\*) from my_table; > In this case the count query doesn't generate a MR job and returns the result > based on table statistics. This result is wrong because it is based on > statistics stored in the Hive metastore and doesn't take into account > modifications introduced on the data files. > Obviously, setting "hive.compute.query.using.stats" to FALSE, this problem > doesn't occur, but the default value of this property is TRUE. > I think that this post on stackoverflow, which shows another type of bug in > the case of multiple inserts, is related to the one that I reported: > http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24500) Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488
[ https://issues.apache.org/jira/browse/HIVE-24500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24500: -- Labels: pull-request-available (was: ) > Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488 > --- > > Key: HIVE-24500 > URL: https://issues.apache.org/jira/browse/HIVE-24500 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Hive is pulling in log4j 2.12.1 specifically to: > * ./usr/lib/hive/lib/log4j-core-2.12.1.jar > CVE-2020-9488 affects this version and the fix is to upgrade to 2.13.2+. So, > upgrade this dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24500) Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488
[ https://issues.apache.org/jira/browse/HIVE-24500?focusedWorklogId=521784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521784 ] ASF GitHub Bot logged work on HIVE-24500: - Author: ASF GitHub Bot Created on: 08/Dec/20 17:48 Start Date: 08/Dec/20 17:48 Worklog Time Spent: 10m Work Description: saihemanth-cloudera opened a new pull request #1754: URL: https://github.com/apache/hive/pull/1754 …CVE-2020-9488 ### What changes were proposed in this pull request? Changing the log4j version in the pom to 2.13.2. ### Why are the changes needed? To avoid CVE-2020-9488 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Locally. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521784) Remaining Estimate: 0h Time Spent: 10m > Hive - upgrade log4j 2.12.1 to 2.13.2+ due to CVE-2020-9488 > --- > > Key: HIVE-24500 > URL: https://issues.apache.org/jira/browse/HIVE-24500 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Hive is pulling in log4j 2.12.1 specifically to: > * ./usr/lib/hive/lib/log4j-core-2.12.1.jar > CVE-2020-9488 affects this version and the fix is to upgrade to 2.13.2+. So, > upgrade this dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
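The change described in the PR above amounts to bumping a single version property in the Maven build. A sketch of what such a change might look like follows; the property name `log4j2.version` is an assumption for illustration and is not taken from Hive's actual root pom:

```xml
<!-- Hypothetical sketch: the real property name in Hive's pom may differ. -->
<properties>
  <!-- was: <log4j2.version>2.12.1</log4j2.version>, affected by CVE-2020-9488 -->
  <log4j2.version>2.13.2</log4j2.version>
</properties>
```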
[jira] [Updated] (HIVE-24507) "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directories
[ https://issues.apache.org/jira/browse/HIVE-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arnaud Linz updated HIVE-24507: --- Description: Purpose of hive.reloadable.aux.jars.path, introduced by https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a maintenance window for every jar change, but it is not enough. On a large system, the lack of atomicity between the directory listing of jars contained in hive.reloadable.aux.jars.path and the actual use of the file when uploaded to the job's yarn resources may lead to query failures, even if no jar/UDF is used in the failing query (because it is a global parameter). Stack trace sample: {code:java} File file:/XXX.jar does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:378) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329) at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:703) at org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:315) at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:207) at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} It's probably not possible to achieve atomicity, but this lack of atomicity should be taken into account and this error should be a warning. Actually, if a jar is removed, it's probably because no queries are using it any longer. And if it was really used, it will trigger another ClassNotFound error later, which, together with the warning log, should suffice. was: Purpose of hive.reloadable.aux.jars.path, introduced by https://issues.apache.org/jira/browse/HIVE-7553 was to avoid scheduling a maintenance window for every jar change, but it is not enough. On a large system, the lack of atomicity between the directory listing of
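The mitigation the report argues for — warn and skip a jar that disappeared between listing and use, rather than failing the whole query — can be sketched as follows. This is a hypothetical illustration under the reporter's proposal, not actual Hive code; the class and method names are invented for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed behavior (not the actual Hive
// implementation): when resolving hive.reloadable.aux.jars.path, a jar that
// was listed a moment ago may already have been deleted, because listing and
// upload to the job's yarn resources are not atomic.
public class ReloadableJarResolver {
    public static List<File> resolveJars(List<String> listedPaths) {
        List<File> usable = new ArrayList<>();
        for (String path : listedPaths) {
            File jar = new File(path);
            if (jar.exists()) {
                usable.add(jar);
            } else {
                // Downgrade the failure to a warning so unrelated queries keep
                // running. If the jar was genuinely needed, the query will
                // still fail later with a ClassNotFoundException that names
                // the real missing dependency.
                System.err.println("WARN: aux jar disappeared, skipping: " + path);
            }
        }
        return usable;
    }
}
```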
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521758 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 08/Dec/20 16:22 Start Date: 08/Dec/20 16:22 Worklog Time Spent: 10m Work Description: mwalenia commented on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-740734192 since I'm not the PR creator, we'll need to wait for @Noremac201 on this. Thanks for the tips! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521758) Time Spent: 1h 50m (was: 1h 40m) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting the hive metastore; this should > be moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521755 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 16:19 Start Date: 08/Dec/20 16:19 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538556544 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java ## @@ -232,9 +236,25 @@ public void closeWriters(boolean abort) throws HiveException { for (int i = 0; i < updaters.length; i++) { if (updaters[i] != null) { SerDeStats stats = updaters[i].getStats(); - // Ignore 0 row files except in case of insert overwrite - if (isDirectInsert && (stats.getRowCount() > 0 || isInsertOverwrite)) { -outPathsCommitted[i] = updaters[i].getUpdatedFilePath(); + // Ignore 0 row files except in case of insert overwrite or delete or update + if (isDirectInsert + && (stats.getRowCount() > 0 || isInsertOverwrite || AcidUtils.Operation.DELETE.equals(acidOperation) + || AcidUtils.Operation.UPDATE.equals(acidOperation))) { +// In case of delete operation, the deleteFilePath has to be used, not the updatedFilePath +// In case of update operation, we need both paths. The updateFilePath will be added +// to the outPathsCommitted array and the deleteFilePath will be collected in a separate list. +OrcRecordUpdater recordUpdater = (OrcRecordUpdater) updaters[i]; +outPathsCommitted[i] = recordUpdater.getUpdatedFilePath(); Review comment: Sure! Fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521755) Time Spent: 3h 40m (was: 3.5h) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 3h 40m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
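The path-selection rule discussed in the review above can be illustrated with a small sketch. This is a hypothetical simplification, not the actual FileSinkOperator patch: the class, enum, and parameter names are invented, and the real code works with OrcRecordUpdater instances and separate output arrays rather than a single list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the rule from the review: with ACID direct
// insert, a DELETE commits only its delete-delta file, an UPDATE commits the
// re-inserted row file plus the delete-delta it produced, and a plain INSERT
// commits only the new row file.
public class DirectInsertPaths {
    enum Operation { INSERT, UPDATE, DELETE }

    static List<String> committedPaths(Operation op, String updatedFilePath, String deleteFilePath) {
        List<String> committed = new ArrayList<>();
        switch (op) {
            case DELETE:
                committed.add(deleteFilePath);   // only the delete delta
                break;
            case UPDATE:
                committed.add(updatedFilePath);  // new row version
                committed.add(deleteFilePath);   // plus the delete delta
                break;
            default:
                committed.add(updatedFilePath);  // plain insert
        }
        return committed;
    }
}
```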
[jira] [Updated] (HIVE-24507) "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directories
[ https://issues.apache.org/jira/browse/HIVE-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arnaud Linz updated HIVE-24507: --- Summary: "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directories (was: "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directory content) > "File file:XXX.jar does not exist" when changing content of > "hive.reloadable.aux.jars.path" directories > --- > > Key: HIVE-24507 > URL: https://issues.apache.org/jira/browse/HIVE-24507 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 2.1.1 >Reporter: Arnaud Linz >Priority: Major > > Purpose of hive.reloadable.aux.jars.path, introduced by > https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a > maintenance window for every jar change, but it is not enough. > On a large system, the lack of atomicity between the directory listing of > jars contained in hive.reloadable.aux.jars.path and the actual use of the > file when uploaded to the job's yarn resources leads to query failures, even > if no jar/UDF is used in the failing query (because it is a global parameter). 
> Stack trace sample: > {code:java} > File file:/XXX.jar does not exist >at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641) >at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867) >at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631) >at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442) >at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:378) >at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329) >at > org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:703) >at > org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:315) >at > org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:207) >at > org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135) >at > org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99) >at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194) >at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570) >at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567) >at java.security.AccessController.doPrivileged(Native Method) >at javax.security.auth.Subject.doAs(Subject.java:422) >at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) >at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567) >at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576) >at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571) >at java.security.AccessController.doPrivileged(Native Method) >at javax.security.auth.Subject.doAs(Subject.java:422) >at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) >at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571) >at 
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562) >at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444) >at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151) >at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) >at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) >at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) >at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) >at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) >at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) >at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) >at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) >at > org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) >at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) >at java.security.AccessController.doPrivileged(Native Method) >at javax.security.auth.Subject.doAs(Subject.java:422) >at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) >at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) >at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) >at
[jira] [Updated] (HIVE-24507) "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directory content
[ https://issues.apache.org/jira/browse/HIVE-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arnaud Linz updated HIVE-24507: --- Description: Purpose of hive.reloadable.aux.jars.path, introduced by https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a maintenance window for every jar change, but it is not enough. On a large system, the lack of atomicity between the directory listing of jars contained in hive.reloadable.aux.jars.path and the actual use of the file when uploaded to the job's yarn resources leads to query failures, even if no jar/UDF is used in the failing query (because it is a global parameter). Stack trace sample: {code:java} File file:/XXX.jar does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:378) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329) at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:703) at org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:315) at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:207) at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} It's probably not possible to achieve atomicity, but this lack of atomicity should be taken into account and this error should be a warning. Actually, if a jar is removed, it's probably because no queries are using it any longer. And if it was really used, it will trigger another ClassNotFound error later, which, together with the warning log, should suffice. was: Purpose of hive.reloadable.aux.jars.path, introduced by https://issues.apache.org/jira/browse/HIVE-7553 was to avoid scheduling a maintenance window for every jar change, but it is not enough. On a large system, the lack of atomicity between the directory listing of jars
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521751=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521751 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 15:53 Start Date: 08/Dec/20 15:53 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538526877 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -563,6 +564,21 @@ else if (filename.startsWith(BUCKET_PREFIX)) { return result; } + public static Map getDeltaToAttemptIdMap( Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521751) Time Spent: 3.5h (was: 3h 20m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521750 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 15:48 Start Date: 08/Dec/20 15:48 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538520768 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java ## @@ -1895,7 +1901,20 @@ public static boolean isSkewedStoredAsDirs(FileSinkDesc fsInputDesc) { } if ((srcDir != null) && srcDir.equals(fsopFinalDir)) { -return mvTsk; +if (isDirectInsert || isMmFsop) { + if (moveTaskId != null && fsoMoveTaskId != null && moveTaskId.equals(fsoMoveTaskId)) { +// If the ACID direct insert is on, the MoveTasks cannot be identified by the srcDir as +// in this case the srcDir is always the root directory of the table. +// We need to consider the ACID write type to identify the MoveTasks. +return mvTsk; + } + if ((moveTaskId == null || fsoMoveTaskId == null) && moveTaskWriteType != null Review comment: There was a test which was failing if this was not there, but since then I think I fixed the moveTaskId generation, so it cannot be null. I think this is not needed. I will remove it and let's see what the tests say. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521750) Time Spent: 3h 20m (was: 3h 10m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 3h 20m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
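The matching rule discussed in the review above can be sketched roughly as follows. This is a simplified, hypothetical illustration, not the real Hive code (the class and parameter names below are made up): with direct insert on, srcDir always points at the table root, so a MoveTask is matched to a FileSinkOperator by move-task id, falling back to the ACID write type when an id is missing.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for the GenMapRedUtils matching logic;
// not the actual Hive classes or signatures.
final class MoveTaskMatcher {

    /**
     * With direct insert (or an MM FileSinkOperator), the source directory is
     * always the table root and cannot identify the MoveTask. Match on the
     * move-task id when both sides have one; otherwise fall back to comparing
     * the ACID write types.
     */
    static boolean matches(boolean isDirectInsert, boolean isMmFsop,
                           String moveTaskId, String fsoMoveTaskId,
                           String moveTaskWriteType, String fsoWriteType) {
        if (!(isDirectInsert || isMmFsop)) {
            // Without direct insert, the caller already matched on srcDir.
            return true;
        }
        if (moveTaskId != null && fsoMoveTaskId != null) {
            return moveTaskId.equals(fsoMoveTaskId);
        }
        // One of the ids is missing: fall back to the write type.
        return Objects.equals(moveTaskWriteType, fsoWriteType);
    }
}
```

Per the comment above, the write-type fallback may become unnecessary once moveTaskId can no longer be null.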
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521746=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521746 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 15:38 Start Date: 08/Dec/20 15:38 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538510542 ## File path: ql/src/test/queries/clientpositive/sort_acid.q ## @@ -16,7 +16,7 @@ explain cbo update acidtlb set b=777; update acidtlb set b=777; -select * from acidtlb; +select * from acidtlb order by a; Review comment: Sure, fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521746) Time Spent: 3h 10m (was: 3h) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 3h 10m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24506) Investigate the materialized_view_create_rewrite_4.q test with direct insert on
[ https://issues.apache.org/jira/browse/HIVE-24506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora updated HIVE-24506: - Description: In the materialized_view_create_rewrite_4.q the direct insert got turned off, because if it was on, the totalSize of the table alternated between two values from run to run. In other test cases this issue was due to the order in which the FSOs got the statementIds. Since the direct insert is not necessary for materialized views, I turned it off for this test in HIVE-23410 and will investigate under this Jira. > Investigate the materialized_view_create_rewrite_4.q test with direct insert > on > --- > > Key: HIVE-24506 > URL: https://issues.apache.org/jira/browse/HIVE-24506 > Project: Hive > Issue Type: Task >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > > In the materialized_view_create_rewrite_4.q the direct insert got turned off, > because if it was on, the totalSize of the table alternated between two > values from run to run. In other test cases this issue was due to the order > in which the FSOs got the statementIds. Since the direct insert is not > necessary for materialized views, I turned it off for this test in HIVE-23410 > and will investigate under this Jira. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521745 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 15:36 Start Date: 08/Dec/20 15:36 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538508071 ## File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_4.q ## @@ -3,6 +3,7 @@ set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set hive.strict.checks.cartesian.product=false; set hive.materializedview.rewriting=true; +set hive.acid.direct.insert.enabled=false; Review comment: Done: https://issues.apache.org/jira/browse/HIVE-24506 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521745) Time Spent: 3h (was: 2h 50m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 3h > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24506) Investigate the materialized_view_create_rewrite_4.q test with direct insert on
[ https://issues.apache.org/jira/browse/HIVE-24506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora reassigned HIVE-24506: Assignee: Marta Kuczora > Investigate the materialized_view_create_rewrite_4.q test with direct insert > on > --- > > Key: HIVE-24506 > URL: https://issues.apache.org/jira/browse/HIVE-24506 > Project: Hive > Issue Type: Task >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24506) Investigate the materialized_view_create_rewrite_4.q test with direct insert on
[ https://issues.apache.org/jira/browse/HIVE-24506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora updated HIVE-24506: - Affects Version/s: 4.0.0 > Investigate the materialized_view_create_rewrite_4.q test with direct insert > on > --- > > Key: HIVE-24506 > URL: https://issues.apache.org/jira/browse/HIVE-24506 > Project: Hive > Issue Type: Task >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521744 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 15:32 Start Date: 08/Dec/20 15:32 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538502931 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2694,7 +2699,7 @@ private void constructOneLBLocationMap(FileStatus fSta, */ private Set getValidPartitionsInPath( int numDP, int numLB, Path loadPath, Long writeId, int stmtId, - boolean isMmTable, boolean isInsertOverwrite, boolean isDirectInsert) throws HiveException { + boolean isMmTable, boolean isInsertOverwrite, boolean isDirectInsert, AcidUtils.Operation operation, Set dynamiPartitionSpecs) throws HiveException { Review comment: Fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521744) Time Spent: 2h 50m (was: 2h 40m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. 
The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521743 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 15:31 Start Date: 08/Dec/20 15:31 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538496432 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java ## @@ -251,7 +271,7 @@ public void closeWriters(boolean abort) throws HiveException { } } -private void commit(FileSystem fs, List commitPaths) throws HiveException { +private void commit(FileSystem fs, List commitPaths, List deleteDeltas) throws HiveException { Review comment: I know, but I don't really see a better solution unless we change the internal structures in FileSinkOperator, like using Lists instead of arrays. But this could have unexpected side effects. I am open to trying it but I would do it under a separate Jira. I created one about investigating this refactoring. https://issues.apache.org/jira/browse/HIVE-24505 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521743) Time Spent: 2h 40m (was: 2.5h) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24505) Investigate if the arrays in the FileSinkOperator could be replaced by Lists
[ https://issues.apache.org/jira/browse/HIVE-24505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora reassigned HIVE-24505: > Investigate if the arrays in the FileSinkOperator could be replaced by Lists > > > Key: HIVE-24505 > URL: https://issues.apache.org/jira/browse/HIVE-24505 > Project: Hive > Issue Type: Task >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > > The FileSinkOperator uses some array variables, like > Path[] outPaths; > Path[] outPathsCommitted; > Path[] finalPaths; > RecordWriter[] outWriters; > RecordUpdater[] updaters; > Working with these is not always convenient, like when in the > createDynamicBucket method, they are extended with elements. Or in case of an > UPDATE operation with direct insert on. Then the delete deltas have to be > collected separately, because the outPaths array will contain only the > inserted deltas. These operations would be much easier with lists. -- This message was sent by Atlassian Jira (v8.3.4#803005)
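The inconvenience described in HIVE-24505 — growing fixed-size arrays versus simply appending to a list — can be illustrated with a minimal, self-contained sketch. The class and method names here are hypothetical, not the actual FileSinkOperator code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the refactoring HIVE-24505 proposes; the real
// FileSinkOperator fields (outPaths, finalPaths, ...) are plain arrays.
final class PathBuffers {

    // Array variant: "extending with elements" needs an explicit copy,
    // which is what createDynamicBucket-style code has to do today.
    static String[] appendToArray(String[] paths, String newPath) {
        String[] grown = Arrays.copyOf(paths, paths.length + 1);
        grown[paths.length] = newPath;
        return grown;
    }

    // List variant: the same operation is a single add() call.
    static List<String> appendToList(List<String> paths, String newPath) {
        List<String> grown = new ArrayList<>(paths);
        grown.add(newPath);
        return grown;
    }
}
```

The list variant also makes it straightforward to collect delete deltas separately from inserted deltas, which is the UPDATE-with-direct-insert case mentioned in the issue.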
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521739 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 15:26 Start Date: 08/Dec/20 15:26 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538496432 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java ## @@ -251,7 +271,7 @@ public void closeWriters(boolean abort) throws HiveException { } } -private void commit(FileSystem fs, List commitPaths) throws HiveException { +private void commit(FileSystem fs, List commitPaths, List deleteDeltas) throws HiveException { Review comment: I know, but I don't really see a better solution unless we change the internal structures in FileSinkOperator, like using Lists instead of arrays. But this could have unexpected side effects. I am open to trying it but I would do it under a separate Jira. I created one about investigating this refactoring. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521739) Time Spent: 2.5h (was: 2h 20m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 2.5h > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables
[ https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245953#comment-17245953 ] Piotr Findeisen commented on HIVE-11266: {quote}This is not just external tables - any tables where users are directly modifying the underlying data can be impacted by this. {quote} {quote}Yes, I agree with you, external table is just my personal use case.{quote} [~tmgstev] [~simobatt] was there a follow-up issue to this? From the attached patch (same as [https://github.com/apache/hive/commit/a2dff9e13acc62ecc0388b3b2e221f26c9184dbb]) I see this was fixed for external tables only. > count(*) wrong result based on table statistics for external tables > --- > > Key: HIVE-11266 > URL: https://issues.apache.org/jira/browse/HIVE-11266 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Simone Battaglia >Assignee: Jesus Camacho Rodriguez >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HIVE-11266.01.patch, HIVE-11266.patch > > > Hive returns wrong count result on an external table with table statistics if > I change table data files. > This is the scenario in detail: > 1) create external table my_table (...) location 'my_location'; > 2) analyze table my_table compute statistics; > 3) change/add/delete one or more files in 'my_location' directory; > 4) select count(\*) from my_table; > In this case the count query doesn't generate an MR job and returns the result > based on table statistics. This result is wrong because it is based on > statistics stored in the Hive metastore and doesn't take into account > modifications introduced on data files. > Obviously setting "hive.compute.query.using.stats" to FALSE this problem > doesn't occur but the default value of this property is TRUE. 
> I think that this post on stackoverflow, which shows another type of bug > in the case of multiple inserts, is related to the one that I reported: > http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table -- This message was sent by Atlassian Jira (v8.3.4#803005)
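The reproduction steps listed in the report can be written as a short HiveQL script. The schema and location below are illustrative, not from the original report:

```sql
-- Illustrative reproduction of the stale-statistics scenario (schema/location are made up).
CREATE EXTERNAL TABLE my_table (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/my_location';

ANALYZE TABLE my_table COMPUTE STATISTICS;   -- row count is now cached in the metastore

-- ... files under /data/my_location are added/removed outside of Hive ...

SELECT COUNT(*) FROM my_table;               -- answered from stale stats, no MR job runs

-- Workaround mentioned in the report: force the query to scan the data
-- instead of trusting the cached statistics.
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM my_table;
```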
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521730=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521730 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 15:07 Start Date: 08/Dec/20 15:07 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538475478 ## File path: ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidInputFormat.java ## @@ -52,6 +52,8 @@ @Mock private DataInput mockDataInput; + // IRJUNK IDE TESZTET!!! Review comment: Removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521730) Time Spent: 2h 20m (was: 2h 10m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521729=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521729 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 15:06 Start Date: 08/Dec/20 15:06 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538474101 ## File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java ## @@ -189,6 +191,49 @@ public WriteEntity getAcidAnalyzeTable() { return acidSinks; } + public Integer getStatementIdForAcidWriteType(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) { +FileSinkDesc result = null; +for (FileSinkDesc acidSink : acidSinks) { + if (acidOperation.equals(acidSink.getAcidOperation()) && path.equals(acidSink.getDestPath()) + && acidSink.getTableWriteId() == writeId + && (moveTaskId == null || acidSink.getMoveTaskId() == null || moveTaskId.equals(acidSink.getMoveTaskId( { +// There is a problem with the union all optimisation. In this case, there will be multiple FileSinkOperators +// with the same operation, writeId and moveTaskId. But one of these FSOs doesn't write data and its statementId +// is not valid, so if this FSO is selected and its statementId is returned, the file listing will find nothing. +// So check the acidSinks and if two of them have the same writeId, path and moveTaskId, then return -1 as statementId. +// Like this, the file listing will find all partitions and files correctly. +if (result != null) { + return -1; +} +result = acidSink; + } +} +if (result != null) { + return result.getStatementId(); +} else { + return -1; +} + } + + public Set getDynamicPartitionSpecs(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) { Review comment: Added it. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521729) Time Spent: 2h 10m (was: 2h) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
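The selection rule in the quoted getStatementIdForAcidWriteType can be reduced to the following sketch. The types and names here are simplified, hypothetical stand-ins (the real method matches FileSinkDescs on operation, destination path, writeId, and moveTaskId):

```java
import java.util.List;

// Simplified stand-in for FileSinkDesc: only the fields the lookup needs.
record AcidSink(String operation, String path, long writeId, int statementId) {}

final class StatementIdLookup {

    /**
     * Mirrors the idea in the quoted method: if exactly one sink matches,
     * return its statementId. If several match (the union-all case, where one
     * FSO writes no data and carries an invalid statementId), return -1 so the
     * file listing scans all partitions and files instead of trusting a
     * possibly bogus statementId.
     */
    static int statementIdFor(List<AcidSink> sinks, String op, String path, long writeId) {
        AcidSink found = null;
        for (AcidSink sink : sinks) {
            if (sink.operation().equals(op) && sink.path().equals(path)
                    && sink.writeId() == writeId) {
                if (found != null) {
                    return -1;  // ambiguous: more than one matching sink
                }
                found = sink;
            }
        }
        return found != null ? found.statementId() : -1;
    }
}
```

Returning -1 on an ambiguous match trades precision for correctness: listing everything is slower but never misses files written by the data-carrying FSO.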
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521722=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521722 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 14:55 Start Date: 08/Dec/20 14:55 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538455465 ## File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java ## @@ -189,6 +191,49 @@ public WriteEntity getAcidAnalyzeTable() { return acidSinks; } + public Integer getStatementIdForAcidWriteType(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) { +FileSinkDesc result = null; +for (FileSinkDesc acidSink : acidSinks) { + if (acidOperation.equals(acidSink.getAcidOperation()) && path.equals(acidSink.getDestPath()) + && acidSink.getTableWriteId() == writeId + && (moveTaskId == null || acidSink.getMoveTaskId() == null || moveTaskId.equals(acidSink.getMoveTaskId( { +// There is a problem with the union all optimisation. In this case, there will be multiple FileSinkOperators Review comment: Yeah, it would be better. Added Java doc. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521722) Time Spent: 2h (was: 1h 50m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 2h > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521721 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 14:55 Start Date: 08/Dec/20 14:55 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538455465 ## File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java ## @@ -189,6 +191,49 @@ public WriteEntity getAcidAnalyzeTable() { return acidSinks; } + public Integer getStatementIdForAcidWriteType(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) { +FileSinkDesc result = null; +for (FileSinkDesc acidSink : acidSinks) { + if (acidOperation.equals(acidSink.getAcidOperation()) && path.equals(acidSink.getDestPath()) + && acidSink.getTableWriteId() == writeId + && (moveTaskId == null || acidSink.getMoveTaskId() == null || moveTaskId.equals(acidSink.getMoveTaskId( { +// There is a problem with the union all optimisation. In this case, there will be multiple FileSinkOperators Review comment: I fixed it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521721) Time Spent: 1h 50m (was: 1h 40m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521716=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521716 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 14:49 Start Date: 08/Dec/20 14:49 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538455465 ## File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java ## @@ -189,6 +191,49 @@ public WriteEntity getAcidAnalyzeTable() { return acidSinks; } + public Integer getStatementIdForAcidWriteType(long writeId, String moveTaskId, AcidUtils.Operation acidOperation, Path path) { +FileSinkDesc result = null; +for (FileSinkDesc acidSink : acidSinks) { + if (acidOperation.equals(acidSink.getAcidOperation()) && path.equals(acidSink.getDestPath()) + && acidSink.getTableWriteId() == writeId + && (moveTaskId == null || acidSink.getMoveTaskId() == null || moveTaskId.equals(acidSink.getMoveTaskId( { +// There is a problem with the union all optimisation. In this case, there will be multiple FileSinkOperators Review comment: Actually this comment is to explain why the following check was needed, but the Java doc for the whole method is a good idea. I added it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521716) Time Spent: 1h 40m (was: 1.5h) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521714 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 14:46 Start Date: 08/Dec/20 14:46 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538452455 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Context.java ## @@ -105,6 +105,7 @@ private Configuration conf; protected int pathid = 1; + private int moveTaskId = 100; Review comment: Sure! Fixed it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521714) Time Spent: 1.5h (was: 1h 20m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step
[ https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521711=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521711 ] ASF GitHub Bot logged work on HIVE-23410: - Author: ASF GitHub Bot Created on: 08/Dec/20 14:40 Start Date: 08/Dec/20 14:40 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1660: URL: https://github.com/apache/hive/pull/1660#discussion_r538445196 ## File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java ## @@ -1747,15 +1747,15 @@ public void testMultiInsertOnDynamicallyPartitionedMmTable() throws Exception { final String completedTxnComponentsContents = TxnDbUtil.queryToString(conf, "select * from \"COMPLETED_TXN_COMPONENTS\""); Assert.assertEquals(completedTxnComponentsContents, -2, TxnDbUtil.countQueryAgent(conf, "select count(*) from \"COMPLETED_TXN_COMPONENTS\"")); +4, TxnDbUtil.countQueryAgent(conf, "select count(*) from \"COMPLETED_TXN_COMPONENTS\"")); Review comment: Those records are duplicates. It is a "side-effect" of fixing the FileSinkOperator-MoveTask assignment. For ACID tables, for an insert like the one in the test, 4 records were created even before direct insert was introduced, because back then the FSO-MoveTask assignment was based on the staging directories, and for an insert like this there were 2 FSOs and 2 MoveTasks. Each MoveTask called the metastore method which creates an entry in the TXN_COMPONENTS table for each partition, so there were 4 records at the end of the insert. But for MM tables (and later for direct insert) there is no staging directory, and all MoveTasks and all FSOs will contain the table directory. So every FSO will find the same MoveTask (the first in the list) and only that one will be executed. This was not correct, but it didn't cause any issue, so it went undetected until the direct delete and update came in.
To make them work properly, we had to fix the FSO-MoveTask assignment, but then MM tables with direct insert will have duplicate records just like ACID tables without direct insert. The Javadoc of the TxnHandler.addDynamicPartitions method says that duplicates won't cause any trouble, but if you know of any issues with that, please share them with me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521711) Time Spent: 1h 20m (was: 1h 10m) > ACID: Improve the delete and update operations to avoid the move step > - > > Key: HIVE-23410 > URL: https://issues.apache.org/jira/browse/HIVE-23410 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23410.1.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > This is a follow-up task for > [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the > insert operation has been modified to write directly to the table locations > instead of the staging directory. The same improvement should be done for the > ACID update and delete operations as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
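The assignment problem described in the review comment above can be sketched in a few lines. This is a purely illustrative toy (the class and method names are invented; it is not Hive's actual FileSinkOperator/MoveTask code): sinks are matched to tasks by output directory with a first-match lookup, which happens to be unique when each sink has its own staging directory, but collapses onto a single task when all sinks share the table directory, as with MM tables and direct insert.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Hive's actual classes) of first-match
// sink-to-task assignment by output directory. With per-sink staging
// directories the match is unique; when all sinks share the table
// directory, every sink is paired with the first task and the rest
// are never executed.
public class SinkTaskAssignment {

    // For each sink directory, return the index of the first task whose
    // target directory matches it.
    static List<Integer> firstMatchAssignment(String[] sinkDirs, String[] taskDirs) {
        List<Integer> assigned = new ArrayList<>();
        for (String sink : sinkDirs) {
            for (int t = 0; t < taskDirs.length; t++) {
                if (taskDirs[t].equals(sink)) {
                    assigned.add(t);
                    break; // first match wins
                }
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Staging-dir case: distinct directories, each task gets picked once.
        System.out.println(firstMatchAssignment(
            new String[] {"/staging/fso1", "/staging/fso2"},
            new String[] {"/staging/fso1", "/staging/fso2"})); // [0, 1]

        // Direct-insert case: both sinks share the table directory, so
        // first-match assigns task 0 twice and task 1 never runs.
        System.out.println(firstMatchAssignment(
            new String[] {"/warehouse/t", "/warehouse/t"},
            new String[] {"/warehouse/t", "/warehouse/t"})); // [0, 0]
    }
}
```

The fix discussed in the thread amounts to making this pairing exact (one task per sink) rather than first-match, at the cost of each MoveTask firing its own metastore call, hence the duplicate TXN_COMPONENTS rows.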
[jira] [Assigned] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly
[ https://issues.apache.org/jira/browse/HIVE-24504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-24504: - > VectorFileSinkArrowOperator does not serialize complex types correctly > -- > > Key: HIVE-24504 > URL: https://issues.apache.org/jira/browse/HIVE-24504 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > When the table has complex types and the result has 0 records the > VectorFileSinkArrowOperator only serializes the primitive types correctly. > For complex types only the main type is set which causes issues for clients > trying to read data. > Got the following HWC exception: > {code:java} > Previous exception in task: Unsupported data type: Null > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98) > > org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala) > > org.apache.spark.sql.vectorized.ArrowColumnVector.(ArrowColumnVector.java:135) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29) > > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59) > > org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40) > > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown > Source) > > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > > 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836) > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836) > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > org.apache.spark.scheduler.Task.run(Task.scala:109) > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > at > org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139) > at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117) > at org.apache.spark.scheduler.Task.run(Task.scala:119) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521702 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 08/Dec/20 14:28 Start Date: 08/Dec/20 14:28 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-740652364 Also, please create a feature branch (HIVE-24470) on your local repository and PR from there. ``` git checkout -b HIVE-24470 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521702) Time Spent: 1h 40m (was: 1.5h) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521701=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521701 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 08/Dec/20 14:27 Start Date: 08/Dec/20 14:27 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-740651806 @mwalenia Go ahead and just close the PR manually for 30s and then re-open. That should trigger the tests again. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521701) Time Spent: 1.5h (was: 1h 20m) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24388) Enhance swo optimizations to merge EventOperators
[ https://issues.apache.org/jira/browse/HIVE-24388?focusedWorklogId=521674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521674 ] ASF GitHub Bot logged work on HIVE-24388: - Author: ASF GitHub Bot Created on: 08/Dec/20 13:15 Start Date: 08/Dec/20 13:15 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #1750: URL: https://github.com/apache/hive/pull/1750#issuecomment-740612558 @jcamachor could you please take a look? this patch also closes down 2 smaller bugs - one of them might have potentially caused parallel edge creation; I think the issue was not new; the new algorithm just stressed the issue a little bit more...I will ask Sungwoo to check it again after we have this patch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521674) Time Spent: 20m (was: 10m) > Enhance swo optimizations to merge EventOperators > - > > Key: HIVE-24388 > URL: https://issues.apache.org/jira/browse/HIVE-24388 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {code} > EVENT1->TS1 > EVENT2->TS2 > {code} > are not merged because a TS may only handle the first event properly; > sending 2 events would cause one of them to be ignored -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24503) Optimize vector row serde by avoiding type check at run time
[ https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24503: -- Labels: pull-request-available (was: ) > Optimize vector row serde by avoiding type check at run time > - > > Key: HIVE-24503 > URL: https://issues.apache.org/jira/browse/HIVE-24503 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Serialization/Deserialization of a vectorized batch done at VectorSerializeRow > and VectorDeserializeRow does type checking for each column of each row. > This becomes very costly when there are billions of rows to read/write. This > can be optimized if the type check is done during init time and specific > reader/writer classes are created. These classes can be stored directly > in the field structure to avoid the run-time type check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24503) Optimize vector row serde by avoiding type check at run time
[ https://issues.apache.org/jira/browse/HIVE-24503?focusedWorklogId=521643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521643 ] ASF GitHub Bot logged work on HIVE-24503: - Author: ASF GitHub Bot Created on: 08/Dec/20 11:38 Start Date: 08/Dec/20 11:38 Worklog Time Spent: 10m Work Description: maheshk114 opened a new pull request #1753: URL: https://github.com/apache/hive/pull/1753 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521643) Remaining Estimate: 0h Time Spent: 10m > Optimize vector row serde by avoiding type check at run time > - > > Key: HIVE-24503 > URL: https://issues.apache.org/jira/browse/HIVE-24503 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Serialization/Deserialization of a vectorized batch done at VectorSerializeRow > and VectorDeserializeRow does type checking for each column of each row. > This becomes very costly when there are billions of rows to read/write. This > can be optimized if the type check is done during init time and specific > reader/writer classes are created. These classes can be stored directly > in the field structure to avoid the run-time type check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24503) Optimize vector row serde by avoiding type check at run time
[ https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera updated HIVE-24503: --- Description: Serialization/Deserialization of a vectorized batch done at VectorSerializeRow and VectorDeserializeRow does type checking for each column of each row. This becomes very costly when there are billions of rows to read/write. This can be optimized if the type check is done during init time and specific reader/writer classes are created. These classes can be stored directly in the field structure to avoid the run-time type check. (was: Serialization/Deserialization of vectorized batch done at VectorSerializeRow and VectorDeserializeRow does a type checking for each column of each row. This becomes very costly when there are billions of rows to read/write. This can be optimized if the type check is done during init time and specific reader/writer classes are created. ) > Optimize vector row serde by avoiding type check at run time > - > > Key: HIVE-24503 > URL: https://issues.apache.org/jira/browse/HIVE-24503 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > > Serialization/Deserialization of a vectorized batch done at VectorSerializeRow > and VectorDeserializeRow does type checking for each column of each row. > This becomes very costly when there are billions of rows to read/write. This > can be optimized if the type check is done during init time and specific > reader/writer classes are created. These classes can be stored directly > in the field structure to avoid the run-time type check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24503) Optimize vector row serde to avoid type check at run time
[ https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera reassigned HIVE-24503: -- > Optimize vector row serde to avoid type check at run time > -- > > Key: HIVE-24503 > URL: https://issues.apache.org/jira/browse/HIVE-24503 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > > Serialization/Deserialization of a vectorized batch done at VectorSerializeRow > and VectorDeserializeRow does type checking for each column of each row. > This becomes very costly when there are billions of rows to read/write. > This can be optimized if the type check is done during init time and specific > reader/writer classes are created. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24503) Optimize vector row serde by avoiding type check at run time
[ https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera updated HIVE-24503: --- Summary: Optimize vector row serde by avoiding type check at run time (was: Optimize vector row serde to avoid type check at run time ) > Optimize vector row serde by avoiding type check at run time > - > > Key: HIVE-24503 > URL: https://issues.apache.org/jira/browse/HIVE-24503 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > > Serialization/Deserialization of a vectorized batch done at VectorSerializeRow > and VectorDeserializeRow does type checking for each column of each row. > This becomes very costly when there are billions of rows to read/write. > This can be optimized if the type check is done during init time and specific > reader/writer classes are created. -- This message was sent by Atlassian Jira (v8.3.4#803005)
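The optimization described in the HIVE-24503 issue above (resolve each column's type once at init time into a specialized reader/writer, so the per-row loop carries no type checks) can be sketched as follows. This is a minimal, invented example — the interface and class names are illustrative, not Hive's actual VectorSerializeRow/VectorDeserializeRow internals:

```java
// Illustrative sketch of init-time dispatch: the type switch runs once
// per schema when the writers are created, and the hot per-row loop is
// reduced to virtual calls with no type checks.
public class InitTimeDispatch {

    interface FieldWriter {
        void write(StringBuilder out, Object value);
    }

    static final class LongWriter implements FieldWriter {
        public void write(StringBuilder out, Object value) {
            out.append(((Long) value).longValue());
        }
    }

    static final class StringWriter implements FieldWriter {
        public void write(StringBuilder out, Object value) {
            out.append('"').append(value).append('"');
        }
    }

    // Done once per schema/batch setup, not once per row.
    static FieldWriter[] initWriters(String[] columnTypes) {
        FieldWriter[] writers = new FieldWriter[columnTypes.length];
        for (int i = 0; i < columnTypes.length; i++) {
            switch (columnTypes[i]) {
                case "bigint": writers[i] = new LongWriter(); break;
                case "string": writers[i] = new StringWriter(); break;
                default:
                    throw new IllegalArgumentException("unsupported type: " + columnTypes[i]);
            }
        }
        return writers;
    }

    // Hot loop: one virtual call per column, no per-row type switch.
    static String serializeRow(FieldWriter[] writers, Object[] row) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < writers.length; i++) {
            if (i > 0) out.append(',');
            writers[i].write(out, row[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        FieldWriter[] writers = initWriters(new String[] {"bigint", "string"});
        System.out.println(serializeRow(writers, new Object[] {42L, "hive"})); // 42,"hive"
    }
}
```

Storing the resolved writers in a field (as the issue description suggests) means billions of rows pay only the virtual-call cost instead of a type switch per column per row.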
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521635 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 08/Dec/20 11:23 Start Date: 08/Dec/20 11:23 Worklog Time Spent: 10m Work Description: miklosgergely commented on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-740561137 @mwalenia please push your changes again to your remote branch, with a new commit having the same content. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521635) Time Spent: 1h 20m (was: 1h 10m) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521634=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521634 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 08/Dec/20 11:22 Start Date: 08/Dec/20 11:22 Worklog Time Spent: 10m Work Description: miklosgergely removed a comment on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-740560239 recheck This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521634) Time Spent: 1h 10m (was: 1h) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521633=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521633 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 08/Dec/20 11:22 Start Date: 08/Dec/20 11:22 Worklog Time Spent: 10m Work Description: miklosgergely commented on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-740560239 recheck This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521633) Time Spent: 1h (was: 50m) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521632=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521632 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 08/Dec/20 11:21 Start Date: 08/Dec/20 11:21 Worklog Time Spent: 10m Work Description: miklosgergely removed a comment on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-740559929 retest This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521632) Time Spent: 50m (was: 40m) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521631=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521631 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 08/Dec/20 11:21 Start Date: 08/Dec/20 11:21 Worklog Time Spent: 10m Work Description: miklosgergely commented on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-740559929 retest This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521631) Time Spent: 40m (was: 0.5h) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24502) Store table level regular expression used during dump for table level replication
[ https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi reassigned HIVE-24502: -- > Store table level regular expression used during dump for table level > replication > - > > Key: HIVE-24502 > URL: https://issues.apache.org/jira/browse/HIVE-24502 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
[ https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245807#comment-17245807 ] Prasanth Jayachandran commented on HIVE-24501: -- [~ashutoshc] [~jcamachorodriguez] could someone please help with reviewing this small change? > UpdateInputAccessTimeHook should not update stats > - > > Key: HIVE-24501 > URL: https://issues.apache.org/jira/browse/HIVE-24501 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > UpdateInputAccessTimeHook can fail for transactional tables with following > exception. > The hook should skip updating the stats and only update the access time. > {code:java} > ERROR : FAILED: Hive Internal Error: > org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. > Cannot change stats state for a transactional table default.test without > providing the transactional write state for verification (new write ID 0, > valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state > {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state > null)ERROR : FAILED: Hive Internal Error: > org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. > Cannot change stats state for a transactional table default.test without > providing the transactional write state for verification (new write ID 0, > valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state > {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state > null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without > providing the transactional write state for verification (new write ID 0, > valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state > {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state > null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at > org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at > org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at > org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70) > at > org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) > at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at > org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at > org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at > org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at > org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at > org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at > java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748)Caused by: > MetaException(message:Cannot change stats state for a transactional table > default.test without providing the transactional write state for verification > (new write ID 0, valid write IDs > default.test:8:9223372036854775807::1,2,3,4,7; current state > {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state > null) at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java) > at >
[jira] [Updated] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
[ https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24501: -- Labels: pull-request-available (was: ) > UpdateInputAccessTimeHook should not update stats > - > > Key: HIVE-24501 > URL: https://issues.apache.org/jira/browse/HIVE-24501 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > UpdateInputAccessTimeHook can fail for transactional tables with the following exception. > The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756)
>   at org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>   at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
>   at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273)
>   at org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: MetaException(message:Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.java)
> {code}
[jira] [Updated] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
[ https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-24501: - Status: Patch Available (was: Open) > UpdateInputAccessTimeHook should not update stats > - > > Key: HIVE-24501 > URL: https://issues.apache.org/jira/browse/HIVE-24501 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > UpdateInputAccessTimeHook can fail for transactional tables with the following exception (full stack trace quoted in the first notification above). > The hook should skip updating the stats and only update the access time.
[jira] [Work logged] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
[ https://issues.apache.org/jira/browse/HIVE-24501?focusedWorklogId=521620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521620 ] ASF GitHub Bot logged work on HIVE-24501: - Author: ASF GitHub Bot Created on: 08/Dec/20 10:35 Start Date: 08/Dec/20 10:35 Worklog Time Spent: 10m Work Description: prasanthj opened a new pull request #1752: URL: https://github.com/apache/hive/pull/1752 ### What changes were proposed in this pull request? When UpdateInputAccessTimeHook is used as a pre-hook, any simple query on a transactional table throws an exception (full stack trace in https://issues.apache.org/jira/browse/HIVE-24501) ``` org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification ``` To update only the access time, the stats of the table and its partitions do not need to be updated. This PR sets the environment context to skip updating the stats. ### Why are the changes needed? Bug with exception trace in https://issues.apache.org/jira/browse/HIVE-24501 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually on a dev cluster. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521620) Remaining Estimate: 0h Time Spent: 10m > UpdateInputAccessTimeHook should not update stats > - > > Key: HIVE-24501 > URL: https://issues.apache.org/jira/browse/HIVE-24501 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > UpdateInputAccessTimeHook can fail for transactional tables with the following exception.
> The hook should skip updating the stats and only update the access time.
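The fix described in the PR above (alter the table's access time while skipping the stats update that triggers the MetaException) can be sketched as follows. This is a minimal, hypothetical stand-in: the `EnvironmentContext` class and the `DO_NOT_UPDATE_STATS` property name mirror Hive's metastore API (`org.apache.hadoop.hive.metastore.api.EnvironmentContext`, `StatsSetupConst.DO_NOT_UPDATE_STATS`), but the classes below are simplified stubs, not the real Hive implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Hive's org.apache.hadoop.hive.metastore.api.EnvironmentContext
// (hypothetical stub for illustration only).
class EnvironmentContext {
    private final Map<String, String> properties = new HashMap<>();

    void putToProperties(String key, String value) {
        properties.put(key, value);
    }

    Map<String, String> getProperties() {
        return properties;
    }
}

public class UpdateAccessTimeSketch {
    // Mirrors StatsSetupConst.DO_NOT_UPDATE_STATS in Hive.
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    // Builds the context a pre-execution hook could pass along with alterTable,
    // so the metastore only updates metadata such as lastAccessTime and leaves
    // the stats state of a transactional table untouched.
    static EnvironmentContext statsSkippingContext() {
        EnvironmentContext ctx = new EnvironmentContext();
        ctx.putToProperties(DO_NOT_UPDATE_STATS, "true");
        return ctx;
    }

    public static void main(String[] args) {
        EnvironmentContext ctx = statsSkippingContext();
        System.out.println(ctx.getProperties().get(DO_NOT_UPDATE_STATS));
    }
}
```

In the actual patch, an equivalent context would accompany the `Hive.alterTable(...)` call made from `UpdateInputAccessTimeHook`, so the metastore skips the stats-state verification that raised the exception above.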
[jira] [Assigned] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
[ https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-24501: > UpdateInputAccessTimeHook should not update stats > - > > Key: HIVE-24501 > URL: https://issues.apache.org/jira/browse/HIVE-24501 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > UpdateInputAccessTimeHook can fail for transactional tables with the following exception (full stack trace quoted in the first notification above). > The hook should skip updating the stats and only update the access time.
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=521611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521611 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 08/Dec/20 09:46 Start Date: 08/Dec/20 09:46 Worklog Time Spent: 10m Work Description: mwalenia commented on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-740510182 @miklosgergely can you run the tests again? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 521611) Time Spent: 0.5h (was: 20m) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In the file HiveMetastore.java, the majority of the code is the Thrift interface rather than the actual logic behind starting the Hive metastore; the Thrift code should be moved out into a separate file to clean up HiveMetastore.java. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24337) Cache delete delta files in LLAP cache
[ https://issues.apache.org/jira/browse/HIVE-24337?focusedWorklogId=521609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521609 ] Ádám Szita logged work on HIVE-24337: - Author: Ádám Szita Created on: 08/Dec/20 09:39 Start Date: 08/Dec/20 09:39 Worklog Time Spent: 4h Issue Time Tracking --- Worklog Id: (was: 521609) Remaining Estimate: 0h Time Spent: 4h > Cache delete delta files in LLAP cache > -- > > Key: HIVE-24337 > URL: https://issues.apache.org/jira/browse/HIVE-24337 > Project: Hive > Issue Type: New Feature >Reporter: Ádám Szita >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > HIVE-23824 added the functionality of caching the metadata part of ORC files in the LLAP cache, so that ACID reads can be faster. However, the content itself still needs to be read in every single time. If this could be cached too, additional time could be saved. -- This message was sent by Atlassian Jira (v8.3.4#803005)