[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=687804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687804
 ]

ASF GitHub Bot logged work on HIVE-25397:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 07:43
Start Date: 30/Nov/21 07:43
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r759005160



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -189,57 +191,137 @@ private void dirLocationToCopy(String tableName, FileList fileList, Path sourceP
       targetPath = new Path(Utils.replaceHost(targetPath.toString(), sourcePath.toUri().getHost()));
       sourcePath = new Path(Utils.replaceHost(sourcePath.toString(), remoteNS));
     }
-    fileList.add(new DirCopyWork(tableName, sourcePath, targetPath, copyMode, snapshotPrefix).convertToString());
+    fileList.add(new DirCopyWork(tableName, sourcePath, targetPath, copyMode, snapshotPrefix, isBootstrap).convertToString());
   }
 
-  private SnapshotUtils.SnapshotCopyMode createSnapshotsAtSource(Path sourcePath, String snapshotPrefix,
-      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
-      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+  SnapshotUtils.SnapshotCopyMode createSnapshotsAtSource(Path sourcePath, Path targetPath, String snapshotPrefix,
+      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
+      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
     if (!isSnapshotEnabled) {
       LOG.info("Snapshot copy not enabled for path {} Will use normal distCp for copying data.", sourcePath);
       return FALLBACK_COPY;
     }
     DistributedFileSystem sourceDfs = SnapshotUtils.getDFS(sourcePath, conf);
     try {
-      if(isBootstrap) {
-        // Delete any pre existing snapshots.
-        SnapshotUtils.deleteSnapshotIfExists(sourceDfs, sourcePath, firstSnapshot(snapshotPrefix), conf);
-        SnapshotUtils.deleteSnapshotIfExists(sourceDfs, sourcePath, secondSnapshot(snapshotPrefix), conf);
-        allowAndCreateInitialSnapshot(sourcePath, snapshotPrefix, conf, replSnapshotCount, snapPathFileList, sourceDfs);
-        return INITIAL_COPY;
+      if(isBootstrap && conf.getBoolVar(HiveConf.ConfVars.REPL_REUSE_SNAPSHOTS)) {
+        try {
+          FileStatus[] listing = sourceDfs.listStatus(new Path(sourcePath, ".snapshot"));
+          for (FileStatus elem : listing) {
+            String snapShotName = elem.getPath().getName();
+            String prefix;
+            if (snapShotName.contains(OLD_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(OLD_SNAPSHOT));
+              if(!prefix.equals(snapshotPrefix)) {
+                sourceDfs.renameSnapshot(sourcePath, firstSnapshot(prefix), firstSnapshot(snapshotPrefix));
+              }
+            }
+            if (snapShotName.contains(NEW_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(NEW_SNAPSHOT));
+              if(!prefix.equals(snapshotPrefix)) {
+                sourceDfs.renameSnapshot(sourcePath, secondSnapshot(prefix), secondSnapshot(snapshotPrefix));
+              }
+            }
+          }
+        } catch (SnapshotException e) {
+          // dir not snapshottable, continue
+        }
       }
+      boolean firstSnapAvailable =
+          SnapshotUtils.isSnapshotAvailable(sourceDfs, sourcePath, snapshotPrefix, OLD_SNAPSHOT, conf);
+      boolean secondSnapAvailable =
+          SnapshotUtils.isSnapshotAvailable(sourceDfs, sourcePath, snapshotPrefix, NEW_SNAPSHOT, conf);
 
+      // While resuming a failed replication
       if (prevSnaps.contains(sourcePath.toString())) {
         // We already created a snapshot for this, just refresh the latest snapshot and leave.
-        sourceDfs.deleteSnapshot(sourcePath, secondSnapshot(snapshotPrefix));
-        replSnapshotCount.incrementNumDeleted();
+        // In case of reverse replication after fail-over, in some paths, second snapshot may not be present.
+        if(SnapshotUtils.deleteSnapshotIfExists(sourceDfs, sourcePath, secondSnapshot(snapshotPrefix), conf)) {
+          replSnapshotCount.incrementNumDeleted();
+        }
         SnapshotUtils.createSnapshot(sourceDfs, sourcePath, secondSnapshot(snapshotPrefix), conf);
         replSnapshotCount.incrementNumCreated();
         snapPathFileList.add(sourcePath.toString());
         return SnapshotUtils


[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=687786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687786
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 06:44
Start Date: 30/Nov/21 06:44
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2826:
URL: https://github.com/apache/hive/pull/2826#discussion_r758975794



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -5275,38 +5275,67 @@ private void removeUnusedColumnDescriptor(MColumnDescriptor oldCD) {
       return;
     }
 
-    boolean success = false;
     Query query = null;
+    Query query2 = null;
+    boolean success = false;
+    LOG.debug("execute removeUnusedColumnDescriptor");
+    DatabaseProduct dbProduct = DatabaseProduct.determineDatabaseProduct(MetaStoreDirectSql.getProductName(pm), conf);
 
+    /**
+     * In order to workaround oracle not supporting limit statement caused performance issue, HIVE-9447 makes

Review comment:
   Maybe this comment could go to the new method where we decide if we can 
remove the CD or not




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687786)
Time Spent: 4h 50m  (was: 4h 40m)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21075.2.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> To work around a performance issue caused by Oracle not supporting the limit 
> statement, HIVE-9447 makes all backend DBs run select count(1) from SDS where 
> SDS.CD_ID=? to check whether the specific CD_ID is referenced in the SDS 
> table before dropping a partition. This select count(1) statement does not 
> scale well in Postgres, and there is no index for the CD_ID column in the SDS 
> table. For an SDS table with 1.5 million rows, select count(1) averages 700ms 
> without an index versus 10-20ms with one, while the statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=687785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687785
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 06:43
Start Date: 30/Nov/21 06:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2826:
URL: https://github.com/apache/hive/pull/2826#discussion_r758975234



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -5326,6 +5355,23 @@ private void preDropStorageDescriptor(MStorageDescriptor msd) {
     removeUnusedColumnDescriptor(mcd);
   }
 
+  /**
+   * Get a list of storage descriptors that reference a particular Column Descriptor
+   * @param oldCD the column descriptor to get storage descriptors for

Review comment:
   If we keep the query parameter, please document it 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687785)
Time Spent: 4h 40m  (was: 4.5h)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21075.2.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> To work around a performance issue caused by Oracle not supporting the limit 
> statement, HIVE-9447 makes all backend DBs run select count(1) from SDS where 
> SDS.CD_ID=? to check whether the specific CD_ID is referenced in the SDS 
> table before dropping a partition. This select count(1) statement does not 
> scale well in Postgres, and there is no index for the CD_ID column in the SDS 
> table. For an SDS table with 1.5 million rows, select count(1) averages 700ms 
> without an index versus 10-20ms with one, while the statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=687784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687784
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 06:42
Start Date: 30/Nov/21 06:42
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2826:
URL: https://github.com/apache/hive/pull/2826#discussion_r758974741



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -5326,6 +5355,23 @@ private void preDropStorageDescriptor(MStorageDescriptor msd) {
     removeUnusedColumnDescriptor(mcd);
   }
 
+  /**
+   * Get a list of storage descriptors that reference a particular Column Descriptor
+   * @param oldCD the column descriptor to get storage descriptors for
+   * @return a list of storage descriptors
+   */
+  private List<MStorageDescriptor> listStorageDescriptorsWithCD(MColumnDescriptor oldCD, Query query) {

Review comment:
   Create a query here and close it immediately. Why would we want to leak 
the query? Do we reuse it with the same statement? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687784)
Time Spent: 4.5h  (was: 4h 20m)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21075.2.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> To work around a performance issue caused by Oracle not supporting the limit 
> statement, HIVE-9447 makes all backend DBs run select count(1) from SDS where 
> SDS.CD_ID=? to check whether the specific CD_ID is referenced in the SDS 
> table before dropping a partition. This select count(1) statement does not 
> scale well in Postgres, and there is no index for the CD_ID column in the SDS 
> table. For an SDS table with 1.5 million rows, select count(1) averages 700ms 
> without an index versus 10-20ms with one, while the statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=687783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687783
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 06:40
Start Date: 30/Nov/21 06:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2826:
URL: https://github.com/apache/hive/pull/2826#discussion_r758973881



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -5275,38 +5275,67 @@ private void removeUnusedColumnDescriptor(MColumnDescriptor oldCD) {
       return;
     }
 
-    boolean success = false;
     Query query = null;
+    Query query2 = null;
+    boolean success = false;
+    LOG.debug("execute removeUnusedColumnDescriptor");
+    DatabaseProduct dbProduct = DatabaseProduct.determineDatabaseProduct(MetaStoreDirectSql.getProductName(pm), conf);
 
+    /**
+     * In order to workaround oracle not supporting limit statement caused performance issue, HIVE-9447 makes
+     * all the backend DB run select count(1) from SDS where SDS.CD_ID=? to check if the specific CD_ID is
+     * referenced in SDS table before drop a partition. This select count(1) statement does not scale well in
+     * Postgres, and there is no index for CD_ID column in SDS table.
+     * For a SDS table with 1.5 million rows, select count(1) has average 700ms without index, while in
+     * 10-20ms with index. But the statement before
+     * HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) uses less than 10ms.
+     */
     try {
       openTransaction();
-      LOG.debug("execute removeUnusedColumnDescriptor");
-
-      query = pm.newQuery("select count(1) from " +
-          "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)");
-      query.declareParameters("MColumnDescriptor inCD");
-      long count = ((Long) query.execute(oldCD)).longValue();
-
-      //if no other SD references this CD, we can throw it out.
-      if (count == 0) {
-        // First remove any constraints that may be associated with this CD
-        query = pm.newQuery(MConstraint.class, "parentColumn == inCD || childColumn == inCD");
+      // Fix performance regression for postgres caused by HIVE-9447
+      if (dbProduct.isPOSTGRES() || dbProduct.isMYSQL()) {
+        query = pm.newQuery(MStorageDescriptor.class, "this.cd == inCD");
+        query.declareParameters("MColumnDescriptor inCD");
+        List<MStorageDescriptor> referencedSDs = listStorageDescriptorsWithCD(oldCD, query);
+        //if no other SD references this CD, we can throw it out.
+        if (referencedSDs != null && referencedSDs.isEmpty()) {
+          query2 = removeConstraintsAndCd(oldCD);
+        }
+      } else {
+        query = pm.newQuery(
+            "select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)");
         query.declareParameters("MColumnDescriptor inCD");
-        List<MConstraint> mConstraintsList = (List<MConstraint>) query.execute(oldCD);
-        if (CollectionUtils.isNotEmpty(mConstraintsList)) {
-          pm.deletePersistentAll(mConstraintsList);
+        long count = (Long) query.execute(oldCD);
+        //if no other SD references this CD, we can throw it out.
+        if (count == 0) {
+          query2 = removeConstraintsAndCd(oldCD);
         }
-        // Finally remove CD
-        pm.retrieve(oldCD);
-        pm.deletePersistent(oldCD);
       }
       success = commitTransaction();
-      LOG.debug("successfully deleted a CD in removeUnusedColumnDescriptor");
     } finally {
       rollbackAndCleanup(success, query);
+      if (query2 != null) {
+        query2.closeAll();
+      }
     }
   }
 
+  private Query removeConstraintsAndCd(MColumnDescriptor oldCD) {
+    Query query = null;

Review comment:
   I would close the query inside this method and would not leak it. Why do 
we do that? I might miss something 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687783)
Time Spent: 4h 20m  (was: 4h 10m)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin

[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=687781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687781
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 06:37
Start Date: 30/Nov/21 06:37
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2826:
URL: https://github.com/apache/hive/pull/2826#discussion_r758972538



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -5275,38 +5275,67 @@ private void removeUnusedColumnDescriptor(MColumnDescriptor oldCD) {
       return;
     }
 
-    boolean success = false;
     Query query = null;
+    Query query2 = null;
+    boolean success = false;
+    LOG.debug("execute removeUnusedColumnDescriptor");
+    DatabaseProduct dbProduct = DatabaseProduct.determineDatabaseProduct(MetaStoreDirectSql.getProductName(pm), conf);
 
+    /**
+     * In order to workaround oracle not supporting limit statement caused performance issue, HIVE-9447 makes
+     * all the backend DB run select count(1) from SDS where SDS.CD_ID=? to check if the specific CD_ID is
+     * referenced in SDS table before drop a partition. This select count(1) statement does not scale well in
+     * Postgres, and there is no index for CD_ID column in SDS table.
+     * For a SDS table with 1.5 million rows, select count(1) has average 700ms without index, while in
+     * 10-20ms with index. But the statement before
+     * HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) uses less than 10ms.
+     */
     try {
       openTransaction();
-      LOG.debug("execute removeUnusedColumnDescriptor");
-
-      query = pm.newQuery("select count(1) from " +
-          "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)");
-      query.declareParameters("MColumnDescriptor inCD");
-      long count = ((Long) query.execute(oldCD)).longValue();
-
-      //if no other SD references this CD, we can throw it out.
-      if (count == 0) {
-        // First remove any constraints that may be associated with this CD
-        query = pm.newQuery(MConstraint.class, "parentColumn == inCD || childColumn == inCD");
+      // Fix performance regression for postgres caused by HIVE-9447
+      if (dbProduct.isPOSTGRES() || dbProduct.isMYSQL()) {
+        query = pm.newQuery(MStorageDescriptor.class, "this.cd == inCD");
+        query.declareParameters("MColumnDescriptor inCD");
+        List<MStorageDescriptor> referencedSDs = listStorageDescriptorsWithCD(oldCD, query);
+        //if no other SD references this CD, we can throw it out.
+        if (referencedSDs != null && referencedSDs.isEmpty()) {

Review comment:
   Should we just create a method for checking the SD references, like 
`hasRemainingCDReference`, and return a boolean? And the rest of the code could 
remain the same? 
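The reviewer's suggestion above is to hide the reference check behind a boolean helper so the caller's branching stays unchanged. A minimal, self-contained sketch of that shape, using an in-memory list in place of the SDS table; the name `hasRemainingCDReference` comes from the review comment itself, and everything else here is an illustrative assumption, not Hive code:

```java
import java.util.List;

// Hypothetical sketch: a boolean existence check for "is this column
// descriptor still referenced by any storage descriptor?", as suggested in
// the review. The List<Long> of CD_IDs stands in for rows of the SDS table.
public class CdReferenceCheckSketch {
    public static boolean hasRemainingCDReference(List<Long> sdsCdIds, long cdId) {
        // anyMatch stops at the first hit, mirroring a "LIMIT 1" style probe
        // rather than a full COUNT(*) scan over every matching row.
        return sdsCdIds.stream().anyMatch(id -> id == cdId);
    }

    public static void main(String[] args) {
        List<Long> sds = List.of(1L, 2L, 2L, 5L);
        System.out.println(hasRemainingCDReference(sds, 2L)); // true: CD 2 still referenced
        System.out.println(hasRemainingCDReference(sds, 7L)); // false: CD 7 can be dropped
    }
}
```

The design point is that only the helper's body would vary per database product, while the caller keeps a single `if (!hasRemainingCDReference(...))` branch.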




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687781)
Time Spent: 4h 10m  (was: 4h)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21075.2.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> To work around a performance issue caused by Oracle not supporting the limit 
> statement, HIVE-9447 makes all backend DBs run select count(1) from SDS where 
> SDS.CD_ID=? to check whether the specific CD_ID is referenced in the SDS 
> table before dropping a partition. This select count(1) statement does not 
> scale well in Postgres, and there is no index for the CD_ID column in the SDS 
> table. For an SDS table with 1.5 million rows, select count(1) averages 700ms 
> without an index versus 10-20ms with one, while the statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=687779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687779
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 06:33
Start Date: 30/Nov/21 06:33
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2826:
URL: https://github.com/apache/hive/pull/2826#discussion_r758971384



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -5275,38 +5275,67 @@ private void removeUnusedColumnDescriptor(MColumnDescriptor oldCD) {
       return;
     }
 
-    boolean success = false;
     Query query = null;
+    Query query2 = null;
+    boolean success = false;
+    LOG.debug("execute removeUnusedColumnDescriptor");
+    DatabaseProduct dbProduct = DatabaseProduct.determineDatabaseProduct(MetaStoreDirectSql.getProductName(pm), conf);
 
+    /**
+     * In order to workaround oracle not supporting limit statement caused performance issue, HIVE-9447 makes
+     * all the backend DB run select count(1) from SDS where SDS.CD_ID=? to check if the specific CD_ID is
+     * referenced in SDS table before drop a partition. This select count(1) statement does not scale well in
+     * Postgres, and there is no index for CD_ID column in SDS table.
+     * For a SDS table with 1.5 million rows, select count(1) has average 700ms without index, while in
+     * 10-20ms with index. But the statement before
+     * HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) uses less than 10ms.
+     */
     try {
       openTransaction();
-      LOG.debug("execute removeUnusedColumnDescriptor");
-
-      query = pm.newQuery("select count(1) from " +
-          "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)");
-      query.declareParameters("MColumnDescriptor inCD");
-      long count = ((Long) query.execute(oldCD)).longValue();
-
-      //if no other SD references this CD, we can throw it out.
-      if (count == 0) {
-        // First remove any constraints that may be associated with this CD
-        query = pm.newQuery(MConstraint.class, "parentColumn == inCD || childColumn == inCD");
+      // Fix performance regression for postgres caused by HIVE-9447
+      if (dbProduct.isPOSTGRES() || dbProduct.isMYSQL()) {
+        query = pm.newQuery(MStorageDescriptor.class, "this.cd == inCD");

Review comment:
   Is there a QueryWrapper which closes the query for us in a 
try-with-resources clause? 
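The pattern asked about above can be illustrated with a small, self-contained adapter: a resource whose cleanup method is `closeAll()` is wrapped in an `AutoCloseable` so that try-with-resources guarantees it is closed before the method returns. The `Query` interface below is a stand-in for illustration, not JDO's `javax.jdo.Query`:

```java
// Hypothetical sketch of a QueryWrapper: close() delegates to closeAll(),
// so the query can never leak out of the try block.
public class QueryWrapperSketch {
    interface Query {
        long execute();
        void closeAll();
    }

    /** Adapter that lets a Query participate in try-with-resources. */
    static class QueryWrapper implements AutoCloseable {
        final Query query;
        QueryWrapper(Query query) { this.query = query; }
        @Override public void close() { query.closeAll(); }
    }

    // Visible flag so the example can show that cleanup actually ran.
    public static boolean[] closed = new boolean[1];

    public static void main(String[] args) {
        Query fake = new Query() {
            public long execute() { return 0L; }
            public void closeAll() { closed[0] = true; }
        };
        try (QueryWrapper qw = new QueryWrapper(fake)) {
            qw.query.execute(); // use the query only inside this scope
        }
        System.out.println(closed[0]); // true: closeAll ran on scope exit
    }
}
```

With such a wrapper, the `query2 = ...; if (query2 != null) query2.closeAll();` bookkeeping in the diff would collapse into a single try-with-resources block.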




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687779)
Time Spent: 4h  (was: 3h 50m)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21075.2.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> To work around a performance issue caused by Oracle not supporting the limit 
> statement, HIVE-9447 makes all backend DBs run select count(1) from SDS where 
> SDS.CD_ID=? to check whether the specific CD_ID is referenced in the SDS 
> table before dropping a partition. This select count(1) statement does not 
> scale well in Postgres, and there is no index for the CD_ID column in the SDS 
> table. For an SDS table with 1.5 million rows, select count(1) averages 700ms 
> without an index versus 10-20ms with one, while the statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-11819) HiveServer2 catches OOMs on request threads

2021-11-29 Thread xiepengjie (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450826#comment-17450826
 ] 

xiepengjie commented on HIVE-11819:
---

[~zabetak], hi, would you like to discuss this issue?

> HiveServer2 catches OOMs on request threads
> ---
>
> Key: HIVE-11819
> URL: https://issues.apache.org/jira/browse/HIVE-11819
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HIVE-11819.01.patch, HIVE-11819.02.patch, 
> HIVE-11819.patch
>
>
> ThriftCLIService methods such as ExecuteStatement are apparently capable of 
> catching OOMs because they get wrapped in RTE by HiveSessionProxy. 
> This shouldn't happen.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25722) Compaction fails if there are empty buckets.

2021-11-29 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-25722:
---
Description: 
Compaction fails if there are empty buckets.
This patch detects and deletes empty buckets before launching compaction so that the compaction does not fail.

Error stacktrace :
{code:java}
Caused by: java.lang.IllegalStateException: No 'original' files found for bucketId=3 in file:/Users/asharma/hive-fork/hive/itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCompactor-1638241161113_-1801963913/warehouse/comp3/delta_002_002_
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPairToCompact.<init>(OrcRawRecordMerger.java:602) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1154) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:2462) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:811) ~[hive-exec-4.0.0-SNAPSHOT.jar:?]
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:787) ~[hive-exec-4.0.0-SNAPSHOT.jar:?]
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
{code}

  was:
Compaction fails if there are empty buckets.
This patch detects and deletes empty buckets before launching compaction in 
order to protect it from failing.


> Compaction fails if there are empty buckets.
> 
>
> Key: HIVE-25722
> URL: https://issues.apache.org/jira/browse/HIVE-25722
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Compaction fails if there are empty buckets.
> This patch detects and deletes empty buckets before launching compaction in 
> order to protect it from failing.
> Error stacktrace :
> {code:java}
> Caused by: java.lang.IllegalStateException: No 'original' files found for 
> bucketId=3 in 
> file:/Users/asharma/hive-fork/hive/itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCompactor-1638241161113_-1801963913/warehouse/comp3/delta_002_002_
>at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPairToCompact.<init>(OrcRawRecordMerger.java:602)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1154)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:2462)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:811)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:?]
>at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:787)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:?]
>at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
> {code}
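The pre-compaction step described above — detect and drop empty buckets so the record merger never fails on them — can be sketched in plain Java. This is a hypothetical simplification using `java.nio.file` against a local directory; the actual patch works against HDFS paths inside the compactor.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EmptyBucketFilter {
    // Keep only bucket files that actually contain data, so the merger
    // never sees an empty bucket and fails with IllegalStateException.
    static List<Path> nonEmptyBuckets(Path deltaDir) throws IOException {
        try (Stream<Path> files = Files.list(deltaDir)) {
            return files
                .filter(p -> p.getFileName().toString().startsWith("bucket_"))
                .filter(p -> {
                    try {
                        return Files.size(p) > 0;   // empty bucket -> drop it
                    } catch (IOException e) {
                        return false;
                    }
                })
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path delta = Files.createTempDirectory("delta_demo");
        Files.write(delta.resolve("bucket_00000"), new byte[]{1, 2, 3});
        Files.createFile(delta.resolve("bucket_00003")); // the empty bucket
        System.out.println(nonEmptyBuckets(delta));      // only bucket_00000 survives
    }
}
```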





[jira] [Commented] (HIVE-14261) Support set/unset partition parameters

2021-11-29 Thread xiepengjie (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450823#comment-17450823
 ] 

xiepengjie commented on HIVE-14261:
---

[~zabetak] , I am very happy to discuss this issue with you. I have closed 
HIVE-25739. For this issue, if we are worried about some bad cases, maybe we 
can restrict setting a partition's parameters to a super user or some special 
users. But I don't think we need to worry about it, because a user can still 
set them with code like the following, unless HMS disables it.
{code:java}
HiveConf hiveConf = new HiveConf();
HiveMetaStoreClient hmsc = new HiveMetaStoreClient(hiveConf);
Partition partition = hmsc.getPartition("default", "test", "2021-11-29");
Map<String, String> parameters = partition.getParameters();
parameters.put("newKey", "newValue");
hmsc.alter_partition("default", "test", partition); {code}
 

> Support set/unset partition parameters
> --
>
> Key: HIVE-14261
> URL: https://issues.apache.org/jira/browse/HIVE-14261
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Major
> Attachments: HIVE-14261.01.patch
>
>






[jira] [Updated] (HIVE-25722) Compaction fails if there are empty buckets.

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25722:
--
Labels: pull-request-available  (was: )

> Compaction fails if there are empty buckets.
> 
>
> Key: HIVE-25722
> URL: https://issues.apache.org/jira/browse/HIVE-25722
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Compaction fails if there are empty buckets.
> This patch detects and deletes empty buckets before launching compaction in 
> order to protect it from failing.





[jira] [Work logged] (HIVE-25722) Compaction fails if there are empty buckets.

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25722?focusedWorklogId=687728=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687728
 ]

ASF GitHub Bot logged work on HIVE-25722:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 02:43
Start Date: 30/Nov/21 02:43
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on pull request #2799:
URL: https://github.com/apache/hive/pull/2799#issuecomment-982227374


   @szlta @pvary 
   Could you please review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687728)
Remaining Estimate: 0h
Time Spent: 10m

> Compaction fails if there are empty buckets.
> 
>
> Key: HIVE-25722
> URL: https://issues.apache.org/jira/browse/HIVE-25722
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Compaction fails if there are empty buckets.
> This patch detects and deletes empty buckets before launching compaction in 
> order to protect it from failing.





[jira] [Work logged] (HIVE-25738) NullIf doesn't support complex types

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25738?focusedWorklogId=687718=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687718
 ]

ASF GitHub Bot logged work on HIVE-25738:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 02:12
Start Date: 30/Nov/21 02:12
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2816:
URL: https://github.com/apache/hive/pull/2816#discussion_r758876501



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNullif.java
##
@@ -86,17 +87,13 @@ public ObjectInspector initialize(ObjectInspector[] 
arguments) throws UDFArgumen
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
 Object arg0 = arguments[0].get();
 Object arg1 = arguments[1].get();
-Object value0 = null;
-if (arg0 != null) {
-  value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
-}
+Object value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
 if (arg0 == null || arg1 == null) {
   return value0;
 }
-PrimitiveObjectInspector compareOI = (PrimitiveObjectInspector) 
returnOIResolver.get();
-if (PrimitiveObjectInspectorUtils.comparePrimitiveObjects(
-value0, compareOI,
-returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], false), 
compareOI)) {
+Object value1 = returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], 
false);
+ObjectInspector compareOI = returnOIResolver.get();
+if (ObjectInspectorUtils.compare(value0, compareOI, value1, compareOI) == 
0) {

Review comment:
   I think the result for union is expected: the `idx` in a union indicates 
[which part of the union is being 
used](https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-UnionTypesunionUnionTypes),
 so we cannot ignore the `idx` part of the union type.
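For complex types, the change above switches from primitive comparison to `ObjectInspectorUtils.compare`. The intended NULLIF semantics — return null when the two (possibly nested) values are deeply equal, otherwise return the first — can be sketched as a standalone illustration in plain Java (not Hive's actual code path):

```java
import java.util.List;
import java.util.Objects;

public class NullIfSketch {
    // Return null when a deep-equals b, otherwise return a — mirroring
    // SELECT NULLIF(array(1,2,3), array(1,2,3)) evaluating to NULL.
    static <T> T nullIf(T a, T b) {
        if (a == null || b == null) {
            return a;
        }
        return Objects.deepEquals(a, b) ? null : a;
    }

    public static void main(String[] args) {
        System.out.println(nullIf(List.of(1, 2, 3), List.of(1, 2, 3))); // null
        System.out.println(nullIf(List.of(1, 2, 3), List.of(1, 2, 4))); // [1, 2, 3]
    }
}
```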
   
   






Issue Time Tracking
---

Worklog Id: (was: 687718)
Time Spent: 1.5h  (was: 1h 20m)

> NullIf doesn't support complex types
> 
>
> Key: HIVE-25738
> URL: https://issues.apache.org/jira/browse/HIVE-25738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code}
> SELECT NULLIF(array(1,2,3),array(1,2,3))
> {code}
> results in:
> {code}
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:96)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:177)
>   at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:135)
>   at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
> [...]
> {code}





[jira] [Work logged] (HIVE-25738) NullIf doesn't support complex types

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25738?focusedWorklogId=687717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687717
 ]

ASF GitHub Bot logged work on HIVE-25738:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 02:11
Start Date: 30/Nov/21 02:11
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2816:
URL: https://github.com/apache/hive/pull/2816#discussion_r758876501



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNullif.java
##
@@ -86,17 +87,13 @@ public ObjectInspector initialize(ObjectInspector[] 
arguments) throws UDFArgumen
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
 Object arg0 = arguments[0].get();
 Object arg1 = arguments[1].get();
-Object value0 = null;
-if (arg0 != null) {
-  value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
-}
+Object value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
 if (arg0 == null || arg1 == null) {
   return value0;
 }
-PrimitiveObjectInspector compareOI = (PrimitiveObjectInspector) 
returnOIResolver.get();
-if (PrimitiveObjectInspectorUtils.comparePrimitiveObjects(
-value0, compareOI,
-returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], false), 
compareOI)) {
+Object value1 = returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], 
false);
+ObjectInspector compareOI = returnOIResolver.get();
+if (ObjectInspectorUtils.compare(value0, compareOI, value1, compareOI) == 
0) {

Review comment:
   I think the result for union is expected: the `idx` in a union indicates 
[which part of the union is being 
used](https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-UnionTypesunionUnionTypes),
 so we cannot ignore the `idx` part of the union type.
   
   






Issue Time Tracking
---

Worklog Id: (was: 687717)
Time Spent: 1h 20m  (was: 1h 10m)

> NullIf doesn't support complex types
> 
>
> Key: HIVE-25738
> URL: https://issues.apache.org/jira/browse/HIVE-25738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
> SELECT NULLIF(array(1,2,3),array(1,2,3))
> {code}
> results in:
> {code}
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:96)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:177)
>   at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:135)
>   at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
> [...]
> {code}





[jira] [Work logged] (HIVE-25652) Add constraints in result of “SHOW CREATE TABLE ”

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25652?focusedWorklogId=687696=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687696
 ]

ASF GitHub Bot logged work on HIVE-25652:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 00:17
Start Date: 30/Nov/21 00:17
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 commented on pull request #2777:
URL: https://github.com/apache/hive/pull/2777#issuecomment-982153696


   @kasakrisz 
   I tried adding a primary key constraint to the suggested table in 
`quotedid_basic.q`; however, I am getting an error:
   ` org.apache.hadoop.hive.ql.metadata.HiveException: 
InvalidObjectException(message:Parent column not found: 
"%&'()*+,-/;<=>?[]_|{}$^!~#@``)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1298)
at 
org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:136)
at 
org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98)
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:726)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:696)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)`
   
   
   This error is due to a mismatch of the column names. In the `addPrimaryKey` 
method 
[here](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L5724),
 `columnName` gets trimmed in `normalizeIdentifier` and loses the trailing 
space, so it doesn't match the column name from the ColumnDescriptor in the 
`getColumnIndexFromTableColumns` method 
[here](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L5735).
 This happens because we don't call `normalizeIdentifier` on the column name 
while converting `FieldSchema` to `MFieldSchema` in the `convertToMFieldSchemas` 
method 
[here](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L2297).
 I tried adding `normalizeIdentifier` there, but it fails elsewhere, and in any 
case I don't think it's the correct solution, because we don't want to trim the 
quoted identifier.
   
   What do you think about this?
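The mismatch described — a normalized lookup key compared against an unnormalized stored name — is easy to reproduce in isolation. In this hypothetical simplification, `normalize` stands in for the metastore's `normalizeIdentifier`:

```java
import java.util.List;

public class IdentifierMismatch {
    // Stand-in for the metastore's normalizeIdentifier: trim + lower-case.
    static String normalize(String id) {
        return id.trim().toLowerCase();
    }

    public static void main(String[] args) {
        // Quoted identifier whose name legitimately ends with a space.
        String stored = "weird col ";          // kept verbatim by the schema conversion
        List<String> tableCols = List.of(stored);
        String lookupKey = normalize(stored);  // trailing space is trimmed away
        // The index lookup fails (-1), surfacing as "Parent column not found".
        System.out.println(tableCols.indexOf(lookupKey));
    }
}
```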
   




Issue Time Tracking
---

Worklog Id: (was: 687696)
Time Spent: 3.5h  (was: 3h 20m)

> Add constraints in result of “SHOW CREATE TABLE ”
> -
>
> Key: HIVE-25652
> URL: https://issues.apache.org/jira/browse/HIVE-25652
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: 

[jira] [Work logged] (HIVE-24969) Predicates may be removed when decorrelating subqueries with lateral

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24969?focusedWorklogId=687694=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687694
 ]

ASF GitHub Bot logged work on HIVE-24969:
-

Author: ASF GitHub Bot
Created on: 30/Nov/21 00:11
Start Date: 30/Nov/21 00:11
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2145:
URL: https://github.com/apache/hive/pull/2145


   




Issue Time Tracking
---

Worklog Id: (was: 687694)
Time Spent: 2h 40m  (was: 2.5h)

> Predicates may be removed when decorrelating subqueries with lateral
> 
>
> Key: HIVE-24969
> URL: https://issues.apache.org/jira/browse/HIVE-24969
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Step to reproduce:
> {code:java}
> select count(distinct logItem.triggerId)
> from service_stat_log LATERAL VIEW explode(logItems) LogItemTable AS logItem
> where logItem.dsp in ('delivery', 'ocpa')
> and logItem.iswin = true
> and logItem.adid in (
>  select distinct adId
>  from ad_info
>  where subAccountId in (16010, 14863));  {code}
> The predicates _logItem.dsp in ('delivery', 'ocpa')_ and _logItem.iswin = 
> true_ are removed when doing PPD: JOIN -> RS -> LVJ. The JOIN has 
> candidates: logitem -> [logItem.dsp in ('delivery', 'ocpa'), logItem.iswin = 
> true]; when pushing them to the RS followed by the LVJ, none of them are 
> pushed, and the candidates of logitem are finally removed by default, which 
> leads to the wrong result.





[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=687676=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687676
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 23:31
Start Date: 29/Nov/21 23:31
Worklog Time Spent: 10m 
  Work Description: yongzhi commented on pull request #2826:
URL: https://github.com/apache/hive/pull/2826#issuecomment-982129515


   recheck




Issue Time Tracking
---

Worklog Id: (was: 687676)
Time Spent: 3h 50m  (was: 3h 40m)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21075.2.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> In order to work around a performance issue caused by Oracle not supporting 
> the LIMIT statement, HIVE-9447 makes every backend DB run select count(1) 
> from SDS where SDS.CD_ID=? to check whether the specific CD_ID is referenced 
> in the SDS table before dropping a partition. This select count(1) statement 
> does not scale well in Postgres, and there is no index on the CD_ID column in 
> the SDS table.
> For an SDS table with 1.5 million rows, select count(1) averages 700 ms 
> without an index, versus 10-20 ms with one. The statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10 ms.
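The cost gap between the two checks comes from short-circuiting: COUNT(1) must visit every matching row, while LIMIT 1 can stop at the first hit. The same distinction shows up in plain Java streams — this is an analogy only, not the metastore code:

```java
import java.util.stream.LongStream;

public class ShortCircuitCheck {
    public static void main(String[] args) {
        long target = 42L;
        // COUNT(1)-style existence check: filters and counts the whole range.
        boolean existsByCount =
            LongStream.rangeClosed(1, 1_500_000).filter(id -> id == target).count() > 0;
        // LIMIT 1-style existence check: anyMatch stops at the first hit.
        boolean existsByMatch =
            LongStream.rangeClosed(1, 1_500_000).anyMatch(id -> id == target);
        // Same answer either way; only the work done differs.
        System.out.println(existsByCount + " " + existsByMatch); // true true
    }
}
```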





[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=687611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687611
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 20:50
Start Date: 29/Nov/21 20:50
Worklog Time Spent: 10m 
  Work Description: yongzhi opened a new pull request #2826:
URL: https://github.com/apache/hive/pull/2826


   This is rebase of PR #2323
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 687611)
Time Spent: 3h 40m  (was: 3.5h)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21075.2.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> In order to work around a performance issue caused by Oracle not supporting 
> the LIMIT statement, HIVE-9447 makes every backend DB run select count(1) 
> from SDS where SDS.CD_ID=? to check whether the specific CD_ID is referenced 
> in the SDS table before dropping a partition. This select count(1) statement 
> does not scale well in Postgres, and there is no index on the CD_ID column in 
> the SDS table.
> For an SDS table with 1.5 million rows, select count(1) averages 700 ms 
> without an index, versus 10-20 ms with one. The statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10 ms.





[jira] [Work logged] (HIVE-25115) Compaction queue entries may accumulate in "ready for cleaning" state

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25115?focusedWorklogId=687595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687595
 ]

ASF GitHub Bot logged work on HIVE-25115:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 20:22
Start Date: 29/Nov/21 20:22
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request #2825:
URL: https://github.com/apache/hive/pull/2825


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 687595)
Time Spent: 3h 40m  (was: 3.5h)

> Compaction queue entries may accumulate in "ready for cleaning" state
> -
>
> Key: HIVE-25115
> URL: https://issues.apache.org/jira/browse/HIVE-25115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> If the Cleaner does not delete any files, the compaction queue entry is 
> thrown back to the queue and remains in "ready for cleaning" state.
> Problem: If 2 compactions run on the same table and enter "ready for 
> cleaning" state at the same time, only one "cleaning" will remove obsolete 
> files, the other entry will remain in the queue in "ready for cleaning" state.





[jira] [Work logged] (HIVE-25652) Add constraints in result of “SHOW CREATE TABLE ”

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25652?focusedWorklogId=687590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687590
 ]

ASF GitHub Bot logged work on HIVE-25652:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 20:08
Start Date: 29/Nov/21 20:08
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 commented on a change in pull request 
#2777:
URL: https://github.com/apache/hive/pull/2777#discussion_r758698228



##
File path: ql/src/test/queries/clientpositive/show_create_table.q
##
@@ -0,0 +1,44 @@
+CREATE TABLE TEST(
+  col1 varchar(100) NOT NULL COMMENT "comment for column 1",
+  col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2",
+  col3 decimal CHECK (col3 + col4 > 1) enable novalidate rely,
+  col4 decimal NOT NULL,
+  col5 varchar(100),
+  primary key(col1, col2) disable novalidate rely,
+  constraint c3_c4_check CHECK((col3 + col4)/(col3 - col4) > 3),
+  constraint c4_unique UNIQUE(col4) disable novalidate rely)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
+STORED AS INPUTFORMAT
+  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
+OUTPUTFORMAT
+  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
+
+CREATE TABLE TEST2(
+ col varchar(100),
+ primary key(col) disable novalidate rely)
+ROW FORMAT SERDE
+'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
+STORED AS INPUTFORMAT
+'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
+OUTPUTFORMAT
+'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
+
+CREATE TABLE TEST3(
+  col1 varchar(100) COMMENT "comment",
+  col2 timestamp,
+  col3 varchar(100),
+  foreign key(col1, col2) references TEST(col1, col2) disable novalidate rely,
+  foreign key(col3) references TEST2(col) disable novalidate rely)

Review comment:
   I couldn't find a way to add `enable validate norely` to any constraint. 
I think `validate` is not supported yet, because I keep getting 
`org.apache.hadoop.hive.ql.parse.SemanticException: Invalid Foreign Key syntax 
VALIDATE feature not supported yet. Please use NOVALIDATE instead.`
   Also, I consulted the language manual - 
https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable
 - where I couldn't find an example with `validate`.
   Do let me know if you have a suggestion!






Issue Time Tracking
---

Worklog Id: (was: 687590)
Time Spent: 3h 20m  (was: 3h 10m)

> Add constraints in result of “SHOW CREATE TABLE ”
> -
>
> Key: HIVE-25652
> URL: https://issues.apache.org/jira/browse/HIVE-25652
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently show create table doesn’t pull any constraint info like not null, 
> defaults, primary key.
> Example:
> Create table
>  
> {code:java}
> CREATE TABLE TEST(
>   col1 varchar(100) NOT NULL COMMENT "comment for column 1",
>   col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2",
>   col3 decimal,
>   col4 varchar(512) NOT NULL,
>   col5 varchar(100),
>   primary key(col1, col2) disable novalidate)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
> {code}
> Currently {{SHOW CREATE TABLE TEST}} doesn't show the column constraints.
> {code:java}
> CREATE TABLE `test`(
>   `col1` varchar(100) COMMENT 'comment for column 1', 
>   `col2` timestamp COMMENT 'comment for column 2', 
>   `col3` decimal(10,0), 
>   `col4` varchar(512), 
>   `col5` varchar(100))
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> {code}
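On the output side, the improvement needs to render stored constraints back into the generated DDL. A minimal sketch of that rendering for a primary key — a hypothetical helper, not the actual show-create-table code — could look like:

```java
import java.util.List;

public class PrimaryKeyClause {
    // Render a PRIMARY KEY constraint in the same form the CREATE TABLE used,
    // so SHOW CREATE TABLE round-trips the constraint.
    static String primaryKeyClause(List<String> pkCols) {
        if (pkCols.isEmpty()) {
            return "";   // no primary key -> nothing to append
        }
        return "  primary key(" + String.join(", ", pkCols) + ") disable novalidate";
    }

    public static void main(String[] args) {
        System.out.println(primaryKeyClause(List.of("col1", "col2")));
        // prints:   primary key(col1, col2) disable novalidate
    }
}
```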





[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=687559=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687559
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 19:24
Start Date: 29/Nov/21 19:24
Worklog Time Spent: 10m 
  Work Description: yongzhi commented on pull request #2323:
URL: https://github.com/apache/hive/pull/2323#issuecomment-981943366


   The Change Looks good +2




Issue Time Tracking
---

Worklog Id: (was: 687559)
Time Spent: 3.5h  (was: 3h 20m)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21075.2.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> To work around a performance issue caused by Oracle not supporting the limit 
> statement, HIVE-9447 makes every backend DB run select count(1) 
> from SDS where SDS.CD_ID=? to check whether the specific CD_ID is referenced in 
> the SDS table before dropping a partition. This select count(1) statement does not 
> scale well in Postgres, and there is no index on the CD_ID column of the SDS table.
> For an SDS table with 1.5 million rows, select count(1) averages 700ms 
> without the index, versus 10-20ms with it. The statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.
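The difference above boils down to an existence check versus a full count: with no index on CD_ID, count(1) must visit every matching row, while a LIMIT 1 probe can stop at the first hit. A toy Java sketch of that distinction (hypothetical helper names, not Hive metastore code):

```java
import java.util.List;

public class CdIdCheck {
    // anyMatch stops at the first SDS row referencing the CD_ID,
    // mirroring SELECT * ... WHERE "CD_ID" = ? LIMIT 1.
    static boolean isReferenced(List<Long> cdIds, long target) {
        return cdIds.stream().anyMatch(id -> id == target);
    }

    // count() must visit every row, mirroring select count(1)
    // over an unindexed CD_ID column.
    static long countReferences(List<Long> cdIds, long target) {
        return cdIds.stream().filter(id -> id == target).count();
    }

    public static void main(String[] args) {
        List<Long> sdsCdIds = List.of(5L, 7L, 7L, 9L);
        System.out.println(isReferenced(sdsCdIds, 7L));    // true
        System.out.println(countReferences(sdsCdIds, 7L)); // 2
    }
}
```

The drop-partition path only needs the boolean answer, which is why the pre-HIVE-9447 LIMIT 1 form stays fast even without the index.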





[jira] [Updated] (HIVE-25728) ParseException while gathering Column Stats

2021-11-29 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-25728:
---
Description: 
The {{columnName}} is escaped twice in {{ColumnStatsSemanticAnalyzer}} at [line 
261|https://github.com/apache/hive/blob/934faa73c56920fa19f86da53b5daa5bf7c98ef4/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L261],
 which can cause a ParseException. A potential solution is to not escape it a 
second time.
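A minimal sketch of the proposed fix, assuming a hypothetical quoteIdentifier helper (the real logic lives in ColumnStatsSemanticAnalyzer): make quoting idempotent, so a second escaping pass leaves an already-quoted name untouched instead of producing a doubly-escaped identifier the parser rejects.

```java
public class QuoteOnce {
    // Hypothetical helper: backtick-quote an identifier only if it is not
    // already quoted, so calling it twice cannot produce ``col``.
    static String quoteIdentifier(String col) {
        if (col.length() >= 2 && col.startsWith("`") && col.endsWith("`")) {
            return col; // already escaped once; don't escape a second time
        }
        return "`" + col.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        String once = quoteIdentifier("t2_col2");
        System.out.println(once);                               // `t2_col2`
        System.out.println(quoteIdentifier(once).equals(once)); // true
    }
}
```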

This can be reproduced as follows:
{code:java}
CREATE TABLE table1(
   t1_col1 bigint);

 CREATE TABLE table2(
   t2_col1 bigint,
   t2_col2 int)
 PARTITIONED BY (
   t2_col3 date);

insert into table1 values(1);
insert into table2 values("1","1","1");

--set hive.stats.autogather=false;
set hive.support.quoted.identifiers=none;

create external table ext_table STORED AS ORC 
tblproperties('compression'='snappy','external.table.purge'='true') as
SELECT a.* ,d.`(t2_col1|t2_col3)?+.+`
FROM table1 a
LEFT JOIN (SELECT * FROM table2 where t2_col3 like '2021-01-%') d
on a.t1_col1 = d.t2_col1;{code}
and it fails with the following stack trace:
{noformat}
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or 
check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ 
for specific test cases logs.
 org.apache.hadoop.hive.ql.parse.SemanticException: 
org.apache.hadoop.hive.ql.parse.ParseException: line 1:772 rule Identifier 
failed predicate: {allowQuotedId() != Quotation.NONE}?
line 1:778 rule Identifier failed predicate: {allowQuotedId() != 
Quotation.NONE}?
line 1:782 rule Identifier failed predicate: {allowQuotedId() != 
Quotation.NONE}?
line 1:807 character '' not supported here
at 
org.apache.hadoop.hive.ql.parse.ColumnStatsAutoGatherContext.insertAnalyzePipeline(ColumnStatsAutoGatherContext.java:144)
at 
org.apache.hadoop.hive.ql.parse.ColumnStatsAutoGatherContext.insertTableValuesAnalyzePipeline(ColumnStatsAutoGatherContext.java:135)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAutoColumnStatsGatheringPipeline(SemanticAnalyzer.java:8380)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7915)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11064)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10939)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11854)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11724)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:625)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12557)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:783)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:753)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:142)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

[jira] [Updated] (HIVE-25728) ParseException while gathering Column Stats

2021-11-29 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-25728:
---
Description: 
The {{columnName}} is escaped twice in {{ColumnStatsSemanticAnalyzer}} at [line 
261|https://github.com/apache/hive/blob/934faa73c56920fa19f86da53b5daa5bf7c98ef4/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L261],
 which can cause a ParseException. A potential solution is to not escape it a 
second time.

This can be reproduced as follows:
{code:java}
CREATE TABLE table1(
   t1_col1 bigint);

 CREATE TABLE table2(
   t2_col1 bigint,
   t2_col2 int)
 PARTITIONED BY (
   t2_col3 date);

insert into table1 values(1);
insert into table2 values("1","1","1");

--set hive.stats.autogather=false;
set hive.support.quoted.identifiers=none;

create external table ext_table STORED AS ORC 
tblproperties('compression'='snappy','external.table.purge'='true') as
SELECT a.* ,d.`(t2_col1|t2_col3)?+.+`
FROM table1 a
LEFT JOIN (SELECT * FROM table2 where t2_col3 like '2021-01-%') d
on a.t1_col1 = d.t2_col1;{code}
and it fails with the following stack trace:
{code:java}
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or 
check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ 
for specific test cases logs.
 org.apache.hadoop.hive.ql.parse.SemanticException: 
org.apache.hadoop.hive.ql.parse.ParseException: line 1:772 rule Identifier 
failed predicate: {allowQuotedId() != Quotation.NONE}?
line 1:778 rule Identifier failed predicate: {allowQuotedId() != 
Quotation.NONE}?
line 1:782 rule Identifier failed predicate: {allowQuotedId() != 
Quotation.NONE}?
line 1:807 character '' not supported here
at 
org.apache.hadoop.hive.ql.parse.ColumnStatsAutoGatherContext.insertAnalyzePipeline(ColumnStatsAutoGatherContext.java:144)
at 
org.apache.hadoop.hive.ql.parse.ColumnStatsAutoGatherContext.insertTableValuesAnalyzePipeline(ColumnStatsAutoGatherContext.java:135)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAutoColumnStatsGatheringPipeline(SemanticAnalyzer.java:8380)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7915)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11064)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10939)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11854)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11724)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:625)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12557)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:783)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:753)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:142)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

[jira] [Updated] (HIVE-25728) ParseException while gathering Column Stats

2021-11-29 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-25728:
---
Description: 
The {{columnName}} is escaped twice in {{ColumnStatsSemanticAnalyzer}} at [line 
261|https://github.com/apache/hive/blob/934faa73c56920fa19f86da53b5daa5bf7c98ef4/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L261],
 which can cause a ParseException. A potential solution is to not escape it a 
second time.

This can be reproduced as follows:


  was:The {{columnName}} is escaped twice in {{ColumnStatsSemanticAnalyzer}} at 
[line 
261|https://github.com/apache/hive/blob/934faa73c56920fa19f86da53b5daa5bf7c98ef4/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L261],
 which can cause ParseException. Potential solution is to simply not escape it 
second time.


> ParseException while gathering Column Stats
> ---
>
> Key: HIVE-25728
> URL: https://issues.apache.org/jira/browse/HIVE-25728
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Priority: Major
>
> The {{columnName}} is escaped twice in {{ColumnStatsSemanticAnalyzer}} at 
> [line 
> 261|https://github.com/apache/hive/blob/934faa73c56920fa19f86da53b5daa5bf7c98ef4/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L261],
>  which can cause a ParseException. A potential solution is to not escape 
> it a second time.
> This can be reproduced as follows:





[jira] [Work logged] (HIVE-25749) Check if RelMetadataQuery.collations() returns null to avoid NPE

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25749?focusedWorklogId=687507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687507
 ]

ASF GitHub Bot logged work on HIVE-25749:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 18:16
Start Date: 29/Nov/21 18:16
Worklog Time Spent: 10m 
  Work Description: amansinha100 commented on pull request #2823:
URL: https://github.com/apache/hive/pull/2823#issuecomment-981891264


   +1 
   I verified manually against a TPC-DS SF1 dataset that I have locally where 
the following query was previously hitting NPE and it works ok with this 
change. 
   
   Explain CBO
   with my_customers as (
select distinct c_customer_sk
   , c_current_addr_sk
from   
   ( select cs_sold_date_sk sold_date_sk,
cs_bill_customer_sk customer_sk,
cs_item_sk item_sk
 from   catalog_sales
 union all
 select ws_sold_date_sk sold_date_sk,
ws_bill_customer_sk customer_sk,
ws_item_sk item_sk
 from   web_sales
) cs_or_ws_sales,
item,
date_dim,
customer
where   sold_date_sk = d_date_sk
and item_sk = i_item_sk
and i_category = 'Books'
and i_class = 'business'
and c_customer_sk = cs_or_ws_sales.customer_sk
and d_moy = 2
and d_year = 2000
)
   select * from my_customers;




Issue Time Tracking
---

Worklog Id: (was: 687507)
Time Spent: 20m  (was: 10m)

> Check if RelMetadataQuery.collations() returns null to avoid NPE
> 
>
> Key: HIVE-25749
> URL: https://issues.apache.org/jira/browse/HIVE-25749
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> According to the "RelMetadataQuery.collations()" 
> [javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
>  the method can return "null" when collations information is not available.
> Hive invokes the method in two places 
> ([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
>  and 
> [HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
>  but it does not check for "null" return values, which can cause an NPE.
> For RelFieldTrimmer, the same bug has been fixed in Calcite (where the code 
> has been taken from) here: 
> https://github.com/apache/calcite/commit/47871235177a3a0d398b1d890d1d2e947028e052
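The fix can be as small as a null-safe wrapper around the metadata call. A sketch under the assumption that an empty collation list is an acceptable fallback (the helper name is illustrative, not the actual patch):

```java
import java.util.Collections;
import java.util.List;

public class NullSafeCollations {
    // RelMetadataQuery.collations() may return null when collation
    // metadata is unavailable; treat that the same as "no collations
    // known" instead of dereferencing the null result.
    static <T> List<T> orEmpty(List<T> maybeNull) {
        return maybeNull == null ? Collections.emptyList() : maybeNull;
    }

    public static void main(String[] args) {
        System.out.println(orEmpty(null).isEmpty());       // true
        System.out.println(orEmpty(List.of("c0")).size()); // 1
    }
}
```

Both call sites (RelFieldTrimmer and HiveJoin) would route the return value through such a check before iterating it.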





[jira] [Updated] (HIVE-25728) ParseException while gathering Column Stats

2021-11-29 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-25728:
---
Description: The {{columnName}} is escaped twice in 
{{ColumnStatsSemanticAnalyzer}} at [line 
261|https://github.com/apache/hive/blob/934faa73c56920fa19f86da53b5daa5bf7c98ef4/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L261],
 which can cause a ParseException. A potential solution is to not escape it a 
second time.  (was: The {{columnName}} is escaped twice in 
{{ColumnStatsSemanticAnalyzer}} at [line 
262|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L262],
 which can cause ParseException. Potential solution is to simply not escape it 
second time.)

> ParseException while gathering Column Stats
> ---
>
> Key: HIVE-25728
> URL: https://issues.apache.org/jira/browse/HIVE-25728
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Priority: Major
>
> The {{columnName}} is escaped twice in {{ColumnStatsSemanticAnalyzer}} at 
> [line 
> 261|https://github.com/apache/hive/blob/934faa73c56920fa19f86da53b5daa5bf7c98ef4/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L261],
>  which can cause a ParseException. A potential solution is to not escape 
> it a second time.





[jira] [Work started] (HIVE-25734) Wrongly-typed constant in case expression leads to incorrect empty result

2021-11-29 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25734 started by Alessandro Solimando.
---
> Wrongly-typed constant in case expression leads to incorrect empty result
> -
>
> Key: HIVE-25734
> URL: https://issues.apache.org/jira/browse/HIVE-25734
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  
> The type of constants in case expressions should be inferred, if possible, by 
> the "surrounding" input reference columns, if any.
> Consider the following table and query: 
> {code:java}
> create external table test_case (row_seq smallint, row_desc string) stored as 
> parquet;
> insert into test_case values (1, 'a');
> insert into test_case values (2, 'aa');
> insert into test_case values (6, 'aa');
> with base_t as (select row_seq, row_desc,
>   case row_seq
> when 1 then '34'
> when 6 then '35'
> when 2 then '36'
>   end as zb from test_case where row_seq in (1,2,6))
> select row_seq, row_desc, zb from base_t where zb <> '34';{code}
> The aforementioned query fails by returning an empty result, while "1 a 34" 
> is expected.
>  
> To understand the root cause, let's consider the debug input and output of 
> some related CBO rules which are triggered during the evaluation of the 
> query: 
>  
> {noformat}
> --$0 is the column 'row_seq'
> 1. HiveReduceExpressionsRule
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 
> 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), 
> '36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> 2. HivePointLookupOptimizerRule.RexTransformIntoInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> 3. HivePointLookupOptimizerRule.RexMergeInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: false{noformat}
> In the first part, we can see that the constants are correctly typed as 
> "SMALLINT" in the first part of the "AND" operand, while they are typed as 
> "INTEGER" for the "CASE" expression, despite the input reference "$0" being 
> available for inferring a more precise type.
> This type difference causes "HivePointLookupOptimizerRule.RexMergeInClause" 
> to miss the commonality between the two "IN" expressions, whose intersection 
> is considered empty, hence the empty result.
> Providing a more refined type inference for "case" expressions should fix the 
> issue.
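The effect of the type mismatch on RexMergeInClause can be modeled with a toy intersection over type-tagged constants (purely illustrative; the real rule operates on Calcite RexNodes):

```java
import java.util.HashSet;
import java.util.Set;

public class InClauseMerge {
    // Two IN-lists only keep common members when both the value AND the
    // inferred type match; tagging the CASE constants as INTEGER while
    // the filter constants are SMALLINT empties the intersection.
    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> out = new HashSet<>(a);
        out.retainAll(b);
        return out;
    }

    public static void main(String[] args) {
        Set<String> filterIn = Set.of("SMALLINT:1", "SMALLINT:2", "SMALLINT:6");
        Set<String> caseIn = Set.of("INTEGER:2", "INTEGER:6");
        // Same values, different types -> empty, so the plan folds to false.
        System.out.println(intersect(filterIn, caseIn).isEmpty()); // true
        // With the type inferred from $0, the shared members survive.
        Set<String> caseInFixed = Set.of("SMALLINT:2", "SMALLINT:6");
        System.out.println(intersect(filterIn, caseInFixed).size()); // 2
    }
}
```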





[jira] [Updated] (HIVE-25734) Wrongly-typed constant in case expression leads to incorrect empty result

2021-11-29 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-25734:

Component/s: Query Planning

> Wrongly-typed constant in case expression leads to incorrect empty result
> -
>
> Key: HIVE-25734
> URL: https://issues.apache.org/jira/browse/HIVE-25734
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  
> The type of constants in case expressions should be inferred, if possible, by 
> the "surrounding" input reference columns, if any.
> Consider the following table and query: 
> {code:java}
> create external table test_case (row_seq smallint, row_desc string) stored as 
> parquet;
> insert into test_case values (1, 'a');
> insert into test_case values (2, 'aa');
> insert into test_case values (6, 'aa');
> with base_t as (select row_seq, row_desc,
>   case row_seq
> when 1 then '34'
> when 6 then '35'
> when 2 then '36'
>   end as zb from test_case where row_seq in (1,2,6))
> select row_seq, row_desc, zb from base_t where zb <> '34';{code}
> The aforementioned query fails by returning an empty result, while "1 a 34" 
> is expected.
>  
> To understand the root cause, let's consider the debug input and output of 
> some related CBO rules which are triggered during the evaluation of the 
> query: 
>  
> {noformat}
> --$0 is the column 'row_seq'
> 1. HiveReduceExpressionsRule
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 
> 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), 
> '36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> 2. HivePointLookupOptimizerRule.RexTransformIntoInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> 3. HivePointLookupOptimizerRule.RexMergeInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: false{noformat}
> In the first part, we can see that the constants are correctly typed as 
> "SMALLINT" in the first part of the "AND" operand, while they are typed as 
> "INTEGER" for the "CASE" expression, despite the input reference "$0" being 
> available for inferring a more precise type.
> This type difference causes "HivePointLookupOptimizerRule.RexMergeInClause" 
> to miss the commonality between the two "IN" expressions, whose intersection 
> is considered empty, hence the empty result.
> Providing a more refined type inference for "case" expressions should fix the 
> issue.





[jira] [Work logged] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25750?focusedWorklogId=687488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687488
 ]

ASF GitHub Bot logged work on HIVE-25750:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 17:56
Start Date: 29/Nov/21 17:56
Worklog Time Spent: 10m 
  Work Description: achennagiri opened a new pull request #2824:
URL: https://github.com/apache/hive/pull/2824


   The code to create a standalone beeline tarball was created as part of this 
ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was 
reported when Beeline is installed without Hadoop: 
   the Beeline script complains of missing dependencies when it is run. 
   
   
   ### What changes were proposed in this pull request?
   The beeline script can be run with/without hadoop installed. All the 
required dependencies are bundled into a single downloadable tar file. 
   `mvn clean package install -Pdist -Pitests -DskipTests -Denforcer.skip=true` 
generates something along the lines of 
   **apache-hive-beeline-4.0.0-SNAPSHOT.tar.gz** in the **packaging/target** 
folder.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 687488)
Remaining Estimate: 0h
Time Spent: 10m

> Beeline: Creating a standalone tarball by isolating dependencies
> 
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
>  Issue Type: Bug
>Reporter: Abhay
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The code to create a standalone beeline tarball was created as part of this 
> ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was 
> reported when Beeline is installed without Hadoop: 
> the Beeline script complains of missing dependencies when it is run.
> This ticket is to fix that bug. 





[jira] [Updated] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25750:
--
Labels: pull-request-available  (was: )

> Beeline: Creating a standalone tarball by isolating dependencies
> 
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
>  Issue Type: Bug
>Reporter: Abhay
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The code to create a standalone beeline tarball was created as part of this 
> ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was 
> reported when Beeline is installed without Hadoop: 
> the Beeline script complains of missing dependencies when it is run.
> This ticket is to fix that bug. 





[jira] [Updated] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies

2021-11-29 Thread Abhay (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhay updated HIVE-25750:
-
Description: 
The code to create a standalone beeline tarball was created as part of this 
ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was 
reported when Beeline is installed without Hadoop: 
the Beeline script complains of missing dependencies when it is run.
This ticket is to fix that bug. 

  was:Beeline: Isolating dependencies and execution with java


> Beeline: Creating a standalone tarball by isolating dependencies
> 
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
>  Issue Type: Bug
>Reporter: Abhay
>Priority: Major
>
> The code to create a standalone beeline tarball was created as part of this 
> ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was 
> reported when Beeline is installed without Hadoop: 
> the Beeline script complains of missing dependencies when it is run.
> This ticket is to fix that bug. 





[jira] [Work logged] (HIVE-25652) Add constraints in result of “SHOW CREATE TABLE ”

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25652?focusedWorklogId=687439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687439
 ]

ASF GitHub Bot logged work on HIVE-25652:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 16:58
Start Date: 29/Nov/21 16:58
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 commented on a change in pull request 
#2777:
URL: https://github.com/apache/hive/pull/2777#discussion_r758558430



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/CheckConstraint.java
##
@@ -34,13 +34,40 @@
 @SuppressWarnings("serial")
 public class CheckConstraint implements Serializable {
 
-  public class CheckConstraintCol {
-public String colName;
-public String checkExpression;
-
-public CheckConstraintCol(String colName, String checkExpression) {
+  public static class CheckConstraintCol {
+private final String colName;
+private final String checkExpression;
+private final String enable;
+private final String validate;
+private final String rely;

Review comment:
   I think we should keep them as `String` because:
   1. We can get the boolean values using `constraint.isEnable_cstr()` anyway.
   2. We are setting them to String values based on their boolean values 
[here](https://github.com/apache/hive/pull/2777/files/3c667cfa85e41f5c7377aa0f606b3d1fe7c467e5#diff-adcaf114fc9b04e43b9004e5464e588a85022712a79b3592d374fb69e8d1f8bdR98),
 which helps while printing.
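The design choice under discussion — deriving the display strings from the thrift booleans once, rather than re-deriving them at print time — can be sketched as follows (hypothetical method names; the actual mapping lives in the PR's formatting code):

```java
public class ConstraintFlags {
    // One-way mapping from the thrift booleans to the keywords printed
    // by SHOW CREATE TABLE (e.g. "disable novalidate" in the DDL).
    static String enableText(boolean enable) {
        return enable ? "ENABLE" : "DISABLE";
    }

    static String validateText(boolean validate) {
        return validate ? "VALIDATE" : "NOVALIDATE";
    }

    static String relyText(boolean rely) {
        return rely ? "RELY" : "NORELY";
    }

    public static void main(String[] args) {
        // A constraint created with "disable novalidate" prints as:
        System.out.println(enableText(false) + " " + validateText(false)
                + " " + relyText(false)); // DISABLE NOVALIDATE NORELY
    }
}
```

Storing the String form in CheckConstraintCol means the printer can concatenate fields directly, at the cost of callers needing the boolean accessors for logic.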






Issue Time Tracking
---

Worklog Id: (was: 687439)
Time Spent: 3h 10m  (was: 3h)

> Add constraints in result of “SHOW CREATE TABLE ”
> -
>
> Key: HIVE-25652
> URL: https://issues.apache.org/jira/browse/HIVE-25652
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently show create table doesn’t pull any constraint info like not null, 
> defaults, primary key.
> Example:
> Create table
>  
> {code:java}
> CREATE TABLE TEST(
>   col1 varchar(100) NOT NULL COMMENT "comment for column 1",
>   col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2",
>   col3 decimal,
>   col4 varchar(512) NOT NULL,
>   col5 varchar(100),
>   primary key(col1, col2) disable novalidate)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
> {code}
> Currently {{SHOW CREATE TABLE TEST}} doesn't show the column constraints.
> {code:java}
> CREATE TABLE `test`(
>   `col1` varchar(100) COMMENT 'comment for column 1', 
>   `col2` timestamp COMMENT 'comment for column 2', 
>   `col3` decimal(10,0), 
>   `col4` varchar(512), 
>   `col5` varchar(100))
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25749) Check if RelMetadataQuery.collations() returns null to avoid NPE

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25749:
--
Labels: pull-request-available  (was: )

> Check if RelMetadataQuery.collations() returns null to avoid NPE
> 
>
> Key: HIVE-25749
> URL: https://issues.apache.org/jira/browse/HIVE-25749
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to the "RelMetadataQuery.collations()" 
> [javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
>  the method can return "null" if collations information is not available.
> Hive invokes the method in two places 
> ([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
>  and 
> [HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
>  but it does not check for "null" return values, which can cause NPE.
> For RelFieldTrimmer, the same bug has been fixed in Calcite (where the code 
> has been taken from) here: 
> https://github.com/apache/calcite/commit/47871235177a3a0d398b1d890d1d2e947028e052
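
The null guard the issue asks for can be sketched with a stand-in for the Calcite call; `collations(...)` below is a hypothetical substitute for `RelMetadataQuery.collations(rel)`, which may return null per its javadoc:

```java
import java.util.Collections;
import java.util.List;

public class CollationsGuard {
    // Hypothetical stand-in for RelMetadataQuery.collations(rel); returns
    // null when collation metadata is unavailable, as the real method may.
    static List<String> collations(boolean available) {
        return available ? List.of("col0 ASC") : null;
    }

    // Null-safe wrapper: callers iterate an empty list instead of hitting an NPE.
    static List<String> safeCollations(boolean available) {
        List<String> c = collations(available);
        return c == null ? Collections.emptyList() : c;
    }
}
```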



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25749) Check if RelMetadataQuery.collations() returns null to avoid NPE

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25749?focusedWorklogId=687431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687431
 ]

ASF GitHub Bot logged work on HIVE-25749:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 16:48
Start Date: 29/Nov/21 16:48
Worklog Time Spent: 10m 
  Work Description: asolimando opened a new pull request #2823:
URL: https://github.com/apache/hive/pull/2823


   …oid NPE
   
   
   
   ### What changes were proposed in this pull request?
   
   Prevent NPE by checking RelMetadataQuery.collations()'s return value.
   
   ### Why are the changes needed?
   
   
   NPE can be thrown if the return value is null.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   
   Existing tests; I could not reproduce the NPE via unit testing.




Issue Time Tracking
---

Worklog Id: (was: 687431)
Remaining Estimate: 0h
Time Spent: 10m

> Check if RelMetadataQuery.collations() returns null to avoid NPE
> 
>
> Key: HIVE-25749
> URL: https://issues.apache.org/jira/browse/HIVE-25749
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to the "RelMetadataQuery.collations()" 
> [javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
>  the method can return "null" if collations information is not available.
> Hive invokes the method in two places 
> ([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
>  and 
> [HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
>  but it does not check for "null" return values, which can cause NPE.
> For RelFieldTrimmer, the same bug has been fixed in Calcite (where the code 
> has been taken from) here: 
> https://github.com/apache/calcite/commit/47871235177a3a0d398b1d890d1d2e947028e052





[jira] [Updated] (HIVE-25749) Check if RelMetadataQuery.collations() returns null to avoid NPE

2021-11-29 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-25749:

Summary: Check if RelMetadataQuery.collations() returns null to avoid NPE  
(was: Check if RelMetadataQuery.collations() return null to avoid NPE)

> Check if RelMetadataQuery.collations() returns null to avoid NPE
> 
>
> Key: HIVE-25749
> URL: https://issues.apache.org/jira/browse/HIVE-25749
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> According to the "RelMetadataQuery.collations()" 
> [javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
>  the method can return "null" if collations information is not available.
> Hive invokes the method in two places 
> ([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
>  and 
> [HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
>  but it does not check for "null" return values, which can cause NPE.
> For RelFieldTrimmer, the same bug has been fixed in Calcite (where the code 
> has been taken from) here: 
> https://github.com/apache/calcite/commit/47871235177a3a0d398b1d890d1d2e947028e052





[jira] [Updated] (HIVE-25749) Check if RelMetadataQuery.collations() return null to avoid NPE

2021-11-29 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-25749:

Description: 
According to the "RelMetadataQuery.collations()" 
[javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
 the method can return "null" if collations information is not available.

Hive invokes the method in two places 
([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
 and 
[HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
 but it does not check for "null" return values, which can cause NPE.

For RelFieldTrimmer, the same bug has been fixed in Calcite (where the code has 
been taken from) here: 
https://github.com/apache/calcite/commit/47871235177a3a0d398b1d890d1d2e947028e052

  was:
Accoring to "RelMetadataQuery.collations()" 
[javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
 the method can return "null" if collactions information are not available.

Hive invokes the method in two places 
([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
 and 
[HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
 but it does not check for "null" return values, which can cause NPE.


> Check if RelMetadataQuery.collations() return null to avoid NPE
> ---
>
> Key: HIVE-25749
> URL: https://issues.apache.org/jira/browse/HIVE-25749
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> According to the "RelMetadataQuery.collations()" 
> [javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
>  the method can return "null" if collations information is not available.
> Hive invokes the method in two places 
> ([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
>  and 
> [HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
>  but it does not check for "null" return values, which can cause NPE.
> For RelFieldTrimmer, the same bug has been fixed in Calcite (where the code 
> has been taken from) here: 
> https://github.com/apache/calcite/commit/47871235177a3a0d398b1d890d1d2e947028e052





[jira] [Work started] (HIVE-25749) Check if RelMetadataQuery.collations() return null to avoid NPE

2021-11-29 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25749 started by Alessandro Solimando.
---
> Check if RelMetadataQuery.collations() return null to avoid NPE
> ---
>
> Key: HIVE-25749
> URL: https://issues.apache.org/jira/browse/HIVE-25749
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> According to the "RelMetadataQuery.collations()" 
> [javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
>  the method can return "null" if collations information is not available.
> Hive invokes the method in two places 
> ([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
>  and 
> [HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
>  but it does not check for "null" return values, which can cause NPE.





[jira] [Work logged] (HIVE-25738) NullIf doesn't support complex types

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25738?focusedWorklogId=687418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687418
 ]

ASF GitHub Bot logged work on HIVE-25738:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 16:31
Start Date: 29/Nov/21 16:31
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2816:
URL: https://github.com/apache/hive/pull/2816#discussion_r758533743



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNullif.java
##
@@ -86,17 +87,13 @@ public ObjectInspector initialize(ObjectInspector[] 
arguments) throws UDFArgumen
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
 Object arg0 = arguments[0].get();
 Object arg1 = arguments[1].get();
-Object value0 = null;
-if (arg0 != null) {
-  value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
-}
+Object value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
 if (arg0 == null || arg1 == null) {
   return value0;
 }
-PrimitiveObjectInspector compareOI = (PrimitiveObjectInspector) 
returnOIResolver.get();
-if (PrimitiveObjectInspectorUtils.comparePrimitiveObjects(
-value0, compareOI,
-returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], false), 
compareOI)) {
+Object value1 = returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], 
false);
+ObjectInspector compareOI = returnOIResolver.get();
+if (ObjectInspectorUtils.compare(value0, compareOI, value1, compareOI) == 
0) {

Review comment:
   I think `UNION` is probably a corner feature and not in wide usage - 
it's a good idea to throw an exception instead of going down the dragon's path
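
   Failing fast for UNION could look roughly like this; the enum is a simplified stand-in for Hive's ObjectInspector categories, not the real API:

```java
public class UnionRejectSketch {
    // Simplified stand-in for ObjectInspector.Category in Hive's serde2 package.
    enum Category { PRIMITIVE, LIST, MAP, STRUCT, UNION }

    // Reject UNION up front (e.g. in initialize()) instead of producing
    // ill-defined comparison semantics at evaluation time.
    static void checkComparable(Category category) {
        if (category == Category.UNION) {
            throw new IllegalArgumentException("NULLIF on UNION types is not supported");
        }
    }
}
```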






Issue Time Tracking
---

Worklog Id: (was: 687418)
Time Spent: 1h 10m  (was: 1h)

> NullIf doesn't support complex types
> 
>
> Key: HIVE-25738
> URL: https://issues.apache.org/jira/browse/HIVE-25738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code}
> SELECT NULLIF(array(1,2,3),array(1,2,3))
> {code}
> results in:
> {code}
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:96)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:177)
>   at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:135)
>   at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
> [...]
> {code}





[jira] [Assigned] (HIVE-25749) Check if RelMetadataQuery.collations() return null to avoid NPE

2021-11-29 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-25749:
---


> Check if RelMetadataQuery.collations() return null to avoid NPE
> ---
>
> Key: HIVE-25749
> URL: https://issues.apache.org/jira/browse/HIVE-25749
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> According to the "RelMetadataQuery.collations()" 
> [javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
>  the method can return "null" if collations information is not available.
> Hive invokes the method in two places 
> ([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
>  and 
> [HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
>  but it does not check for "null" return values, which can cause NPE.





[jira] [Work logged] (HIVE-25738) NullIf doesn't support complex types

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25738?focusedWorklogId=687383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687383
 ]

ASF GitHub Bot logged work on HIVE-25738:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 15:48
Start Date: 29/Nov/21 15:48
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2816:
URL: https://github.com/apache/hive/pull/2816#discussion_r758491714



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNullif.java
##
@@ -86,17 +87,13 @@ public ObjectInspector initialize(ObjectInspector[] 
arguments) throws UDFArgumen
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
 Object arg0 = arguments[0].get();
 Object arg1 = arguments[1].get();
-Object value0 = null;
-if (arg0 != null) {
-  value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
-}
+Object value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
 if (arg0 == null || arg1 == null) {
   return value0;
 }
-PrimitiveObjectInspector compareOI = (PrimitiveObjectInspector) 
returnOIResolver.get();
-if (PrimitiveObjectInspectorUtils.comparePrimitiveObjects(
-value0, compareOI,
-returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], false), 
compareOI)) {
+Object value1 = returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], 
false);
+ObjectInspector compareOI = returnOIResolver.get();
+if (ObjectInspectorUtils.compare(value0, compareOI, value1, compareOI) == 
0) {

Review comment:
   Thanks for adding more tests Zoltan. I always find interesting stuff 
whenever I decide to add a test :)
   
   FYI: When I worked on adding support for non-primitive types in simple 
equality operations (HIVE-24886), I excluded UNION types. I wasn't sure what the 
semantics should be, so I thought it was better to throw an exception and say 
the operation is not permitted.






Issue Time Tracking
---

Worklog Id: (was: 687383)
Time Spent: 1h  (was: 50m)

> NullIf doesn't support complex types
> 
>
> Key: HIVE-25738
> URL: https://issues.apache.org/jira/browse/HIVE-25738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
> SELECT NULLIF(array(1,2,3),array(1,2,3))
> {code}
> results in:
> {code}
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:96)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:177)
>   at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:135)
>   at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
> [...]
> {code}





[jira] [Work logged] (HIVE-25738) NullIf doesn't support complex types

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25738?focusedWorklogId=687369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687369
 ]

ASF GitHub Bot logged work on HIVE-25738:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 15:25
Start Date: 29/Nov/21 15:25
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2816:
URL: https://github.com/apache/hive/pull/2816#discussion_r758469294



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNullif.java
##
@@ -86,17 +87,13 @@ public ObjectInspector initialize(ObjectInspector[] 
arguments) throws UDFArgumen
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
 Object arg0 = arguments[0].get();
 Object arg1 = arguments[1].get();
-Object value0 = null;
-if (arg0 != null) {
-  value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
-}
+Object value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
 if (arg0 == null || arg1 == null) {
   return value0;
 }
-PrimitiveObjectInspector compareOI = (PrimitiveObjectInspector) 
returnOIResolver.get();
-if (PrimitiveObjectInspectorUtils.comparePrimitiveObjects(
-value0, compareOI,
-returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], false), 
compareOI)) {
+Object value1 = returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], 
false);
+ObjectInspector compareOI = returnOIResolver.get();
+if (ObjectInspectorUtils.compare(value0, compareOI, value1, compareOI) == 
0) {

Review comment:
   it does work - but there are some interesting findings:
   
   both of the following cases change the "non-used" part of the union (note: 
`create_union(idx,o0,o1)` creates a union which uses the `idx`-th object)
   ```
   SELECT (NULLIF(create_union(0,1,2),create_union(0,1,3)) is not null);
   false
   SELECT (NULLIF(create_union(0,1,2),create_union(1,2,1)) is not null);
   true
   ```
   it seems like the comparison ignores the unused/dormant parts, which could 
be ok; but it seems like it also compares the `idx` field - I think it should 
either also ignore the `idx` field or also consider the dormant objects
   
   opened HIVE-25748 to look into this
   






Issue Time Tracking
---

Worklog Id: (was: 687369)
Time Spent: 50m  (was: 40m)

> NullIf doesn't support complex types
> 
>
> Key: HIVE-25738
> URL: https://issues.apache.org/jira/browse/HIVE-25738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> SELECT NULLIF(array(1,2,3),array(1,2,3))
> {code}
> results in:
> {code}
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:96)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:177)
>   at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:135)
>   at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
> [...]
> {code}





[jira] [Commented] (HIVE-25748) Investigate Union comparision

2021-11-29 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450528#comment-17450528
 ] 

Zoltan Haindrich commented on HIVE-25748:
-

it seems like the comparison ignores the unused/dormant parts, which could be 
ok; but it seems like it also compares the idx field - I think it should either 
also ignore the idx field or also consider the dormant objects

> Investigate Union comparision
> -
>
> Key: HIVE-25748
> URL: https://issues.apache.org/jira/browse/HIVE-25748
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>
> both of the following cases change the "non-used" part of the union (note: 
> `create_union(idx,o0,o1)` creates a union which uses the `idx`-th object)
> {code}
> SELECT (NULLIF(create_union(0,1,2),create_union(0,1,3)) is not null);
> false
> SELECT (NULLIF(create_union(0,1,2),create_union(1,2,1)) is not null);
> true
> {code}





[jira] [Work logged] (HIVE-25666) Realtime memory usage in beeline progress

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25666?focusedWorklogId=687368=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687368
 ]

ASF GitHub Bot logged work on HIVE-25666:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 15:24
Start Date: 29/Nov/21 15:24
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2812:
URL: https://github.com/apache/hive/pull/2812#discussion_r758468637



##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java
##
@@ -1163,4 +1165,15 @@ private QueryIdentifierProto 
constructQueryIdentifierProto(int dagIdentifier) {
   public String getAmHostString() {
 return amHost;
   }
+
+  /**
+   * Overrides TezTaskCommunicatorImpl.getTotalUsedMemory in order to provide 
correct aggregated memory usage.
+   * In LLAP, every container reports the whole used heap of the daemon 
they're running in, so we need to consider
+   * every usedMemory once per daemon.
+   * @return
+   */
+  @Override
+  public long getTotalUsedMemory() {
+return pingedNodeMap.values().stream().mapToLong(c -> c.usedMemory).sum();

Review comment:
   thanks for the comments @pgaref 
   ```
   Sum does not provide much value here
   ```
   I see, the sum alone is just one piece of information (we need at least the 
number of daemons additionally for it to have meaning), but the percentage looks 
more useful.
   I was already thinking about a percentage - what should be the value of 100%? 
I mean, usedMemory is heap usage in the daemons; what do you think we should 
consider as the "whole" memory?
   I found adjustedExecutorMemory which would make sense:
   
https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java#L227
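
   A rough pure-JDK sketch of the percentage idea; the per-daemon numbers are made up, and in the real code the usage would come from `pingedNodeMap` while the 100% reference might be something like `adjustedExecutorMemory`:

```java
public class MemoryPercentSketch {
    // Aggregate used heap across daemons and express it against an assumed
    // total capacity; both arrays are hypothetical inputs for illustration.
    static long percentUsed(long[] usedPerDaemon, long[] capacityPerDaemon) {
        long used = 0;
        long capacity = 0;
        for (long u : usedPerDaemon) {
            used += u;
        }
        for (long c : capacityPerDaemon) {
            capacity += c;
        }
        return capacity == 0 ? 0 : (used * 100) / capacity;
    }
}
```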
   






Issue Time Tracking
---

Worklog Id: (was: 687368)
Time Spent: 0.5h  (was: 20m)

> Realtime memory usage in beeline progress
> -
>
> Key: HIVE-25666
> URL: https://issues.apache.org/jira/browse/HIVE-25666
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: llap_memory_monitoring_beeline.gif
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-25747) Make a cost base decision when rebuilding materialized views

2021-11-29 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-25747:
-


> Make a cost base decision when rebuilding materialized views
> 
>
> Key: HIVE-25747
> URL: https://issues.apache.org/jira/browse/HIVE-25747
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> Choose between full insert-overwrite and partition based incremental rebuild 
> plan when rebuilding partitioned materialized views.





[jira] [Updated] (HIVE-25746) Compaction Failure Counter counted incorrectly

2021-11-29 Thread Viktor Csomor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Csomor updated HIVE-25746:
-
Description: 
The below metrics are counted incorrectly upon an exception.
- {{compaction_initator_failure_counter}}
- {{compaction_cleaner_failure_counter}}

Reasoning:
The {{Initiator}}/{{Cleaner}} classes create a list of {{CompletableFuture}}s 
whose {{Runnable}} core exceptions are wrapped into {{RuntimeException}}s. 
The below code snippet waits for all cleaners to complete ({{Initiator}} does it 
similarly).
{code:java}
try {
  for (CompactionInfo compactionInfo : readyToClean) {
    cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() ->
        clean(compactionInfo, cleanerWaterMark, metricsEnabled)), cleanerExecutor));
  }
  CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
} catch (Throwable t) {
  // the lock timeout on AUX lock, should be ignored.
  if (metricsEnabled && handle != null) {
    failuresCounter.inc();
  }
}
{code}

If {{CompletableFuture#join}} throws an exception, then the failure counter is 
incremented.

Docs:
{code}

/**
 * Returns the result value when complete, or throws an
 * (unchecked) exception if completed exceptionally. To better
 * conform with the use of common functional forms, if a
 * computation involved in the completion of this
 * CompletableFuture threw an exception, this method throws an
 * (unchecked) {@link CompletionException} with the underlying
 * exception as its cause.
 *
 * @return the result value
 * @throws CancellationException if the computation was cancelled
 * @throws CompletionException if this future completed
 * exceptionally or a completion computation threw an exception
 */
public T join() {
Object r;
return reportJoin((r = result) == null ? waitingGet(false) : r);
}
{code}

(!) Let's suppose we have 10 cleaners and the 2nd throws an exception. The 
{{catch}} block will be entered and the {{failuresCounter}} will be incremented 
once. If there are any consecutive errors amongst the remaining cleaners, 
the counter won't be incremented again.
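
One way to count every failure rather than at most one per batch is to attach the counting to each future; this is a self-contained sketch of the pattern, not the actual Initiator/Cleaner code:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class FailureCountSketch {
    // Increment the counter once per failed task via whenComplete, so later
    // failures are not masked by the single catch around join().
    static int runAndCountFailures(List<Runnable> tasks) {
        AtomicInteger failures = new AtomicInteger();
        List<CompletableFuture<Void>> futures = tasks.stream()
            .map(t -> CompletableFuture.runAsync(t)
                .whenComplete((v, ex) -> {
                    if (ex != null) {
                        failures.incrementAndGet();
                    }
                }))
            .collect(Collectors.toList());
        try {
            // join() still surfaces the first failure; it is ignored here
            // because every failure was already counted above.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } catch (RuntimeException ignored) {
        }
        return failures.get();
    }
}
```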

  was:
The count of the below metrics counted incorrectly upon an exception.
- {{compaction_initator_failure_counter}}
- {{compaction_cleaner_failure_counter}}

Reasoning:
In the {{Initator}}/{{Cleaner}} class creates a list of {{CompletableFuture}} 
which {{Runnable}} core exception is being wrapped to {{RuntimeExceptions}}. 
The below code-snippet waits all cleaners to complete (Initiators does it 
similarly).
{code:java}
try {
   
for (CompactionInfo compactionInfo : readyToClean) {
  
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
  clean(compactionInfo, cleanerWaterMark, metricsEnabled)), 
cleanerExecutor));
}
CompletableFuture.allOf(cleanerList.toArray(new 
CompletableFuture[0])).join();
  }
} catch (Throwable t) {
  // the lock timeout on AUX lock, should be ignored.
  if (metricsEnabled && handle != null) {
failuresCounter.inc();
  }
{code}

If the {{CompleteableFututre#join}} throws an Exception then the failure 
counter is incremented.

Let's suppose we have 10 cleaners and the 2nd throws an exception. The 
{{catch}} block will be initiated and the {{failuresCounter}} will be 
incremented. If there is any consecutive error amongst the remaining cleaners 
the counter won't be incremented. 


> Compaction Failure Counter counted incorrectly
> --
>
> Key: HIVE-25746
> URL: https://issues.apache.org/jira/browse/HIVE-25746
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>
> The following metrics are counted incorrectly upon an exception:
> - {{compaction_initator_failure_counter}}
> - {{compaction_cleaner_failure_counter}}
> Reasoning:
> The {{Initiator}}/{{Cleaner}} class creates a list of {{CompletableFuture}}s 
> whose {{Runnable}} core exceptions are wrapped into {{RuntimeException}}s. 
> The code snippet below waits for all cleaners to complete (the Initiator does 
> it similarly).
> {code:java}
> try {
>
> for (CompactionInfo compactionInfo : readyToClean) {
>   
> cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
>  ->
>   clean(compactionInfo, cleanerWaterMark, 
> metricsEnabled)), 

[jira] [Assigned] (HIVE-25746) Compaction Failure Counter counted incorrectly

2021-11-29 Thread Viktor Csomor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Csomor reassigned HIVE-25746:



> Compaction Failure Counter counted incorrectly
> --
>
> Key: HIVE-25746
> URL: https://issues.apache.org/jira/browse/HIVE-25746
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>
> The following metrics are counted incorrectly upon an exception:
> - {{compaction_initator_failure_counter}}
> - {{compaction_cleaner_failure_counter}}
> Reasoning:
> The {{Initiator}}/{{Cleaner}} class creates a list of {{CompletableFuture}}s 
> whose {{Runnable}} core exceptions are wrapped into {{RuntimeException}}s. 
> The code snippet below waits for all cleaners to complete (the Initiator does 
> it similarly).
> {code:java}
> try {
>   for (CompactionInfo compactionInfo : readyToClean) {
>     cleanerList.add(CompletableFuture.runAsync(
>         CompactorUtil.ThrowingRunnable.unchecked(
>             () -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
>         cleanerExecutor));
>   }
>   CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
> } catch (Throwable t) {
>   // the lock timeout on AUX lock, should be ignored.
>   if (metricsEnabled && handle != null) {
>     failuresCounter.inc();
>   }
> }
> {code}
> If {{CompletableFuture#join}} throws an Exception then the failure 
> counter is incremented.
> Let's suppose we have 10 cleaners and the 2nd throws an exception. The 
> {{catch}} block will be entered and the {{failuresCounter}} will be 
> incremented once. If any of the remaining cleaners also fails, the counter 
> won't be incremented again. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25740?focusedWorklogId=687357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687357
 ]

ASF GitHub Bot logged work on HIVE-25740:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 14:56
Start Date: 29/Nov/21 14:56
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2817:
URL: https://github.com/apache/hive/pull/2817#issuecomment-981710913


   @klcopp can you please take a quick look too when you get the chance? thx


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687357)
Time Spent: 1h  (was: 50m)

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to race conditions where the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat after that, but the txn id won't be found in the DB, leading to 
> {{{}NoSuchTxnException{}}}.
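One way to avoid the spurious {{NoSuchTxnException}} is to mute heartbeat error reporting before the commit/abort and only stop the heartbeater afterwards. A minimal single-threaded sketch of that ordering follows; all class and method names here are illustrative stand-ins, not Hive's actual API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompactionTxnSketch {
    private final Set<Long> openTxns = new HashSet<>();
    private final List<String> heartbeatErrors = new ArrayList<>();
    private volatile boolean errorLoggingEnabled = true;
    private volatile boolean heartbeaterRunning = true;

    public void open(long txnId) { openTxns.add(txnId); }

    /** Called periodically by the heartbeater; logs only while logging is enabled. */
    public void sendHeartbeat(long txnId) {
        if (!heartbeaterRunning) return;
        if (!openTxns.contains(txnId) && errorLoggingEnabled) {
            heartbeatErrors.add("NoSuchTxnException for txn " + txnId);
        }
    }

    /** close() ordering that avoids the race: mute -> commit -> shut down. */
    public void close(long txnId) {
        errorLoggingEnabled = false;   // 1. heartbeats after this point stay silent
        openTxns.remove(txnId);        // 2. commit/abort deletes the txn row
        heartbeaterRunning = false;    // 3. heartbeater is stopped last
    }

    public List<String> errors() { return heartbeatErrors; }

    public static void main(String[] args) {
        CompactionTxnSketch txn = new CompactionTxnSketch();
        txn.open(42L);
        txn.sendHeartbeat(42L);  // txn still open: no error
        txn.close(42L);          // mute logging, commit, stop heartbeater
        txn.sendHeartbeat(42L);  // straggler heartbeat after commit: silently ignored
        System.out.println(txn.errors()); // prints []
    }
}
```

A heartbeat that lands between steps 2 and 3 would previously have logged a failure; with logging muted first, it is ignored.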



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25734) Wrongly-typed constant in case expression leads to incorrect empty result

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25734?focusedWorklogId=687354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687354
 ]

ASF GitHub Bot logged work on HIVE-25734:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 14:46
Start Date: 29/Nov/21 14:46
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #2815:
URL: https://github.com/apache/hive/pull/2815#issuecomment-981701728


   no need; we will squash it before merging the changes - if you force-push 
the branch all the earlier comments will lose context; and the review system 
also loses the marking of which files are marked as viewed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687354)
Time Spent: 50m  (was: 40m)

> Wrongly-typed constant in case expression leads to incorrect empty result
> -
>
> Key: HIVE-25734
> URL: https://issues.apache.org/jira/browse/HIVE-25734
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  
> The type of constants in case expressions should be inferred, if possible, from 
> the "surrounding" input reference columns, if any.
> Consider the following table and query: 
> {code:java}
> create external table test_case (row_seq smallint, row_desc string) stored as 
> parquet;
> insert into test_case values (1, 'a');
> insert into test_case values (2, 'aa');
> insert into test_case values (6, 'aa');
> with base_t as (select row_seq, row_desc,
>   case row_seq
> when 1 then '34'
> when 6 then '35'
> when 2 then '36'
>   end as zb from test_case where row_seq in (1,2,6))
> select row_seq, row_desc, zb from base_t where zb <> '34';{code}
> The aforementioned query fails by returning an empty result, while "1 a 34" 
> is expected.
>  
> To understand the root cause, let's consider the debug input and output of 
> some related CBO rules which are triggered during the evaluation of the 
> query: 
>  
> {noformat}
> --$0 is the column 'row_seq'
> 1. HiveReduceExpressionsRule
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 
> 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), 
> '36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> 2. HivePointLookupOptimizerRule.RexTransformIntoInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> 3. HivePointLookupOptimizerRule.RexMergeInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: false{noformat}
> In the first part, we can see that the constants are correctly typed as 
> "SMALLINT" in the first part of the "AND" operand, while they are typed as 
> "INTEGER" for the "CASE" expression, despite the input reference "$0" being 
> available for inferring a more precise type.
> This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause" 
> miss the commonality between the two "IN" expressions, whose intersection 
> is considered empty, hence the empty result.
> Providing a more refined type inference for "case" expressions should fix the 
> issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25745) Print transactional stats of materialized view source tables

2021-11-29 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-25745:
-


> Print transactional stats of materialized view source tables
> 
>
> Key: HIVE-25745
> URL: https://issues.apache.org/jira/browse/HIVE-25745
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> Print the number of rows affected by transactions of materialized view source 
> tables since the last rebuild of the view when using the command
> {code:java}
> DESCRIBE FORMATTED ;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-29 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25741.
---
Resolution: Fixed

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already had an appropriate writer (i.e. 
> maybeRolloverWriterForDay() returns false) from some previous operation - we 
> don't close the previous writer instance before creating a new one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-29 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450495#comment-17450495
 ] 

Marton Bod commented on HIVE-25741:
---

Pushed to master. Thanks [~pvary] for reviewing!

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already had an appropriate writer (i.e. 
> maybeRolloverWriterForDay() returns false) from some previous operation - we 
> don't close the previous writer instance before creating a new one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-24545) jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24545?focusedWorklogId=687338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687338
 ]

ASF GitHub Bot logged work on HIVE-24545:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 14:18
Start Date: 29/Nov/21 14:18
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1789:
URL: https://github.com/apache/hive/pull/1789#discussion_r758402670



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java
##
@@ -587,6 +587,26 @@ public int getUpdateCount() throws SQLException {
 return (int) numModifiedRows;
   }
 
+  @Override
+  public long getLargeUpdateCount() throws SQLException {
+checkConnection("getLargeUpdateCount");
+/**
+ * Poll on the operation status, till the operation is complete. We want 
to ensure that since a
+ * client might end up using executeAsync and then call this to check if 
the query run is
+ * finished.
+ */
+long numModifiedRows = -1L;
+TGetOperationStatusResp resp = waitForOperationToComplete();
+if (resp != null) {
+  numModifiedRows = resp.getNumModifiedRows();
+}
+if (numModifiedRows == -1L || numModifiedRows > Long.MAX_VALUE) {
+  LOG.warn("Invalid number of updated rows: {}", numModifiedRows);
+  return -1;

Review comment:
   I'm not sure if returning `-1` is the best way to signal this 
problem... especially in the old `getUpdateCount` method

##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java
##
@@ -587,6 +587,26 @@ public int getUpdateCount() throws SQLException {
 return (int) numModifiedRows;
   }
 
+  @Override
+  public long getLargeUpdateCount() throws SQLException {
+checkConnection("getLargeUpdateCount");
+/**
+ * Poll on the operation status, till the operation is complete. We want 
to ensure that since a
+ * client might end up using executeAsync and then call this to check if 
the query run is
+ * finished.
+ */
+long numModifiedRows = -1L;
+TGetOperationStatusResp resp = waitForOperationToComplete();
+if (resp != null) {
+  numModifiedRows = resp.getNumModifiedRows();
+}
+if (numModifiedRows == -1L || numModifiedRows > Long.MAX_VALUE) {

Review comment:
   is `-2` valid?
   we could reuse the newly implemented method in the old `getUpdateCount` to 
reduce code duplication




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687338)
Time Spent: 50m  (was: 40m)

> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> 
>
> Key: HIVE-24545
> URL: https://issues.apache.org/jira/browse/HIVE-24545
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I found this while IOW on TPCDS 10TB:
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 1 ..  llap SUCCEEDED   4210   421000  
>  0 362
> Reducer 2 ..  llap SUCCEEDED10110100  
>  0   2
> Reducer 3 ..  llap SUCCEEDED   1009   100900  
>  0   1
> --
> VERTICES: 03/03  [==>>] 100%  ELAPSED TIME: 12613.62 s
> --
> 20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater 
> than Integer.MAX_VALUE
> {code}
> my scenario was:
> {code}
> set hive.exec.max.dynamic.partitions=2000;
> drop table if exists test_sales_2;
> create table test_sales_2 like 
> tpcds_bin_partitioned_acid_orc_1.store_sales;
> insert overwrite table test_sales_2 select * from 
> tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 
> 2451868;
> {code}
> regarding affected row numbers:
> {code}
> select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where 
> ss_sold_date_sk > 2451868;
> 
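The warning stems from {{getUpdateCount()}} narrowing the long row count to an int. A minimal plain-Java illustration of why a long-returning {{getLargeUpdateCount}} is needed (the row count below is a made-up example, not the actual TPCDS figure):

```java
public class UpdateCountOverflow {
    public static void main(String[] args) {
        // A 10 TB insert-overwrite can modify more rows than Integer.MAX_VALUE.
        long numModifiedRows = 3_000_000_000L;
        // The narrowing cast that getUpdateCount() effectively performs.
        int truncated = (int) numModifiedRows;
        System.out.println(numModifiedRows > Integer.MAX_VALUE); // true: triggers the warning
        System.out.println(truncated); // -1294967296: overflowed and meaningless
    }
}
```

Any consumer reading the int-valued count for such a query sees a negative, wrapped-around number.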

[jira] [Work logged] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25741?focusedWorklogId=687335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687335
 ]

ASF GitHub Bot logged work on HIVE-25741:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 13:59
Start Date: 29/Nov/21 13:59
Worklog Time Spent: 10m 
  Work Description: marton-bod merged pull request #2819:
URL: https://github.com/apache/hive/pull/2819


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687335)
Time Spent: 20m  (was: 10m)

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already had an appropriate writer (i.e. 
> maybeRolloverWriterForDay() returns false) from some previous operation - we 
> don't close the previous writer instance before creating a new one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25738) NullIf doesn't support complex types

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25738?focusedWorklogId=687331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687331
 ]

ASF GitHub Bot logged work on HIVE-25738:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 13:56
Start Date: 29/Nov/21 13:56
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2816:
URL: https://github.com/apache/hive/pull/2816#discussion_r758382898



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNullif.java
##
@@ -86,17 +87,13 @@ public ObjectInspector initialize(ObjectInspector[] 
arguments) throws UDFArgumen
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
 Object arg0 = arguments[0].get();
 Object arg1 = arguments[1].get();
-Object value0 = null;
-if (arg0 != null) {
-  value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
-}
+Object value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], 
false);
 if (arg0 == null || arg1 == null) {
   return value0;
 }
-PrimitiveObjectInspector compareOI = (PrimitiveObjectInspector) 
returnOIResolver.get();
-if (PrimitiveObjectInspectorUtils.comparePrimitiveObjects(
-value0, compareOI,
-returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], false), 
compareOI)) {
+Object value1 = returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], 
false);
+ObjectInspector compareOI = returnOIResolver.get();
+if (ObjectInspectorUtils.compare(value0, compareOI, value1, compareOI) == 
0) {

Review comment:
   it should work now; the problem was that `PrimitiveObjectInspectorUtils` 
was used - but I could add some tests




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687331)
Time Spent: 40m  (was: 0.5h)

> NullIf doesn't support complex types
> 
>
> Key: HIVE-25738
> URL: https://issues.apache.org/jira/browse/HIVE-25738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> SELECT NULLIF(array(1,2,3),array(1,2,3))
> {code}
> results in:
> {code}
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:96)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:177)
>   at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:135)
>   at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
> [...]
> {code}
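The fix discussed in the pull request replaces the primitive-only comparison with a structural one. The distinction is analogous to reference vs. structural equality for plain Java arrays (this is a language-level illustration, not Hive's ObjectInspector API):

```java
import java.util.Arrays;

public class StructuralCompareDemo {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        // Reference comparison: distinct objects, so not "equal".
        System.out.println(a == b);              // false
        // Structural comparison: element-by-element, so equal.
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

A comparator that only handles primitive values (like the {{PrimitiveObjectInspector}} cast in the stack trace above) fails on complex types; a structural comparator handles both.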



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25743) Hive DELETE Query with join condition on transactional table fails with HiveException: Unexpected column vector type STRUCT

2021-11-29 Thread Sathyendra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathyendra updated HIVE-25743:
--
Description: 
Hive delete query with join on transactional table fails with HiveException: 
Unexpected column vector type STRUCT.

 

+*Repro Queries:*+

{{CREATE TABLE tab_s(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY');}}
{{INSERT INTO tab_s values(1);}}
{{CREATE TABLE tab_t(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY','transactional'='true');}}
{{INSERT INTO tab_t select * from tab_s;}}
{{DELETE FROM tab_t  WHERE EXISTS (select tab_s.a from tab_s where 
tab_s.a=tab_t.a);}}

+*Workaround:*+

This issue is seen with {+}*vectorized execution enabled*{+}. If vectorization 
is disabled, the query runs successfully.

Property value change:

{{{_}hive.vectorized.execution.enabled{_}={*}false{*}}} in {{hive-site.xml}}

+*Log:*+ (Full attached)

{{Caused by: java.lang.RuntimeException: Map operator initialization failed}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)}}
{{    ... 16 more}}
{{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
*{color:#de350b}Unexpected column vector type STRUCT{color}*}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)}}
{{    at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)}}
{{    ... 17 more}}

 

  was:
Hive delete query with join on transactional table fails with HiveException: 
Unexpected column vector type STRUCT.

 

+*Repro Queries:*+

{{CREATE TABLE tab_s(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY');}}
{{INSERT INTO tab_s values(1);}}
{{CREATE TABLE tab_t(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY','transactional'='true');}}
{{INSERT INTO tab_t select * from tab_s;}}
{{DELETE FROM tab_t  WHERE EXISTS (select tab_s.a from tab_s where 
tab_s.a=tab_t.a);}}

+*Workaround:*+

This issue is seen with {+}*vectorized execution enabled*{+}. If we disable the 
vectorization, the query will run smoothly.

Property value change:

{{{_}hive.vectorized.execution.enabled{_}={*}false{*}}} on cluster 
{{hive-site.xml}}

+*Log:*+ (Full attached)

{{Caused by: java.lang.RuntimeException: Map operator initialization failed}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)}}
{{    ... 16 more}}
{{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
*{color:#de350b}Unexpected column vector type STRUCT{color}*}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)}}
{{    at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)}}
{{    ... 17 more}}

 


> Hive DELETE Query with join condition on transactional table fails with 
> HiveException: Unexpected column vector type STRUCT
> ---
>
> Key: HIVE-25743
> URL: https://issues.apache.org/jira/browse/HIVE-25743
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Processor, Vectorization
>Affects Versions: 3.1.2
>Reporter: Sathyendra
>Priority: Critical
> Attachments: hive_error_unexpected_col.log
>
>
> Hive delete query with join on 

[jira] [Work logged] (HIVE-25710) Config used to enable non-blocking TRUNCATE is not properly propagated

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25710?focusedWorklogId=687299=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687299
 ]

ASF GitHub Bot logged work on HIVE-25710:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 12:47
Start Date: 29/Nov/21 12:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2796:
URL: https://github.com/apache/hive/pull/2796#discussion_r758329397



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/misc/truncate/TruncateTableAnalyzer.java
##
@@ -116,10 +116,13 @@ private void checkTruncateEligibility(ASTNode ast, 
ASTNode root, String tableNam
 
   private void addTruncateTableOutputs(ASTNode root, Table table, Map partitionSpec)
   throws SemanticException {
-boolean truncateKeepsDataFiles = AcidUtils.isTransactionalTable(table) &&
-MetastoreConf.getBoolVar(conf, 
MetastoreConf.ConfVars.TRUNCATE_ACID_USE_BASE);
+boolean truncateUseBase = (HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.HIVE_ACID_TRUNCATE_USE_BASE)
+|| HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.HIVE_ACID_LOCKLESS_READS_ENABLED))

Review comment:
   If it was never working, we do not need to deprecate it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687299)
Time Spent: 1h 40m  (was: 1.5h)

> Config used to enable non-blocking TRUNCATE is not properly propagated
> --
>
> Key: HIVE-25710
> URL: https://issues.apache.org/jira/browse/HIVE-25710
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25740?focusedWorklogId=687283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687283
 ]

ASF GitHub Bot logged work on HIVE-25740:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 12:13
Start Date: 29/Nov/21 12:13
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2817:
URL: https://github.com/apache/hive/pull/2817#discussion_r758304580



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -748,6 +736,7 @@ void wasSuccessful() {
  * @throws Exception
  */
 @Override public void close() throws Exception {
+  shutdownHeartbeater();

Review comment:
   As discussed, I've introduced a flag in the heartbeater thread to turn 
on/off error logging. The new order in the `CompactionTxn.close()` is now:
   - turn off heartbeater error logging
   - commit/abort txn
   - shut down heartbeater 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687283)
Time Spent: 50m  (was: 40m)

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to race conditions where the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat after that, but the txn id won't be found in the DB, leading to 
> {{{}NoSuchTxnException{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25740?focusedWorklogId=687282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687282
 ]

ASF GitHub Bot logged work on HIVE-25740:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 12:13
Start Date: 29/Nov/21 12:13
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2817:
URL: https://github.com/apache/hive/pull/2817#discussion_r758304580



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -748,6 +736,7 @@ void wasSuccessful() {
  * @throws Exception
  */
 @Override public void close() throws Exception {
+  shutdownHeartbeater();

Review comment:
   As discussed, I've introduced a flag in the heartbeater thread to turn 
on/off error logging. The new order is now:
   - turn off heartbeater error logging
   - commit/abort txn
   - shut down heartbeater 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687282)
Time Spent: 40m  (was: 0.5h)

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to race conditions where the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat; the txn id is then no longer found in the DB, leading to 
> {{NoSuchTxnException}}.





[jira] [Commented] (HIVE-25739) Support Alter Partition Properties

2021-11-29 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450389#comment-17450389
 ] 

Stamatis Zampetakis commented on HIVE-25739:


Thanks for getting back to this [~xiepengjie]. Please close this jira as a 
duplicate and let's continue the discussion under HIVE-14261 or HIVE-4207. To 
make the discussion easier to follow, please post your reply again in the 
previous JIRA. 

> Support Alter Partition Properties
> --
>
> Key: HIVE-25739
> URL: https://issues.apache.org/jira/browse/HIVE-25739
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: All Versions
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Support alter partition properties like:
> {code:java}
> alter table alter1 partition(insertdate='2008-01-01') set tblproperties 
> ('a'='1', 'c'='3');
> alter table alter1 partition(insertdate='2008-01-01') unset tblproperties if 
> exists ('c'='3');{code}
>  
> relates to https://issues.apache.org/jira/browse/HIVE-14261





[jira] [Work logged] (HIVE-25744) Support backward compatibility of thrift struct CreationMetadata

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25744?focusedWorklogId=687242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687242
 ]

ASF GitHub Bot logged work on HIVE-25744:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 10:57
Start Date: 29/Nov/21 10:57
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2821:
URL: https://github.com/apache/hive/pull/2821


   
   
   
   
   ### What changes were proposed in this pull request?
   1. Restore the original type of `CreationMetadata.tablesUsed`
   2. Add new optional field `CreationMetadata.sourceTables`
   3. Wrap the generated `CreationMetadata` object into 
`MaterializedViewMetadata` and extract `CreationMetadata` operations.
   
   ### Why are the changes needed?
   See HIVE-25656 description
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test 
-Dtest=TestDbTxnManagerIsolationProperties#testRebuildMVWhenOpenTxnPresents -pl 
ql
   ```
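
   The backward-compatible layout described in points 1 and 2 can be sketched in Thrift IDL. This is a sketch only: the new field's id and the `SourceTable` element type are assumptions for illustration, not taken from the patch.

```thrift
struct CreationMetadata {
  1: required string catName,
  2: required string dbName,
  3: required string tblName,
  4: required set<string> tablesUsed,        // original type restored for old clients
  5: optional string validTxnList,
  6: optional i64 materializationTime,
  7: optional set<SourceTable> sourceTables  // new optional field; old clients ignore it
}
```

   Keeping field 4's original type and carrying the richer data in a new optional field follows the standard Thrift evolution rule: an existing field must never change type or id, while new optional fields can be added freely.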




Issue Time Tracking
---

Worklog Id: (was: 687242)
Remaining Estimate: 0h
Time Spent: 10m

> Support backward compatibility of thrift struct CreationMetadata
> 
>
> Key: HIVE-25744
> URL: https://issues.apache.org/jira/browse/HIVE-25744
> Project: Hive
>  Issue Type: Task
>  Components: Materialized views, Thrift API
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Old
> {code}
> struct CreationMetadata {
> 1: required string catName
> 2: required string dbName,
> 3: required string tblName,
> 4: required set<string> tablesUsed,
> 5: optional string validTxnList,
> 6: optional i64 materializationTime
> }
> {code}
> HIVE-25656 introduced a breaking change in the HiveServer2 <-> Metastore 
> thrift api:
> New
> {code}
> struct CreationMetadata {
> 1: required string catName
> 2: required string dbName,
> 3: required string tblName,
> 4: required set<SourceTable> tablesUsed,
> 5: optional string validTxnList,
> 6: optional i64 materializationTime
> }
> {code}
> 4th field type changed





[jira] [Updated] (HIVE-25744) Support backward compatibility of thrift struct CreationMetadata

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25744:
--
Labels: pull-request-available  (was: )

> Support backward compatibility of thrift struct CreationMetadata
> 
>
> Key: HIVE-25744
> URL: https://issues.apache.org/jira/browse/HIVE-25744
> Project: Hive
>  Issue Type: Task
>  Components: Materialized views, Thrift API
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Old
> {code}
> struct CreationMetadata {
> 1: required string catName
> 2: required string dbName,
> 3: required string tblName,
> 4: required set<string> tablesUsed,
> 5: optional string validTxnList,
> 6: optional i64 materializationTime
> }
> {code}
> HIVE-25656 introduced a breaking change in the HiveServer2 <-> Metastore 
> thrift api:
> New
> {code}
> struct CreationMetadata {
> 1: required string catName
> 2: required string dbName,
> 3: required string tblName,
> 4: required set<SourceTable> tablesUsed,
> 5: optional string validTxnList,
> 6: optional i64 materializationTime
> }
> {code}
> 4th field type changed





[jira] [Commented] (HIVE-25739) Support Alter Partition Properties

2021-11-29 Thread xiepengjie (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450369#comment-17450369
 ] 

xiepengjie commented on HIVE-25739:
---

Yeah, as you said. Perhaps the concern is users adding a large number of KVs. 
The fact is that more and more companies use HMS as a unified metadata 
management system, which means the stored metadata covers not only Hive 
tables, but also Flink tables, Kafka topics, etc. All of them need special 
parameters on the partition. Today we can already set a partition's parameters 
through the following code:

 
{code:java}
HiveConf hiveConf = new HiveConf();
HiveMetaStoreClient hmsc = new HiveMetaStoreClient(hiveConf);
Partition partition = hmsc.getPartition("default", "test", "2021-11-29");
Map<String, String> parameters = partition.getParameters();
parameters.put("newKey", "newValue");
hmsc.alter_partition("default", "test", partition);{code}
So I think our restriction is in vain, and supporting this feature would be more useful.

 

> Support Alter Partition Properties
> --
>
> Key: HIVE-25739
> URL: https://issues.apache.org/jira/browse/HIVE-25739
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: All Versions
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Support alter partition properties like:
> {code:java}
> alter table alter1 partition(insertdate='2008-01-01') set tblproperties 
> ('a'='1', 'c'='3');
> alter table alter1 partition(insertdate='2008-01-01') unset tblproperties if 
> exists ('c'='3');{code}
>  
> relates to https://issues.apache.org/jira/browse/HIVE-14261





[jira] [Updated] (HIVE-25656) Get materialized view state based on number of affected rows of transactions

2021-11-29 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-25656:
--
Component/s: Thrift API

> Get materialized view state based on number of affected rows of transactions
> 
>
> Key: HIVE-25656
> URL: https://issues.apache.org/jira/browse/HIVE-25656
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views, Thrift API, Transactions
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> To enable the faster incremental rebuild of materialized views presence of 
> update/delete operations on the source tables of the view since the last 
> rebuild must be checked. Based on the outcome different plan is generated for 
> scenarios in presence of update/delete and insert only operations.
> Currently this is done by querying the COMPLETED_TXN_COMPONENTS table however 
> the records from this table is cleaned when MV source tables are compacted. 
> This reduces the chances of incremental MV rebuild.
> The goal of this patch is to find an alternative way to store and retrieve 
> this information.





[jira] [Updated] (HIVE-25743) Hive DELETE Query with join condition on transactional table fails with HiveException: Unexpected column vector type STRUCT

2021-11-29 Thread Sathyendra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathyendra updated HIVE-25743:
--
Description: 
Hive delete query with join on transactional table fails with HiveException: 
Unexpected column vector type STRUCT.

 

+*Repro Queries:*+

{{CREATE TABLE tab_s(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY');}}
{{INSERT INTO tab_s values(1);}}
{{CREATE TABLE tab_t(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY','transactional'='true');}}
{{INSERT INTO tab_t select * from tab_s;}}
{{DELETE FROM tab_t  WHERE EXISTS (select tab_s.a from tab_s where 
tab_s.a=tab_t.a);}}

+*Workaround:*+

This issue is seen with {+}*vectorized execution enabled*{+}. If we disable the 
vectorization, the query will run smoothly.

Property value change:

{{{_}hive.vectorized.execution.enabled{_}={*}false{*}}} on cluster 
{{hive-site.xml}}

+*Log:*+ (Full attached)

{{Caused by: java.lang.RuntimeException: Map operator initialization failed}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)}}
{{    ... 16 more}}
{{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
*{color:#de350b}Unexpected column vector type STRUCT{color}*}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)}}
{{    at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)}}
{{    ... 17 more}}

 

  was:
Hive insert query with join on transactional table fails with HiveException: 
Unexpected column vector type STRUCT.

 

+*Repro Queries:*+

{{CREATE TABLE tab_s(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY');}}
{{INSERT INTO tab_s values(1);}}
{{CREATE TABLE tab_t(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY','transactional'='true');}}
{{INSERT INTO tab_t select * from tab_s;}}
{{DELETE FROM tab_t  WHERE EXISTS (select tab_s.a from tab_s where 
tab_s.a=tab_t.a);}}

+*Workaround:*+

This issue is seen with {+}*vectorized execution enabled*{+}. If we disable the 
vectorization, the query will run smoothly.

Property value change:

{{{_}hive.vectorized.execution.enabled{_}={*}false{*}}} on cluster 
{{hive-site.xml}}

+*Log:*+ (Full attached)

{{Caused by: java.lang.RuntimeException: Map operator initialization failed}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)}}
{{    ... 16 more}}
{{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
*{color:#de350b}Unexpected column vector type STRUCT{color}*}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)}}
{{    at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)}}
{{    ... 17 more}}

 


> Hive DELETE Query with join condition on transactional table fails with 
> HiveException: Unexpected column vector type STRUCT
> ---
>
> Key: HIVE-25743
> URL: https://issues.apache.org/jira/browse/HIVE-25743
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Processor, Vectorization
>Affects Versions: 3.1.2
>Reporter: Sathyendra
>Priority: Critical
> Attachments: hive_error_unexpected_col.log
>
>
> Hive delete query with 

[jira] [Assigned] (HIVE-25744) Support backward compatibility of thrift struct CreationMetadata

2021-11-29 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-25744:
-


> Support backward compatibility of thrift struct CreationMetadata
> 
>
> Key: HIVE-25744
> URL: https://issues.apache.org/jira/browse/HIVE-25744
> Project: Hive
>  Issue Type: Task
>  Components: Materialized views, Thrift API
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>
> Old
> {code}
> struct CreationMetadata {
> 1: required string catName
> 2: required string dbName,
> 3: required string tblName,
> 4: required set<string> tablesUsed,
> 5: optional string validTxnList,
> 6: optional i64 materializationTime
> }
> {code}
> HIVE-25656 introduced a breaking change in the HiveServer2 <-> Metastore 
> thrift api:
> New
> {code}
> struct CreationMetadata {
> 1: required string catName
> 2: required string dbName,
> 3: required string tblName,
> 4: required set<SourceTable> tablesUsed,
> 5: optional string validTxnList,
> 6: optional i64 materializationTime
> }
> {code}
> 4th field type changed





[jira] [Updated] (HIVE-25743) Hive DELETE Query with join condition on transactional table fails with HiveException: Unexpected column vector type STRUCT

2021-11-29 Thread Sathyendra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathyendra updated HIVE-25743:
--
Summary: Hive DELETE Query with join condition on transactional table fails 
with HiveException: Unexpected column vector type STRUCT  (was: Hive INSERT 
Query with join condition on transactional table fails with HiveException: 
Unexpected column vector type STRUCT)

> Hive DELETE Query with join condition on transactional table fails with 
> HiveException: Unexpected column vector type STRUCT
> ---
>
> Key: HIVE-25743
> URL: https://issues.apache.org/jira/browse/HIVE-25743
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Processor, Vectorization
>Affects Versions: 3.1.2
>Reporter: Sathyendra
>Priority: Critical
> Attachments: hive_error_unexpected_col.log
>
>
> Hive insert query with join on transactional table fails with HiveException: 
> Unexpected column vector type STRUCT.
>  
> +*Repro Queries:*+
> {{CREATE TABLE tab_s(a int) STORED AS ORC TBLPROPERTIES 
> ('orc.compress'='SNAPPY');}}
> {{INSERT INTO tab_s values(1);}}
> {{CREATE TABLE tab_t(a int) STORED AS ORC TBLPROPERTIES 
> ('orc.compress'='SNAPPY','transactional'='true');}}
> {{INSERT INTO tab_t select * from tab_s;}}
> {{DELETE FROM tab_t  WHERE EXISTS (select tab_s.a from tab_s where 
> tab_s.a=tab_t.a);}}
> 
> +*Workaround:*+
> This issue is seen with {+}*vectorized execution enabled*{+}. If we disable 
> the vectorization, the query will run smoothly.
> Property value change:
> {{{_}hive.vectorized.execution.enabled{_}={*}false{*}}} on cluster 
> {{hive-site.xml}}
> 
> +*Log:*+ (Full attached)
> {{Caused by: java.lang.RuntimeException: Map operator initialization failed}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)}}
> {{    ... 16 more}}
> {{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> *{color:#de350b}Unexpected column vector type STRUCT{color}*}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)}}
> {{    ... 17 more}}
>  





[jira] [Updated] (HIVE-25743) Hive INSERT Query with join condition on transactional table fails with HiveException: Unexpected column vector type STRUCT

2021-11-29 Thread Sathyendra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathyendra updated HIVE-25743:
--
Description: 
Hive insert query with join on transactional table fails with HiveException: 
Unexpected column vector type STRUCT.

 

+*Repro Queries:*+

{{CREATE TABLE tab_s(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY');}}
{{INSERT INTO tab_s values(1);}}
{{CREATE TABLE tab_t(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY','transactional'='true');}}
{{INSERT INTO tab_t select * from tab_s;}}
{{DELETE FROM tab_t  WHERE EXISTS (select tab_s.a from tab_s where 
tab_s.a=tab_t.a);}}

+*Workaround:*+

This issue is seen with {+}*vectorized execution enabled*{+}. If we disable the 
vectorization, the query will run smoothly.

Property value change:

{{{_}hive.vectorized.execution.enabled{_}={*}false{*}}} on cluster 
{{hive-site.xml}}

+*Log:*+ (Full attached)

{{Caused by: java.lang.RuntimeException: Map operator initialization failed}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)}}
{{    ... 16 more}}
{{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
*{color:#de350b}Unexpected column vector type STRUCT{color}*}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)}}
{{    at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)}}
{{    ... 17 more}}

 

  was:
Hive insert query with join on transactional table fails with HiveException: 
Unexpected column vector type STRUCT.

Repro Queries:

{{CREATE TABLE tab_s(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY');}}
{{INSERT INTO tab_s values(1);}}
{{CREATE TABLE tab_t(a int) STORED AS ORC TBLPROPERTIES 
('orc.compress'='SNAPPY','transactional'='true');}}
{{INSERT INTO tab_t select * from tab_s;}}
{{DELETE FROM tab_t  WHERE EXISTS (select tab_s.a from tab_s where 
tab_s.a=tab_t.a);}}

+*Workaround:*+

This issue is seen with {+}*vectorized execution enabled*{+}. If we disable the 
vectorization, the query will run smoothly.

Property value change:

{{{_}hive.vectorized.execution.enabled{_}={*}false{*}}} on cluster 
{{hive-site.xml}}

+*Log:*+ (Full attached)

{{Caused by: java.lang.RuntimeException: Map operator initialization failed}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)}}
{{    ... 16 more}}
{{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
*{color:#de350b}Unexpected column vector type STRUCT{color}*}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)}}
{{    at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)}}
{{    at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)}}
{{    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)}}
{{    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)}}
{{    ... 17 more}}

 


> Hive INSERT Query with join condition on transactional table fails with 
> HiveException: Unexpected column vector type STRUCT
> ---
>
> Key: HIVE-25743
> URL: https://issues.apache.org/jira/browse/HIVE-25743
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Processor, Vectorization
>Affects Versions: 3.1.2
>Reporter: Sathyendra
>Priority: Critical
> Attachments: hive_error_unexpected_col.log
>
>
> Hive insert query with join on 

[jira] [Updated] (HIVE-25743) Hive INSERT Query with join condition on transactional table fails with HiveException: Unexpected column vector type STRUCT

2021-11-29 Thread Sathyendra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathyendra updated HIVE-25743:
--
Attachment: hive_error_unexpected_col.log

> Hive INSERT Query with join condition on transactional table fails with 
> HiveException: Unexpected column vector type STRUCT
> ---
>
> Key: HIVE-25743
> URL: https://issues.apache.org/jira/browse/HIVE-25743
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Processor, Vectorization
>Affects Versions: 3.1.2
>Reporter: Sathyendra
>Priority: Critical
> Attachments: hive_error_unexpected_col.log
>
>
> Hive insert query with join on transactional table fails with HiveException: 
> Unexpected column vector type STRUCT.
> Repro Queries:
> {{CREATE TABLE tab_s(a int) STORED AS ORC TBLPROPERTIES 
> ('orc.compress'='SNAPPY');}}
> {{INSERT INTO tab_s values(1);}}
> {{CREATE TABLE tab_t(a int) STORED AS ORC TBLPROPERTIES 
> ('orc.compress'='SNAPPY','transactional'='true');}}
> {{INSERT INTO tab_t select * from tab_s;}}
> {{DELETE FROM tab_t  WHERE EXISTS (select tab_s.a from tab_s where 
> tab_s.a=tab_t.a);}}
> 
> +*Workaround:*+
> This issue is seen with {+}*vectorized execution enabled*{+}. If we disable 
> the vectorization, the query will run smoothly.
> Property value change:
> {{{_}hive.vectorized.execution.enabled{_}={*}false{*}}} on cluster 
> {{hive-site.xml}}
> 
> +*Log:*+ (Full attached)
> {{Caused by: java.lang.RuntimeException: Map operator initialization failed}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)}}
> {{    ... 16 more}}
> {{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> *{color:#de350b}Unexpected column vector type STRUCT{color}*}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)}}
> {{    ... 17 more}}
>  





[jira] [Updated] (HIVE-25743) Hive INSERT Query with join condition on transactional table fails with HiveException: Unexpected column vector type STRUCT

2021-11-29 Thread Sathyendra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathyendra updated HIVE-25743:
--
Summary: Hive INSERT Query with join condition on transactional table fails 
with HiveException: Unexpected column vector type STRUCT  (was: Hive INSERT 
Query to transactional table fails with HiveException: Unexpected column vector 
type STRUCT)

> Hive INSERT Query with join condition on transactional table fails with 
> HiveException: Unexpected column vector type STRUCT
> ---
>
> Key: HIVE-25743
> URL: https://issues.apache.org/jira/browse/HIVE-25743
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Processor, Vectorization
>Affects Versions: 3.1.2
>Reporter: Sathyendra
>Priority: Critical
>
> Hive insert query with join on transactional table fails with HiveException: 
> Unexpected column vector type STRUCT.
> Repro Queries:
> {{CREATE TABLE tab_s(a int) STORED AS ORC TBLPROPERTIES 
> ('orc.compress'='SNAPPY');}}
> {{INSERT INTO tab_s values(1);}}
> {{CREATE TABLE tab_t(a int) STORED AS ORC TBLPROPERTIES 
> ('orc.compress'='SNAPPY','transactional'='true');}}
> {{INSERT INTO tab_t select * from tab_s;}}
> {{DELETE FROM tab_t  WHERE EXISTS (select tab_s.a from tab_s where 
> tab_s.a=tab_t.a);}}
> 
> +*Workaround:*+
> This issue is seen with {+}*vectorized execution enabled*{+}. If we disable 
> the vectorization, the query will run smoothly.
> Property value change:
> {{{_}hive.vectorized.execution.enabled{_}={*}false{*}}} on cluster 
> {{hive-site.xml}}
> 
> +*Log:*+ (Full attached)
> {{Caused by: java.lang.RuntimeException: Map operator initialization failed}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)}}
> {{    ... 16 more}}
> {{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> *{color:#de350b}Unexpected column vector type STRUCT{color}*}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)}}
> {{    at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)}}
> {{    ... 17 more}}
>  





[jira] [Commented] (HIVE-25739) Support Alter Partition Properties

2021-11-29 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450302#comment-17450302
 ] 

Stamatis Zampetakis commented on HIVE-25739:


[~xiepengjie] this is not simply related to HIVE-14261; it really looks like a 
duplicate of HIVE-14261 and HIVE-4207. I think this ticket should be closed 
and the PR should be associated with the previous ones. Moreover, some 
questions have been raised around this feature in the previous JIRAs, and 
these should be answered before merging this one.

> Support Alter Partition Properties
> --
>
> Key: HIVE-25739
> URL: https://issues.apache.org/jira/browse/HIVE-25739
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: All Versions
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Support alter partition properties like:
> {code:java}
> alter table alter1 partition(insertdate='2008-01-01') set tblproperties 
> ('a'='1', 'c'='3');
> alter table alter1 partition(insertdate='2008-01-01') unset tblproperties if 
> exists ('c'='3');{code}
>  
> relates to https://issues.apache.org/jira/browse/HIVE-14261





[jira] [Work logged] (HIVE-25738) NullIf doesn't support complex types

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25738?focusedWorklogId=687200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687200
 ]

ASF GitHub Bot logged work on HIVE-25738:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 09:29
Start Date: 29/Nov/21 09:29
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2816:
URL: https://github.com/apache/hive/pull/2816#discussion_r758174023



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNullif.java
##
@@ -86,17 +87,13 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
   public Object evaluate(DeferredObject[] arguments) throws HiveException {
 Object arg0 = arguments[0].get();
 Object arg1 = arguments[1].get();
-Object value0 = null;
-if (arg0 != null) {
-  value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], false);
-}
+Object value0 = returnOIResolver.convertIfNecessary(arg0, argumentOIs[0], false);
 if (arg0 == null || arg1 == null) {
   return value0;
 }
-PrimitiveObjectInspector compareOI = (PrimitiveObjectInspector) returnOIResolver.get();
-if (PrimitiveObjectInspectorUtils.comparePrimitiveObjects(
-value0, compareOI,
-returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], false), compareOI)) {
+Object value1 = returnOIResolver.convertIfNecessary(arg1, argumentOIs[1], false);
+ObjectInspector compareOI = returnOIResolver.get();
+if (ObjectInspectorUtils.compare(value0, compareOI, value1, compareOI) == 0) {

Review comment:
   Since we are now expanding support to non-primitives, can/should we also 
support the other categories: MAP, STRUCT, UNION? 
   I think we should have positive tests for all supported types and negative 
tests (exception) for unsupported ones (if any).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687200)
Time Spent: 0.5h  (was: 20m)

> NullIf doesn't support complex types
> 
>
> Key: HIVE-25738
> URL: https://issues.apache.org/jira/browse/HIVE-25738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> SELECT NULLIF(array(1,2,3),array(1,2,3))
> {code}
> results in:
> {code}
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:96)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:177)
>   at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:135)
>   at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
> [...]
> {code}
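The fix above swaps PrimitiveObjectInspectorUtils.comparePrimitiveObjects for ObjectInspectorUtils.compare, so that lists (and other nested values) are compared by value instead of failing the cast to PrimitiveObjectInspector. Outside of Hive, the intended NULLIF semantics can be sketched with plain Java deep equality; NullIfSketch and nullIf are illustrative names, not Hive API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NullIfSketch {
    // NULLIF(a, b): return null when a deeply equals b, otherwise a.
    // Objects.deepEquals stands in for ObjectInspectorUtils.compare(..) == 0,
    // which the actual fix uses so lists/maps/structs compare by value.
    static Object nullIf(Object a, Object b) {
        if (a == null || b == null) {
            return a;  // matches the UDF: a null input short-circuits
        }
        return Objects.deepEquals(a, b) ? null : a;
    }

    public static void main(String[] args) {
        List<Integer> x = Arrays.asList(1, 2, 3);
        List<Integer> y = Arrays.asList(1, 2, 3);
        System.out.println(nullIf(x, y));                    // null
        System.out.println(nullIf(x, Arrays.asList(1, 2)));  // [1, 2, 3]
    }
}
```

The review question about MAP/STRUCT/UNION applies here too: deep equality is well defined for those categories, but each needs its own positive test.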



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25734) Wrongly-typed constant in case expression leads to incorrect empty result

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25734?focusedWorklogId=687174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687174
 ]

ASF GitHub Bot logged work on HIVE-25734:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 08:28
Start Date: 29/Nov/21 08:28
Worklog Time Spent: 10m 
  Work Description: asolimando commented on pull request #2815:
URL: https://github.com/apache/hive/pull/2815#issuecomment-981395228


   Tests passed in the last run; can I squash the commits and rephrase the 
commit message?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687174)
Time Spent: 40m  (was: 0.5h)

> Wrongly-typed constant in case expression leads to incorrect empty result
> -
>
> Key: HIVE-25734
> URL: https://issues.apache.org/jira/browse/HIVE-25734
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  
> The type of constants in case expressions should be inferred, when possible, 
> from the "surrounding" input reference columns.
> Consider the following table and query: 
> {code:java}
> create external table test_case (row_seq smallint, row_desc string) stored as 
> parquet;
> insert into test_case values (1, 'a');
> insert into test_case values (2, 'aa');
> insert into test_case values (6, 'aa');
> with base_t as (select row_seq, row_desc,
>   case row_seq
> when 1 then '34'
> when 6 then '35'
> when 2 then '36'
>   end as zb from test_case where row_seq in (1,2,6))
> select row_seq, row_desc, zb from base_t where zb <> '34';{code}
> The aforementioned query fails by returning an empty result, while "1 a 34" 
> is expected.
>  
> To understand the root cause, let's consider the debug input and output of 
> some related CBO rules which are triggered during the evaluation of the 
> query: 
>  
> {noformat}
> --$0 is the column 'row_seq'
> 1. HiveReduceExpressionsRule
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 
> 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), 
> '36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> 2. HivePointLookupOptimizerRule.RexTransformIntoInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> 3. HivePointLookupOptimizerRule.RexMergeInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: false{noformat}
> In the first part, we can see that the constants are correctly typed as 
> "SMALLINT" in the first part of the "AND" operand, while they are typed as 
> "INTEGER" for the "CASE" expression, despite the input reference "$0" being 
> available for inferring a more precise type.
> This type difference causes "HivePointLookupOptimizerRule.RexMergeInClause" 
> to miss the overlap between the two "IN" expressions, so their intersection 
> is considered empty, hence the empty result.
> Providing a more refined type inference for "case" expressions should fix the 
> issue.
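A toy model of the merge step illustrates why the differing types collapse the predicate: if the rule intersects IN-list constants as exact (type, value) pairs, SMALLINT 6 and INTEGER 6 never match. This is an illustrative sketch, not the actual Calcite/Hive rule; TypedConstantSketch and its names are invented for the example:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TypedConstantSketch {
    // A constant literal tagged with its SQL type; equality is (type, value).
    record Typed(String sqlType, int value) {}

    // Intersect two IN-lists the way a naive merge rule would:
    // constants only match when both type and value are identical.
    static Set<Typed> intersect(List<Typed> a, List<Typed> b) {
        Set<Typed> out = new HashSet<>(a);
        out.retainAll(new HashSet<>(b));
        return out;
    }

    public static void main(String[] args) {
        // IN($0, 1, 2, 6) from the WHERE clause, typed from the column.
        List<Typed> fromWhere = List.of(new Typed("SMALLINT", 1),
                                        new Typed("SMALLINT", 2),
                                        new Typed("SMALLINT", 6));
        // IN($0, 6, 2) rewritten from the CASE arms, defaulted to INTEGER.
        List<Typed> fromCase = List.of(new Typed("INTEGER", 6),
                                       new Typed("INTEGER", 2));
        // Empty intersection -> the conjunction simplifies to false.
        System.out.println(intersect(fromWhere, fromCase).isEmpty()); // true
        // With the type inferred from $0, the overlap is found.
        List<Typed> fixed = List.of(new Typed("SMALLINT", 6),
                                    new Typed("SMALLINT", 2));
        System.out.println(intersect(fromWhere, fixed).size());      // 2
    }
}
```

This mirrors the trace above: the fix is to type the CASE constants from the input reference so both IN-lists carry SMALLINT constants.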



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25736) Close ORC readers

2021-11-29 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25736.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~pgaref], [~dengzh] and [~Marton Bod]!

> Close ORC readers
> -
>
> Key: HIVE-25736
> URL: https://issues.apache.org/jira/browse/HIVE-25736
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After ORC-498 the Orc readers should be closed explicitly. One of the cases 
> was HIVE-25683, but there are several places where the ORC readers are still 
> not closed. 
> We should go through the code and make sure that the readers are closed.
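Since ORC readers are Closeable after ORC-498, the straightforward remedy is try-with-resources wherever a reader has method scope; a reader that spans loop iterations (as in OrcFileMergeOperator, discussed below) instead needs an explicit close when the input path changes. A generic sketch of both patterns, using a stand-in reader class rather than the real ORC API:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;
import java.util.Objects;

public class ReaderCloseSketch {
    // Stand-in for org.apache.orc.Reader (Closeable after ORC-498).
    static class FakeReader implements Closeable {
        static int open = 0;           // tracks leaked readers for the demo
        final String path;
        FakeReader(String path) { this.path = path; open++; }
        @Override public void close() { open--; }
    }

    // Pattern 1: method-scoped reader -> try-with-resources.
    static String readFooter(String path) throws IOException {
        try (FakeReader reader = new FakeReader(path)) {
            return reader.path;        // pretend to read metadata
        }
    }

    // Pattern 2: reader spanning loop iterations: close the previous
    // reader whenever the input path changes, and the last one in finally.
    static void mergeAll(List<String> inputPaths) throws IOException {
        FakeReader reader = null;
        String prevPath = null;
        try {
            for (String path : inputPaths) {
                if (!Objects.equals(path, prevPath)) {
                    if (reader != null) {   // null on the first iteration only
                        reader.close();
                    }
                    reader = new FakeReader(path);
                    prevPath = path;
                }
                // ... merge stripes from 'reader' ...
            }
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        readFooter("/tmp/a.orc");
        mergeAll(List.of("/tmp/a.orc", "/tmp/a.orc", "/tmp/b.orc"));
        System.out.println(FakeReader.open); // 0: every reader was closed
    }
}
```

Auditing for pattern 2 is the harder part of this ticket: each long-lived reader field needs a matching close on both the path-change and operator-shutdown paths.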



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25736) Close ORC readers

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25736?focusedWorklogId=687169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687169
 ]

ASF GitHub Bot logged work on HIVE-25736:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 08:15
Start Date: 29/Nov/21 08:15
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2813:
URL: https://github.com/apache/hive/pull/2813


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687169)
Time Spent: 1h 40m  (was: 1.5h)

> Close ORC readers
> -
>
> Key: HIVE-25736
> URL: https://issues.apache.org/jira/browse/HIVE-25736
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After ORC-498 the Orc readers should be closed explicitly. One of the cases 
> was HIVE-25683, but there are several places where the ORC readers are still 
> not closed. 
> We should go through the code and make sure that the readers are closed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25736) Close ORC readers

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25736?focusedWorklogId=687166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687166
 ]

ASF GitHub Bot logged work on HIVE-25736:
-

Author: ASF GitHub Bot
Created on: 29/Nov/21 08:13
Start Date: 29/Nov/21 08:13
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2813:
URL: https://github.com/apache/hive/pull/2813#discussion_r758118501



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OrcFileMergeOperator.java
##
@@ -154,6 +154,9 @@ private void processKeyValuePairs(Object key, Object value)
 
   // next file in the path
   if (!k.getInputPath().equals(prevPath)) {
+if (reader != null) {

Review comment:
   I think line 111 is there for the first iteration of the loop - we do not 
have a previous path/reader at that point




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687166)
Time Spent: 1.5h  (was: 1h 20m)

> Close ORC readers
> -
>
> Key: HIVE-25736
> URL: https://issues.apache.org/jira/browse/HIVE-25736
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After ORC-498 the Orc readers should be closed explicitly. One of the cases 
> was HIVE-25683, but there are several places where the ORC readers are still 
> not closed. 
> We should go through the code and make sure that the readers are closed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)