[jira] [Comment Edited] (HIVE-25093) date_format() UDF is returning values in UTC time zone only

2021-05-20 Thread Matt McCline (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349023#comment-17349023
 ] 

Matt McCline edited comment on HIVE-25093 at 5/21/21, 6:56 AM:
---

I think Ashish's basic observation is correct: the original HIVE-12192 says to 
compute internally with UTC and let the Hive session time zone setting 
instruct us on how to display the date/time (unless overridden by the z/Z 
format character). And his observation that the date_format function wasn't 
changed to this new model as part of the HIVE-12192 changes is also correct. 

 


was (Author: mattmccline):
I think Ashish's basic observation is correct: the original HIVE-12192 says to 
compute internally with UTC and let the Hive session time zone setting 
instruct us on how to display the date/time (unless overridden by the z/Z 
format character). And so is his observation that the date_format function wasn't 
changed to this new model as part of the HIVE-12192 changes. 

 

> date_format() UDF is returning values in UTC time zone only 
> 
>
> Key: HIVE-25093
> URL: https://issues.apache.org/jira/browse/HIVE-25093
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.2
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *HIVE - 1.2*
> sshuser@hn0-dateti:~$ *timedatectl*
>   Local time: Thu 2021-05-06 11:56:08 IST
>   Universal time: Thu 2021-05-06 06:26:08 UTC
> RTC time: Thu 2021-05-06 06:26:08
>Time zone: Asia/Kolkata (IST, +0530)
>  Network time on: yes
> NTP synchronized: yes
>  RTC in local TZ: no
> sshuser@hn0-dateti:~$ beeline
> 0: jdbc:hive2://localhost:10001/default> *select 
> date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
> +--+--+
> | _c0  |
> +--+--+
> | 2021-05-06 11:58:53.760 IST  |
> +--+--+
> 1 row selected (1.271 seconds)
> *HIVE - 3.1.0*
> sshuser@hn0-testja:~$ *timedatectl*
>   Local time: Thu 2021-05-06 12:03:32 IST
>   Universal time: Thu 2021-05-06 06:33:32 UTC
> RTC time: Thu 2021-05-06 06:33:32
>Time zone: Asia/Kolkata (IST, +0530)
>  Network time on: yes
> NTP synchronized: yes
>  RTC in local TZ: no
> sshuser@hn0-testja:~$ beeline
> 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select 
> date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
> +--+
> | _c0  |
> +--+
> | *2021-05-06 06:33:59.078 UTC*  |
> +--+
> 1 row selected (13.396 seconds)
> 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set 
> hive.local.time.zone=Asia/Kolkata;*
> No rows affected (0.025 seconds)
> 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select 
> date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
> +--+
> | _c0  |
> +--+
> | *{color:red}2021-05-06 12:08:15.118 UTC{color}*  | 
> +--+
> 1 row selected (1.074 seconds)
> expected result was *2021-05-06 12:08:15.118 IST*
> As part of HIVE-12192 it was decided to have a common time zone, UTC, for all 
> computation. As a result, the date_format() function was hard-coded to 
> "UTC".
> But later, in HIVE-21039, it was decided that the user session time zone 
> should be the default, not UTC. 
> date_format() was not fixed as part of HIVE-21039.
> What should be the ideal time zone value for date_format()?
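For readers following along, here is a minimal standalone sketch (plain java.time, not Hive internals; the class name and literals are illustrative) of the two behaviors, formatting one instant with a UTC-pinned formatter versus a session-zone formatter:

{code:java}
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class DateFormatZoneSketch {
    public static void main(String[] args) {
        Instant now = Instant.parse("2021-05-06T06:38:15.118Z");
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS z");

        // Current behavior: the formatter is pinned to UTC.
        System.out.println(fmt.withZone(ZoneId.of("UTC")).format(now));
        // prints: 2021-05-06 06:38:15.118 UTC

        // Behavior the report expects: honor hive.local.time.zone.
        System.out.println(fmt.withZone(ZoneId.of("Asia/Kolkata")).format(now));
        // prints: 2021-05-06 12:08:15.118 IST
    }
}
{code}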



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25093) date_format() UDF is returning values in UTC time zone only

2021-05-20 Thread Matt McCline (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349023#comment-17349023
 ] 

Matt McCline commented on HIVE-25093:
-

I think Ashish's basic observation is correct: the original HIVE-12192 says to 
compute internally with UTC and let the Hive session time zone setting 
instruct us on how to display the date/time (unless overridden by the z/Z 
format character). And so is his observation that the date_format function wasn't 
changed to this new model as part of the HIVE-12192 changes. 

 

> date_format() UDF is returning values in UTC time zone only 
> 
>
> Key: HIVE-25093
> URL: https://issues.apache.org/jira/browse/HIVE-25093
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.2
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *HIVE - 1.2*
> sshuser@hn0-dateti:~$ *timedatectl*
>   Local time: Thu 2021-05-06 11:56:08 IST
>   Universal time: Thu 2021-05-06 06:26:08 UTC
> RTC time: Thu 2021-05-06 06:26:08
>Time zone: Asia/Kolkata (IST, +0530)
>  Network time on: yes
> NTP synchronized: yes
>  RTC in local TZ: no
> sshuser@hn0-dateti:~$ beeline
> 0: jdbc:hive2://localhost:10001/default> *select 
> date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
> +--+--+
> | _c0  |
> +--+--+
> | 2021-05-06 11:58:53.760 IST  |
> +--+--+
> 1 row selected (1.271 seconds)
> *HIVE - 3.1.0*
> sshuser@hn0-testja:~$ *timedatectl*
>   Local time: Thu 2021-05-06 12:03:32 IST
>   Universal time: Thu 2021-05-06 06:33:32 UTC
> RTC time: Thu 2021-05-06 06:33:32
>Time zone: Asia/Kolkata (IST, +0530)
>  Network time on: yes
> NTP synchronized: yes
>  RTC in local TZ: no
> sshuser@hn0-testja:~$ beeline
> 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select 
> date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
> +--+
> | _c0  |
> +--+
> | *2021-05-06 06:33:59.078 UTC*  |
> +--+
> 1 row selected (13.396 seconds)
> 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set 
> hive.local.time.zone=Asia/Kolkata;*
> No rows affected (0.025 seconds)
> 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select 
> date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
> +--+
> | _c0  |
> +--+
> | *{color:red}2021-05-06 12:08:15.118 UTC{color}*  | 
> +--+
> 1 row selected (1.074 seconds)
> expected result was *2021-05-06 12:08:15.118 IST*
> As part of HIVE-12192 it was decided to have a common time zone, UTC, for all 
> computation. As a result, the date_format() function was hard-coded to 
> "UTC".
> But later, in HIVE-21039, it was decided that the user session time zone 
> should be the default, not UTC. 
> date_format() was not fixed as part of HIVE-21039.
> What should be the ideal time zone value for date_format()?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location

2021-05-20 Thread Thejas Nair (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348993#comment-17348993
 ] 

Thejas Nair commented on HIVE-24920:


{quote}create table t(i integer);
{quote}
I agree that this should behave like the old managed table behavior 
(irrespective of config). If old managed tables would have thrown an error when the 
dir exists, it should do so now as well.

> TRANSLATED_TO_EXTERNAL tables may write to the same location
> 
>
> Key: HIVE-24920
> URL: https://issues.apache.org/jira/browse/HIVE-24920
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> create table t (a integer);
> insert into t values(1);
> alter table t rename to t2;
> create table t (a integer); -- I expected an exception from this command 
> (location already exists), but because it's an external table there is no exception
> insert into t values(2);
> select * from t;  -- shows 1 and 2
> drop table t2;-- wipes out data location
> select * from t;  -- empty resultset
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25144) Add NoReconnect Annotation to Create AlreadyExistsException Methods

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?focusedWorklogId=600172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600172
 ]

ASF GitHub Bot logged work on HIVE-25144:
-

Author: ASF GitHub Bot
Created on: 21/May/21 04:53
Start Date: 21/May/21 04:53
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #2303:
URL: https://github.com/apache/hive/pull/2303#issuecomment-845651545


   @belugabehr How does adding this annotation create AlreadyExistsException? I 
haven't looked into this in a very long time, but my recollection is that this 
annotation was used by either the RetryingHMSHandler or RetryingMetastoreClient 
to determine whether or not to retry the operation. I would have guessed it was 
the RetryingHMSHandler, but given this change, I suspect it is the latter.
   So how exactly does this result in an AlreadyExistsException? What if the 
failure was due to a DB_LOCK timeout (where another transaction holds the lock on 
the table)?
   Also, do the implementing classes inherit annotations from the interface? I 
thought that was limited to just the Javadoc annotations.
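For what it's worth on the last question, a small standalone check (illustrative names, not Hive code): Java method annotations declared on an interface are not inherited by implementing classes; @Inherited applies only to class-level annotations on superclasses. A dynamic proxy whose InvocationHandler receives the interface Method would still see the annotation, which may be why a client-side retry wrapper can use it.

{code:java}
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationInheritanceCheck {
    @Retention(RetentionPolicy.RUNTIME)
    @interface NoReconnect { }

    interface Client {
        @NoReconnect
        void createTable();
    }

    static class ClientImpl implements Client {
        @Override
        public void createTable() { }
    }

    public static void main(String[] args) throws NoSuchMethodException {
        // The annotation is visible on the interface method...
        System.out.println(Client.class.getMethod("createTable")
                .isAnnotationPresent(NoReconnect.class));   // true
        // ...but not on the implementing class's override.
        System.out.println(ClientImpl.class.getMethod("createTable")
                .isAnnotationPresent(NoReconnect.class));   // false
    }
}
{code}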


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 600172)
Time Spent: 0.5h  (was: 20m)

> Add NoReconnect Annotation to Create AlreadyExistsException Methods
> ---
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} call fails with 
> {{AlreadyExistsException}} even though the table does not exist at all.
>  
> I believe the issue is that there is a timeout/transient error between HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, the 
> second time around, the table looks like it already exists.
>  
> If something goes wrong during an HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.
>  
> It would certainly be more transparent to the end-user what is going on.  An 
> {{AlreadyExistsException}} is confusing.
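To make the failure mode concrete, a hedged sketch of the retry hazard (hypothetical names; this is not the actual RetryingMetaStoreClient logic): retrying a non-idempotent create after a transient error turns a success the client never saw into AlreadyExistsException, whereas a no-retry marker surfaces the original failure.

{code:java}
import java.util.concurrent.Callable;

public class RetryHazardSketch {
    // Hypothetical retry wrapper; illustrates the hazard, not Hive's code.
    static <T> T callWithRetry(Callable<T> createOp, boolean retryAllowed) throws Exception {
        try {
            return createOp.call();
        } catch (RuntimeException transientError) {
            // The server may have completed the create even though the client
            // saw a timeout. Retrying repeats the create, and the second
            // attempt fails with AlreadyExistsException.
            if (retryAllowed) {
                return createOp.call();
            }
            // A @NoReconnect-style marker means we report the original
            // failure instead of retrying a non-idempotent operation.
            throw transientError;
        }
    }
}
{code}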



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25091) Implement connector provider for MSSQL and Oracle

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25091?focusedWorklogId=600170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600170
 ]

ASF GitHub Bot logged work on HIVE-25091:
-

Author: ASF GitHub Bot
Created on: 21/May/21 04:22
Start Date: 21/May/21 04:22
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2248:
URL: https://github.com/apache/hive/pull/2248#discussion_r636624386



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/AbstractJDBCConnectorProvider.java
##
@@ -172,10 +172,12 @@ protected Connection getConnection() {
 ResultSet rs = null;
 Table table = null;
 try {
-  // rs = fetchTableMetadata(tableName);
-  rs = fetchTableViaDBMetaData(tableName);
+  rs = fetchTableMetadata(tableName);

Review comment:
   I am not sure I understand this change. Don't we want to use 
fetchTableViaDBMetaData? We commented out the implementation in the abstract 
class and moved it into subclasses. Can you please expand on your changes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 600170)
Time Spent: 20m  (was: 10m)

> Implement connector provider for MSSQL and Oracle
> -
>
> Key: HIVE-25091
> URL: https://issues.apache.org/jira/browse/HIVE-25091
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Provide an implementation of Connector provider for MSSQL and Oracle
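As a rough shape only (the real providers extend AbstractJDBCConnectorProvider, whose exact API is not shown in this thread; everything below beyond java.sql is hypothetical), fetching column metadata through standard JDBC DatabaseMetaData works the same way against MSSQL and Oracle drivers:

{code:java}
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not the actual connector provider implementation.
public class JdbcColumnLister {
    public static void listColumns(Connection conn, String tableName) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        try (ResultSet rs = md.getColumns(null, null, tableName, null)) {
            while (rs.next()) {
                // COLUMN_NAME and TYPE_NAME are standard JDBC metadata columns.
                System.out.println(rs.getString("COLUMN_NAME") + " : "
                        + rs.getString("TYPE_NAME"));
            }
        }
    }
}
{code}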



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24499) Throw error when respective connector JDBC jar is not present in the lib/ path.

2021-05-20 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-24499.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been committed to master. Thank you for the patch [~hemanth619].

> Throw error when respective connector JDBC jar is not present in the lib/ 
> path.
> ---
>
> Key: HIVE-24499
> URL: https://issues.apache.org/jira/browse/HIVE-24499
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=600168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600168
 ]

ASF GitHub Bot logged work on HIVE-24663:
-

Author: ASF GitHub Bot
Created on: 21/May/21 03:55
Start Date: 21/May/21 03:55
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2266:
URL: https://github.com/apache/hive/pull/2266#discussion_r636617565



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -5390,6 +5406,493 @@ public void countOpenTxns() throws MetaException {
 }
   }
 
+  private void cleanOldStatsFromPartColStatTable(Map 
statsPartInfoMap,
+ Map 
newStatsMap,
+ Connection dbConn) throws 
SQLException {
+PreparedStatement statementDelete = null;
+int numRows = 0;
+int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+String delete = "DELETE FROM \"PART_COL_STATS\" where \"PART_ID\" = ? AND 
\"COLUMN_NAME\" = ?";
+
+try {
+  statementDelete = dbConn.prepareStatement(delete);
+  for (Map.Entry entry : newStatsMap.entrySet()) {
+// If the partition does not exist (deleted/removed by some other 
task), no need to update the stats.
+if (!statsPartInfoMap.containsKey(entry.getKey())) {
+  continue;
+}
+
+ColumnStatistics colStats = (ColumnStatistics) entry.getValue();
+for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) {
+  statementDelete.setLong(1, 
statsPartInfoMap.get(entry.getKey()).partitionId);
+  statementDelete.setString(2, statisticsObj.getColName());
+  numRows++;
+  statementDelete.addBatch();
+  if (numRows == maxNumRows) {
+statementDelete.executeBatch();
+numRows = 0;
+LOG.info("Executed delete " + delete + " for numRows " + numRows);
+  }
+}
+  }
+
+  if (numRows != 0) {
+statementDelete.executeBatch();
+  }
+} finally {
+  closeStmt(statementDelete);
+}
+  }
+
+  private long getMaxCSId(Connection dbConn) throws SQLException {
+Statement stmtInt = null;
+ResultSet rsInt = null;
+long maxCsId = 0;
+try {
+  stmtInt = dbConn.createStatement();
+  while (maxCsId == 0) {
+String query = "SELECT \"NEXT_VAL\" FROM \"SEQUENCE_TABLE\" WHERE 
\"SEQUENCE_NAME\"= "
++ 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics")
++ " FOR UPDATE";
+rsInt = stmtInt.executeQuery(query);
+LOG.debug("Going to execute query " + query);
+if (rsInt.next()) {
+  maxCsId = rsInt.getLong(1);
+} else {
+  query = "INSERT INTO \"SEQUENCE_TABLE\" (\"SEQUENCE_NAME\", 
\"NEXT_VAL\")  VALUES ( "
+  + 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics")
 + "," + 1
+  + ")";
+  stmtInt.executeUpdate(query);
+}
+  }
+  return maxCsId;
+} finally {
+  close(rsInt, stmtInt, null);
+}
+  }
+
+  private void updateMaxCSId(Connection dbConn, long maxCSId) throws 
SQLException {
+Statement stmtInt = null;
+try {
+  stmtInt = dbConn.createStatement();
+  String query = "UPDATE \"SEQUENCE_TABLE\" SET \"NEXT_VAL\" = "
+  + maxCSId
+  + " WHERE \"SEQUENCE_NAME\" = "
+  + 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics");
+  stmtInt.executeUpdate(query);
+  LOG.debug("Going to execute update " + query);
+} finally {
+  closeStmt(stmtInt);
+}
+  }
+
+  private void insertIntoPartColStatTable(Map 
statsPartInfoMap,
+  Map 
newStatsMap,
+  Connection dbConn) throws 
SQLException, MetaException, NoSuchObjectException {
+PreparedStatement statement = null;
+long maxCsId = getMaxCSId(dbConn);
+
+try {
+  int numRows = 0;
+  int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+
+  String insert = "INSERT INTO \"PART_COL_STATS\" (\"CS_ID\", 
\"CAT_NAME\", \"DB_NAME\","
+  + "\"TABLE_NAME\", \"PARTITION_NAME\", \"COLUMN_NAME\", 
\"COLUMN_TYPE\", \"PART_ID\","
+  + " \"LONG_LOW_VALUE\", \"LONG_HIGH_VALUE\", 
\"DOUBLE_HIGH_VALUE\", \"DOUBLE_LOW_VALUE\","
+  + " \"BIG_DECIMAL_LOW_VALUE\", \"BIG_DECIMAL_HIGH_VALUE\", 
\"NUM_NULLS\", \"NUM_DISTINCTS\", \"BIT_VECTOR\" ,"
+  + " \"AVG_COL_LEN\", \"MAX_COL_LEN\", \"NUM_TRUES\", 
\"NUM_FALSES\", \"LAST_ANALYZED\", \"ENGINE\") values 

[jira] [Work logged] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=600167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600167
 ]

ASF GitHub Bot logged work on HIVE-24663:
-

Author: ASF GitHub Bot
Created on: 21/May/21 03:53
Start Date: 21/May/21 03:53
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2266:
URL: https://github.com/apache/hive/pull/2266#discussion_r636616905



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -5390,6 +5406,493 @@ public void countOpenTxns() throws MetaException {
 }
   }
 
+  private void cleanOldStatsFromPartColStatTable(Map 
statsPartInfoMap,
+ Map 
newStatsMap,
+ Connection dbConn) throws 
SQLException {
+PreparedStatement statementDelete = null;
+int numRows = 0;
+int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+String delete = "DELETE FROM \"PART_COL_STATS\" where \"PART_ID\" = ? AND 
\"COLUMN_NAME\" = ?";
+
+try {
+  statementDelete = dbConn.prepareStatement(delete);
+  for (Map.Entry entry : newStatsMap.entrySet()) {
+// If the partition does not exist (deleted/removed by some other 
task), no need to update the stats.
+if (!statsPartInfoMap.containsKey(entry.getKey())) {
+  continue;
+}
+
+ColumnStatistics colStats = (ColumnStatistics) entry.getValue();
+for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) {
+  statementDelete.setLong(1, 
statsPartInfoMap.get(entry.getKey()).partitionId);
+  statementDelete.setString(2, statisticsObj.getColName());
+  numRows++;
+  statementDelete.addBatch();
+  if (numRows == maxNumRows) {
+statementDelete.executeBatch();
+numRows = 0;
+LOG.info("Executed delete " + delete + " for numRows " + numRows);
+  }
+}
+  }
+
+  if (numRows != 0) {
+statementDelete.executeBatch();
+  }
+} finally {
+  closeStmt(statementDelete);
+}
+  }
+
+  private long getMaxCSId(Connection dbConn) throws SQLException {
+Statement stmtInt = null;
+ResultSet rsInt = null;
+long maxCsId = 0;
+try {
+  stmtInt = dbConn.createStatement();
+  while (maxCsId == 0) {
+String query = "SELECT \"NEXT_VAL\" FROM \"SEQUENCE_TABLE\" WHERE 
\"SEQUENCE_NAME\"= "
++ 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics")
++ " FOR UPDATE";
+rsInt = stmtInt.executeQuery(query);
+LOG.debug("Going to execute query " + query);
+if (rsInt.next()) {
+  maxCsId = rsInt.getLong(1);
+} else {
+  query = "INSERT INTO \"SEQUENCE_TABLE\" (\"SEQUENCE_NAME\", 
\"NEXT_VAL\")  VALUES ( "
+  + 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics")
 + "," + 1
+  + ")";
+  stmtInt.executeUpdate(query);
+}
+  }
+  return maxCsId;
+} finally {
+  close(rsInt, stmtInt, null);
+}
+  }
+
+  private void updateMaxCSId(Connection dbConn, long maxCSId) throws 
SQLException {
+Statement stmtInt = null;
+try {
+  stmtInt = dbConn.createStatement();
+  String query = "UPDATE \"SEQUENCE_TABLE\" SET \"NEXT_VAL\" = "
+  + maxCSId
+  + " WHERE \"SEQUENCE_NAME\" = "
+  + 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics");
+  stmtInt.executeUpdate(query);
+  LOG.debug("Going to execute update " + query);
+} finally {
+  closeStmt(stmtInt);
+}
+  }
+
+  private void insertIntoPartColStatTable(Map 
statsPartInfoMap,
+  Map 
newStatsMap,
+  Connection dbConn) throws 
SQLException, MetaException, NoSuchObjectException {
+PreparedStatement statement = null;
+long maxCsId = getMaxCSId(dbConn);
+
+try {
+  int numRows = 0;
+  int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+
+  String insert = "INSERT INTO \"PART_COL_STATS\" (\"CS_ID\", 
\"CAT_NAME\", \"DB_NAME\","
+  + "\"TABLE_NAME\", \"PARTITION_NAME\", \"COLUMN_NAME\", 
\"COLUMN_TYPE\", \"PART_ID\","
+  + " \"LONG_LOW_VALUE\", \"LONG_HIGH_VALUE\", 
\"DOUBLE_HIGH_VALUE\", \"DOUBLE_LOW_VALUE\","
+  + " \"BIG_DECIMAL_LOW_VALUE\", \"BIG_DECIMAL_HIGH_VALUE\", 
\"NUM_NULLS\", \"NUM_DISTINCTS\", \"BIT_VECTOR\" ,"
+  + " \"AVG_COL_LEN\", \"MAX_COL_LEN\", \"NUM_TRUES\", 
\"NUM_FALSES\", \"LAST_ANALYZED\", \"ENGINE\") values 

[jira] [Work logged] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=600163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600163
 ]

ASF GitHub Bot logged work on HIVE-24663:
-

Author: ASF GitHub Bot
Created on: 21/May/21 03:40
Start Date: 21/May/21 03:40
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2266:
URL: https://github.com/apache/hive/pull/2266#discussion_r636613773



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -5390,6 +5406,493 @@ public void countOpenTxns() throws MetaException {
 }
   }
 
+  private void cleanOldStatsFromPartColStatTable(Map 
statsPartInfoMap,
+ Map 
newStatsMap,
+ Connection dbConn) throws 
SQLException {
+PreparedStatement statementDelete = null;
+int numRows = 0;
+int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+String delete = "DELETE FROM \"PART_COL_STATS\" where \"PART_ID\" = ? AND 
\"COLUMN_NAME\" = ?";
+
+try {
+  statementDelete = dbConn.prepareStatement(delete);
+  for (Map.Entry entry : newStatsMap.entrySet()) {
+// If the partition does not exist (deleted/removed by some other 
task), no need to update the stats.
+if (!statsPartInfoMap.containsKey(entry.getKey())) {
+  continue;
+}
+
+ColumnStatistics colStats = (ColumnStatistics) entry.getValue();
+for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) {
+  statementDelete.setLong(1, 
statsPartInfoMap.get(entry.getKey()).partitionId);
+  statementDelete.setString(2, statisticsObj.getColName());
+  numRows++;
+  statementDelete.addBatch();
+  if (numRows == maxNumRows) {
+statementDelete.executeBatch();
+numRows = 0;
+LOG.info("Executed delete " + delete + " for numRows " + numRows);
+  }
+}
+  }
+
+  if (numRows != 0) {
+statementDelete.executeBatch();
+  }
+} finally {
+  closeStmt(statementDelete);
+}
+  }
+
+  private long getMaxCSId(Connection dbConn) throws SQLException {
+Statement stmtInt = null;
+ResultSet rsInt = null;
+long maxCsId = 0;
+try {
+  stmtInt = dbConn.createStatement();
+  while (maxCsId == 0) {
+String query = "SELECT \"NEXT_VAL\" FROM \"SEQUENCE_TABLE\" WHERE 
\"SEQUENCE_NAME\"= "
++ 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics")
++ " FOR UPDATE";
+rsInt = stmtInt.executeQuery(query);
+LOG.debug("Going to execute query " + query);
+if (rsInt.next()) {
+  maxCsId = rsInt.getLong(1);
+} else {
+  query = "INSERT INTO \"SEQUENCE_TABLE\" (\"SEQUENCE_NAME\", 
\"NEXT_VAL\")  VALUES ( "
+  + 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics")
 + "," + 1
+  + ")";
+  stmtInt.executeUpdate(query);
+}
+  }
+  return maxCsId;
+} finally {
+  close(rsInt, stmtInt, null);
+}
+  }
+
+  private void updateMaxCSId(Connection dbConn, long maxCSId) throws 
SQLException {
+Statement stmtInt = null;
+try {
+  stmtInt = dbConn.createStatement();
+  String query = "UPDATE \"SEQUENCE_TABLE\" SET \"NEXT_VAL\" = "
+  + maxCSId
+  + " WHERE \"SEQUENCE_NAME\" = "
+  + 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics");
+  stmtInt.executeUpdate(query);
+  LOG.debug("Going to execute update " + query);
+} finally {
+  closeStmt(stmtInt);
+}
+  }
+
+  private void insertIntoPartColStatTable(Map 
statsPartInfoMap,
+  Map 
newStatsMap,
+  Connection dbConn) throws 
SQLException, MetaException, NoSuchObjectException {
+PreparedStatement statement = null;
+long maxCsId = getMaxCSId(dbConn);
+
+try {
+  int numRows = 0;
+  int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+
+  String insert = "INSERT INTO \"PART_COL_STATS\" (\"CS_ID\", 
\"CAT_NAME\", \"DB_NAME\","
+  + "\"TABLE_NAME\", \"PARTITION_NAME\", \"COLUMN_NAME\", 
\"COLUMN_TYPE\", \"PART_ID\","
+  + " \"LONG_LOW_VALUE\", \"LONG_HIGH_VALUE\", 
\"DOUBLE_HIGH_VALUE\", \"DOUBLE_LOW_VALUE\","
+  + " \"BIG_DECIMAL_LOW_VALUE\", \"BIG_DECIMAL_HIGH_VALUE\", 
\"NUM_NULLS\", \"NUM_DISTINCTS\", \"BIT_VECTOR\" ,"
+  + " \"AVG_COL_LEN\", \"MAX_COL_LEN\", \"NUM_TRUES\", 
\"NUM_FALSES\", \"LAST_ANALYZED\", \"ENGINE\") values 

[jira] [Work logged] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=600162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600162
 ]

ASF GitHub Bot logged work on HIVE-24663:
-

Author: ASF GitHub Bot
Created on: 21/May/21 03:40
Start Date: 21/May/21 03:40
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2266:
URL: https://github.com/apache/hive/pull/2266#discussion_r636613658



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -5390,6 +5406,493 @@ public void countOpenTxns() throws MetaException {
 }
   }
 
+  private void cleanOldStatsFromPartColStatTable(Map 
statsPartInfoMap,
+ Map 
newStatsMap,
+ Connection dbConn) throws 
SQLException {
+PreparedStatement statementDelete = null;
+int numRows = 0;
+int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+String delete = "DELETE FROM \"PART_COL_STATS\" where \"PART_ID\" = ? AND 
\"COLUMN_NAME\" = ?";
+
+try {
+  statementDelete = dbConn.prepareStatement(delete);
+  for (Map.Entry entry : newStatsMap.entrySet()) {
+// If the partition does not exist (deleted/removed by some other 
task), no need to update the stats.
+if (!statsPartInfoMap.containsKey(entry.getKey())) {
+  continue;
+}
+
+ColumnStatistics colStats = (ColumnStatistics) entry.getValue();
+for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) {
+  statementDelete.setLong(1, 
statsPartInfoMap.get(entry.getKey()).partitionId);
+  statementDelete.setString(2, statisticsObj.getColName());
+  numRows++;
+  statementDelete.addBatch();
+  if (numRows == maxNumRows) {
+statementDelete.executeBatch();
+numRows = 0;
+LOG.info("Executed delete " + delete + " for numRows " + numRows);
+  }
+}
+  }
+
+  if (numRows != 0) {
+statementDelete.executeBatch();
+  }
+} finally {
+  closeStmt(statementDelete);
+}
+  }
+
+  private long getMaxCSId(Connection dbConn) throws SQLException {
+Statement stmtInt = null;
+ResultSet rsInt = null;
+long maxCsId = 0;
+try {
+  stmtInt = dbConn.createStatement();
+  while (maxCsId == 0) {
+String query = "SELECT \"NEXT_VAL\" FROM \"SEQUENCE_TABLE\" WHERE 
\"SEQUENCE_NAME\"= "
++ 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics")
++ " FOR UPDATE";
+rsInt = stmtInt.executeQuery(query);
+LOG.debug("Going to execute query " + query);
+if (rsInt.next()) {
+  maxCsId = rsInt.getLong(1);
+} else {
+  query = "INSERT INTO \"SEQUENCE_TABLE\" (\"SEQUENCE_NAME\", 
\"NEXT_VAL\")  VALUES ( "
+  + 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics")
 + "," + 1
+  + ")";
+  stmtInt.executeUpdate(query);
+}
+  }
+  return maxCsId;
+} finally {
+  close(rsInt, stmtInt, null);
+}
+  }
+
+  private void updateMaxCSId(Connection dbConn, long maxCSId) throws 
SQLException {
+Statement stmtInt = null;
+try {
+  stmtInt = dbConn.createStatement();
+  String query = "UPDATE \"SEQUENCE_TABLE\" SET \"NEXT_VAL\" = "
+  + maxCSId
+  + " WHERE \"SEQUENCE_NAME\" = "
+  + 
quoteString("org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics");
+  stmtInt.executeUpdate(query);
+  LOG.debug("Going to execute update " + query);
+} finally {
+  closeStmt(stmtInt);
+}
+  }
+
+  private void insertIntoPartColStatTable(Map 
statsPartInfoMap,
+  Map 
newStatsMap,
+  Connection dbConn) throws 
SQLException, MetaException, NoSuchObjectException {
+PreparedStatement statement = null;
+long maxCsId = getMaxCSId(dbConn);
+
+try {
+  int numRows = 0;
+  int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+
+  String insert = "INSERT INTO \"PART_COL_STATS\" (\"CS_ID\", 
\"CAT_NAME\", \"DB_NAME\","
+  + "\"TABLE_NAME\", \"PARTITION_NAME\", \"COLUMN_NAME\", 
\"COLUMN_TYPE\", \"PART_ID\","
+  + " \"LONG_LOW_VALUE\", \"LONG_HIGH_VALUE\", 
\"DOUBLE_HIGH_VALUE\", \"DOUBLE_LOW_VALUE\","
+  + " \"BIG_DECIMAL_LOW_VALUE\", \"BIG_DECIMAL_HIGH_VALUE\", 
\"NUM_NULLS\", \"NUM_DISTINCTS\", \"BIT_VECTOR\" ,"
+  + " \"AVG_COL_LEN\", \"MAX_COL_LEN\", \"NUM_TRUES\", 
\"NUM_FALSES\", \"LAST_ANALYZED\", \"ENGINE\") values 

[jira] [Work logged] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=600161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600161
 ]

ASF GitHub Bot logged work on HIVE-24663:
-

Author: ASF GitHub Bot
Created on: 21/May/21 03:39
Start Date: 21/May/21 03:39
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2266:
URL: https://github.com/apache/hive/pull/2266#discussion_r636613441



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -8994,10 +9028,15 @@ public boolean 
set_aggr_stats_for(SetPartitionsStatsRequest request) throws TExc
 colNames, newStatsMap, request);
   } else { // No merge.
 Table t = getTable(catName, dbName, tableName);
-for (Map.Entry entry : 
newStatsMap.entrySet()) {
-  // We don't short-circuit on errors here anymore. That can leave 
acid stats invalid.
-  ret = updatePartitonColStatsInternal(t, entry.getValue(),
-  request.getValidWriteIdList(), request.getWriteId()) && ret;
+// We don't short-circuit on errors here anymore. That can leave acid 
stats invalid.
+if (newStatsMap.size() > 1) {
+  LOG.info("ETL_PERF started updatePartitionColStatsInBatch");
+  ret = updatePartitionColStatsInBatch(t, newStatsMap,
+  request.getValidWriteIdList(), request.getWriteId());
+  LOG.info("ETL_PERF done updatePartitionColStatsInBatch");
+} else {
+  ret = updatePartitonColStatsInternal(t, 
newStatsMap.values().iterator().next(),

Review comment:
   The batching is a specific condition, when the partitions are from same 
table. I am not sure if in other flows this holds good. So i have used the 
batching only in the flow where this is guaranteed. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 600161)
Time Spent: 50m  (was: 40m)

> Batch process in ColStatsProcessor for partitions.
> --
>
> Key: HIVE-24663
> URL: https://issues.apache.org/jira/browse/HIVE-24663
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a large number of partitions (>20K) are processed, ColStatsProcessor runs 
> into DB issues: 
> {{db.setPartitionColumnStatistics(request)}} gets stuck for hours, 
> and in some cases Postgres stops processing altogether. 
> It would be good to introduce small batches for stats gathering in 
> ColStatsProcessor instead of one bulk update.
> Ref: 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199
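A minimal sketch of the batching idea using Guava's Lists.partition (Hive already depends on Guava; the submit call below is a placeholder for the real update):

{code:java}
import java.util.List;

import com.google.common.collect.Lists;

public class StatsBatchingSketch {
    static final int BATCH_SIZE = 1000; // illustrative; would come from config

    static <T> void submitInBatches(List<T> colStats) {
        // Send bounded batches instead of one bulk request with >20K entries.
        for (List<T> batch : Lists.partition(colStats, BATCH_SIZE)) {
            submit(batch); // placeholder for db.setPartitionColumnStatistics(...)
        }
    }

    static <T> void submit(List<T> batch) {
        System.out.println("submitting " + batch.size() + " stats entries");
    }
}
{code}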



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25134) NPE in TestHiveCli.java

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25134?focusedWorklogId=600145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600145
 ]

ASF GitHub Bot logged work on HIVE-25134:
-

Author: ASF GitHub Bot
Created on: 21/May/21 02:29
Start Date: 21/May/21 02:29
Worklog Time Spent: 10m 
  Work Description: dgzdot closed pull request #2289:
URL: https://github.com/apache/hive/pull/2289


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 600145)
Time Spent: 20m  (was: 10m)

> NPE in TestHiveCli.java
> ---
>
> Key: HIVE-25134
> URL: https://issues.apache.org/jira/browse/HIVE-25134
> Project: Hive
>  Issue Type: Test
>  Components: Beeline, Test
>Reporter: gaozhan ding
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> @Before
> public void setup() throws IOException, URISyntaxException {
>   System.setProperty("datanucleus.schema.autoCreateAll", "true");
>   cli = new HiveCli();
>   initFromFile();
>   redirectOutputStream();
> }
> {code}
> In *setup()*, *initFromFile()* may access *err* before initialization
>  
>  
> {code:java}
> [ERROR] org.apache.hive.beeline.cli.TestHiveCli.testSetPromptValue  Time 
> elapsed: 1.167 s  <<< ERROR!
> java.lang.NullPointerException
> at 
> org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:249)
> at 
> org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:315)
> at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:288)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> {code}
>  
>  
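One plausible fix, assuming the NPE comes from *err* being used before *redirectOutputStream()* assigns it: initialize the streams before replaying the init file (a sketch; field and method names follow the snippet above):

{code:java}
@Before
public void setup() throws IOException, URISyntaxException {
  System.setProperty("datanucleus.schema.autoCreateAll", "true");
  cli = new HiveCli();
  // Redirect the streams first so that executeCMD(), which initFromFile()
  // calls internally, never sees a null 'err'.
  redirectOutputStream();
  initFromFile();
}
{code}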



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24451) Add schema changes for MSSQL

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24451?focusedWorklogId=600067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600067
 ]

ASF GitHub Bot logged work on HIVE-24451:
-

Author: ASF GitHub Bot
Created on: 20/May/21 21:40
Start Date: 20/May/21 21:40
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera closed pull request #2245:
URL: https://github.com/apache/hive/pull/2245


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 600067)
Remaining Estimate: 0h
Time Spent: 10m

> Add schema changes for MSSQL
> 
>
> Key: HIVE-24451
> URL: https://issues.apache.org/jira/browse/HIVE-24451
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Sai Hemanth Gantasala
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current patch does not include schema changes for the MSSQL backend. This 
> should land right after the initial commit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24451) Add schema changes for MSSQL

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24451:
--
Labels: pull-request-available  (was: )

> Add schema changes for MSSQL
> 
>
> Key: HIVE-24451
> URL: https://issues.apache.org/jira/browse/HIVE-24451
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current patch does not include schema changes for the MSSQL backend. This 
> should land right after the initial commit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled

2021-05-20 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25140:

Status: Patch Available  (was: Open)

Work-In-Progress (WIP) first patch to do a Hive QA run. Not ready for code 
review yet.

> Hive Distributed Tracing -- Part 1: Disabled
> 
>
> Key: HIVE-25140
> URL: https://issues.apache.org/jira/browse/HIVE-25140
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-25140.01.patch
>
>
> Infrastructure only; exporters to Jaeger or OpenTelemetry (OTel) are excluded due to 
> Thrift and protobuf version conflicts. A logging-only exporter is used.
> There are Spans for BeeLine and HiveServer2. The code was developed on 
> branch-3.1, and porting Spans to the Hive MetaStore on master is taking more 
> time due to major metastore code refactoring.
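For context, the basic span pattern with the OpenTelemetry Java API looks like the following (a generic illustration, not the patch itself; the span and tracer names are made up):

{code:java}
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class SpanSketch {
    static void runTraced(OpenTelemetry otel) {
        Tracer tracer = otel.getTracer("org.apache.hive.example");
        Span span = tracer.spanBuilder("executeStatement").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            // ... the work being traced, e.g. compiling and running a query ...
        } finally {
            span.end();
        }
    }
}
{code}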



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled

2021-05-20 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25140:

Attachment: HIVE-25140.01.patch

> Hive Distributed Tracing -- Part 1: Disabled
> 
>
> Key: HIVE-25140
> URL: https://issues.apache.org/jira/browse/HIVE-25140
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-25140.01.patch
>
>
> Infrastructure only; exporters to Jaeger or OpenTelemetry (OTel) are excluded due to 
> Thrift and protobuf version conflicts. A logging-only exporter is used.
> There are Spans for BeeLine and HiveServer2. The code was developed on 
> branch-3.1, and porting Spans to the Hive MetaStore on master is taking more 
> time due to major metastore code refactoring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25139) Filter out null table properties in HiveIcebergMetaHook

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25139?focusedWorklogId=599953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599953
 ]

ASF GitHub Bot logged work on HIVE-25139:
-

Author: ASF GitHub Bot
Created on: 20/May/21 18:29
Start Date: 20/May/21 18:29
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2298:
URL: https://github.com/apache/hive/pull/2298#discussion_r636354615



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -297,6 +297,9 @@ private void 
updateHmsTableProperties(org.apache.hadoop.hive.metastore.api.Table
   "Table location not set");
 }
 
+// Remove null values from hms table properties
+hmsTable.getParameters().entrySet().removeIf(e -> e.getValue() == null);

Review comment:
   Right, changed it.
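The added line is the standard Map.entrySet().removeIf idiom; a tiny standalone illustration (the property names are made up):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class RemoveNullValuesDemo {
    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("external.table.purge", "true");
        params.put("dangling", null);

        // Drop entries whose value is null, mirroring the hook change above.
        params.entrySet().removeIf(e -> e.getValue() == null);
        System.out.println(params); // {external.table.purge=true}
    }
}
{code}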




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599953)
Time Spent: 40m  (was: 0.5h)

> Filter out null table properties in HiveIcebergMetaHook
> ---
>
> Key: HIVE-25139
> URL: https://issues.apache.org/jira/browse/HIVE-25139
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25144) Add NoReconnect Annotation to Create AlreadyExistsException Methods

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?focusedWorklogId=599949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599949
 ]

ASF GitHub Bot logged work on HIVE-25144:
-

Author: ASF GitHub Bot
Created on: 20/May/21 18:19
Start Date: 20/May/21 18:19
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2303:
URL: https://github.com/apache/hive/pull/2303#issuecomment-845356962


   @nrg4878 Mind taking a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599949)
Time Spent: 20m  (was: 10m)

> Add NoReconnect Annotation to Create AlreadyExistsException Methods
> ---
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} call fails with 
> {{AlreadyExistsException}} even though the table does not exist at all.
>  
> I believe the issue is that there is a timeout/transient error between HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, the 
> second time around, the table looks like it already exists.
>  
> If something goes wrong during an HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.
>  
> It would certainly be more transparent to the end-user what is going on.  An 
> {{AlreadyExistsException}} is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25144) Add NoReconnect Annotation to Create AlreadyExistsException Methods

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25144:
--
Description: 
I have recently seen an issue where a Hive {{CREATE TABLE}} call fails with 
{{AlreadyExistsException}} even though the table does not exist at all.

 

I believe the issue is that there is a timeout/transient error between HMS and 
the backend database.  So, the client submits the request to HMS, and the 
request does eventually succeed, but only after the connection to the client 
has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, the 
second time around, the table looks like it already exists.

 

If something goes wrong during an HMS CREATE operation, we do not know the state 
of the operation and therefore it should just fail.

 

It would certainly be more transparent to the end-user what is going on.  An 
{{AlreadyExistsException}} is confusing.

  was:
I have recently seen an issue where a Hive {{CREATE TABLE}} call fails with 
{{AlreadyExistsException}} even though the table does not exist at all.

 

I believe the issue is that there is a timeout/transient error between HMS and 
the backend database.  So, the client submits the request to HMS, and the 
request does eventually succeed, but only after the connection to the client 
has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, the 
second time around, the table looks like it already exists.

 

If something goes wrong during an HMS CREATE operation, we do not know the state 
of the operation and therefore it should just fail.


> Add NoReconnect Annotation to Create AlreadyExistsException Methods
> ---
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} call fails with 
> {{AlreadyExistsException}} even though the table does not exist at all.
>  
> I believe the issue is that there is a timeout/transient error between HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, the 
> second time around, the table looks like it already exists.
>  
> If something goes wrong during an HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.
>  
> It would certainly be more transparent to the end-user what is going on.  An 
> {{AlreadyExistsException}} is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25144) Add NoReconnect Annotation to Create AlreadyExistsException Methods

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25144:
--
Labels: pull-request-available  (was: )

> Add NoReconnect Annotation to Create AlreadyExistsException Methods
> ---
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} call fails with 
> {{AlreadyExistsException}} even though the table does not exist at all.
>  
> I believe the issue is that there is a timeout/transient error between HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, the 
> second time around, the table looks like it already exists.
>  
> If something goes wrong during an HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25144) Add NoReconnect Annotation to Create AlreadyExistsException Methods

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?focusedWorklogId=599948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599948
 ]

ASF GitHub Bot logged work on HIVE-25144:
-

Author: ASF GitHub Bot
Created on: 20/May/21 18:17
Start Date: 20/May/21 18:17
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2303:
URL: https://github.com/apache/hive/pull/2303


   …on Methods
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599948)
Remaining Estimate: 0h
Time Spent: 10m

> Add NoReconnect Annotation to Create AlreadyExistsException Methods
> ---
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} call fails with 
> {{AlreadyExistsException}} even though the table does not exist at all.
>  
> I believe the issue is that there is a timeout/transient error between HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, the 
> second time around, the table looks like it already exists.
>  
> If something goes wrong during an HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25144) Add NoReconnect Annotation to Create AlreadyExistsException Methods

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25144:
-


> Add NoReconnect Annotation to Create AlreadyExistsException Methods
> ---
>
> Key: HIVE-25144
> URL: https://issues.apache.org/jira/browse/HIVE-25144
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> I have recently seen an issue where a Hive {{CREATE TABLE}} call fails with 
> {{AlreadyExistsException}} even though the table does not exist at all.
>  
> I believe the issue is that there is a timeout/transient error between HMS and 
> the backend database.  So, the client submits the request to HMS, and the 
> request does eventually succeed, but only after the connection to the client 
> has dropped.  Therefore, when the HMS Client "retry" functionality kicks in, the 
> second time around, the table looks like it already exists.
>  
> If something goes wrong during an HMS CREATE operation, we do not know the 
> state of the operation and therefore it should just fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25090) Join condition parsing error in subquery

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25090?focusedWorklogId=599924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599924
 ]

ASF GitHub Bot logged work on HIVE-25090:
-

Author: ASF GitHub Bot
Created on: 20/May/21 17:13
Start Date: 20/May/21 17:13
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 commented on a change in pull request 
#2302:
URL: https://github.com/apache/hive/pull/2302#discussion_r636302618



##
File path: ql/src/test/results/clientpositive/llap/subquery_corr_join.q.out
##
@@ -0,0 +1,212 @@
+PREHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: create table alltypestiny(
+id int,
+int_col int,
+bigint_col bigint,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypestiny
+PREHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) 
values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypestiny
+POSTHOOK: query: insert into alltypestiny(id, int_col, bigint_col, bool_col) 
values
+(1, 1, 10, true),
+(2, 4, 5, false),
+(3, 5, 15, true),
+(10, 10, 30, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypestiny
+POSTHOOK: Lineage: alltypestiny.bigint_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypestiny.id SCRIPT []
+POSTHOOK: Lineage: alltypestiny.int_col SCRIPT []
+PREHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: create table alltypesagg(
+id int,
+int_col int,
+bool_col boolean
+)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@alltypesagg
+PREHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@alltypesagg
+POSTHOOK: query: insert into alltypesagg(id, int_col, bool_col) values
+(1, 1, true),
+(2, 4, false),
+(5, 6, true),
+(null, null, false)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@alltypesagg
+POSTHOOK: Lineage: alltypesagg.bool_col SCRIPT []
+POSTHOOK: Lineage: alltypesagg.id SCRIPT []
+POSTHOOK: Lineage: alltypesagg.int_col SCRIPT []
+Warning: Shuffle Join MERGEJOIN[64][tables = [$hdt$_0, $hdt$_1, $hdt$_2]] in 
Stage 'Reducer 3' is a cross product
+PREHOOK: query: explain cbo select *
+from alltypesagg t1
+where t1.id not in
+(select tt1.id
+ from alltypestiny tt1 left JOIN alltypesagg tt2
+ on tt1.int_col = tt2.int_col)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@alltypesagg
+PREHOOK: Input: default@alltypestiny
+#### A masked pattern was here ####
+POSTHOOK: query: explain cbo select *
+from alltypesagg t1
+where t1.id not in
+(select tt1.id
+ from alltypestiny tt1 left JOIN alltypesagg tt2
+ on tt1.int_col = tt2.int_col)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@alltypesagg
+POSTHOOK: Input: default@alltypestiny
+ A masked pattern was here 
+CBO PLAN:
+HiveProject(id=[$0], int_col=[$1], bool_col=[$2])
+  HiveFilter(condition=[OR(=($3, 0), AND(IS NULL($6), >=($4, $3), IS NOT 
NULL($0)))])
+HiveProject(id=[$0], int_col=[$1], bool_col=[$2], c=[$5], ck=[$6], 
id0=[$3], literalTrue=[$4])
+  HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not 
available])
+HiveJoin(condition=[=($0, $3)], joinType=[left], algorithm=[none], 
cost=[not available])
+  HiveProject(id=[$0], int_col=[$1], bool_col=[$2])
+HiveTableScan(table=[[default, alltypesagg]], table:alias=[t1])
+  HiveProject(id=[$0], literalTrue=[true])
+HiveAggregate(group=[{0}])
+  HiveJoin(condition=[=($1, $2)], joinType=[left], 
algorithm=[none], cost=[not available])
+HiveProject(id=[$0], int_col=[$1])
+  HiveFilter(condition=[IS NOT NULL($0)])
+HiveTableScan(table=[[default, alltypestiny]], 
table:alias=[tt1])
+HiveProject(int_col=[$1])
+  HiveFilter(condition=[IS NOT NULL($1)])
+HiveTableScan(table=[[default, alltypesagg]], 
table:alias=[tt2])
+HiveProject(c=[$0], ck=[$1])
+  HiveAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
+HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], 
cost=[not avail


[jira] [Work logged] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=599896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599896
 ]

ASF GitHub Bot logged work on HIVE-24663:
-

Author: ASF GitHub Bot
Created on: 20/May/21 16:00
Start Date: 20/May/21 16:00
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2266:
URL: https://github.com/apache/hive/pull/2266#discussion_r636240428



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -5390,6 +5406,493 @@ public void countOpenTxns() throws MetaException {
 }
   }
 
+  private void cleanOldStatsFromPartColStatTable(Map 
statsPartInfoMap,

Review comment:
   can we simplify this to just one arg, ```Map statsMap```?
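
One way to read that suggestion (illustrative only; the generics were stripped by the mail archive, so the type parameters below are inferred from the casts in the patch): join the two inputs up front, so the method receives a single map of entries it will actually process:

{code:java}
import java.util.HashMap;
import java.util.Map;

final class OneArgSketch {
  private OneArgSketch() {
  }

  // Keep only partitions that still exist, carrying the resolved partition
  // info alongside the new stats. P and S stand in for the patch's
  // PartitionInfo and ColumnStatistics types.
  static <P, S> Map<P, S> joinStats(Map<String, P> statsPartInfoMap, Map<String, S> newStatsMap) {
    Map<P, S> joined = new HashMap<>();
    for (Map.Entry<String, S> entry : newStatsMap.entrySet()) {
      P partInfo = statsPartInfoMap.get(entry.getKey());
      if (partInfo != null) {
        joined.put(partInfo, entry.getValue());
      }
    }
    return joined;
  }
}
{code}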




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599896)
Time Spent: 40m  (was: 0.5h)

> Batch process in ColStatsProcessor for partitions.
> --
>
> Key: HIVE-24663
> URL: https://issues.apache.org/jira/browse/HIVE-24663
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a large number of partitions (>20K) is processed, ColStatsProcessor runs 
> into DB issues. 
> {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours, and in 
> some cases Postgres stops processing. 
> It would be good to introduce small batches for stats gathering in 
> ColStatsProcessor instead of a single bulk update.
> Ref: 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199
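
A minimal sketch of the batching idea (illustrative only; the chunk size and the consumer wiring are assumptions, and the actual patch goes through the HMS request objects):

{code:java}
import java.util.List;
import java.util.function.Consumer;

public final class BatchingSketch {
  private BatchingSketch() {
  }

  // Feed a large list of per-partition stats to the sink in fixed-size chunks
  // instead of one bulk call, so the backing RDBMS sees bounded transactions.
  public static <T> void processInBatches(List<T> items, int batchSize, Consumer<List<T>> sink) {
    for (int start = 0; start < items.size(); start += batchSize) {
      sink.accept(items.subList(start, Math.min(start + batchSize, items.size())));
    }
  }
}
{code}

Each chunk would then be handed to the existing db.setPartitionColumnStatistics call instead of sending all partitions at once.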



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=599891&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599891
 ]

ASF GitHub Bot logged work on HIVE-24663:
-

Author: ASF GitHub Bot
Created on: 20/May/21 15:52
Start Date: 20/May/21 15:52
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2266:
URL: https://github.com/apache/hive/pull/2266#discussion_r636228265



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -5390,6 +5406,493 @@ public void countOpenTxns() throws MetaException {
 }
   }
 
+  private void cleanOldStatsFromPartColStatTable(Map 
statsPartInfoMap,
+ Map 
newStatsMap,
+ Connection dbConn) throws 
SQLException {
+PreparedStatement statementDelete = null;
+int numRows = 0;
+int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+String delete = "DELETE FROM \"PART_COL_STATS\" where \"PART_ID\" = ? AND 
\"COLUMN_NAME\" = ?";
+
+try {
+  statementDelete = dbConn.prepareStatement(delete);
+  for (Map.Entry entry : newStatsMap.entrySet()) {
+// If the partition does not exist (deleted/removed by some other 
task), no need to update the stats.
+if (!statsPartInfoMap.containsKey(entry.getKey())) {
+  continue;
+}
+
+ColumnStatistics colStats = (ColumnStatistics) entry.getValue();
+for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) {
+  statementDelete.setLong(1, 
statsPartInfoMap.get(entry.getKey()).partitionId);
+  statementDelete.setString(2, statisticsObj.getColName());
+  numRows++;
+  statementDelete.addBatch();
+  if (numRows == maxNumRows) {
+statementDelete.executeBatch();
+LOG.info("Executed delete " + delete + " for numRows " + numRows);
+numRows = 0;
+  }
+}
+  }
+
+  if (numRows != 0) {
+statementDelete.executeBatch();
+  }
+} finally {
+  closeStmt(statementDelete);
+}
+  }
+
+  private long getMaxCSId(Connection dbConn) throws SQLException {
+Statement stmtInt = null;
+ResultSet rsInt = null;
+long maxCsId = 0;
+try {
+  stmtInt = dbConn.createStatement();
+  while (maxCsId == 0) {
+String query = "SELECT \"NEXT_VAL\" FROM \"SEQUENCE_TABLE\" WHERE 
\"SEQUENCE_NAME\"= "

Review comment:
   that would create a lock on SEQUENCE_TABLE for the duration of the whole 
stats update operation. Won't it interfere with the regular flow? 
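
For context on that concern, the usual shape of such a fetch-and-bump (the exact SQL in the patch is truncated above, so this is an assumed reconstruction, not the patch's code) holds a row lock until the surrounding transaction ends:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class SequenceSketch {
  private SequenceSketch() {
  }

  // SELECT ... FOR UPDATE locks the sequence row; the lock is released only
  // when dbConn commits or rolls back. That is why reserving IDs inside a
  // long stats-update transaction can block other writers of SEQUENCE_TABLE.
  static long reserveIds(Connection dbConn, String sequenceName, long count) throws SQLException {
    try (PreparedStatement select = dbConn.prepareStatement(
        "SELECT \"NEXT_VAL\" FROM \"SEQUENCE_TABLE\" WHERE \"SEQUENCE_NAME\" = ? FOR UPDATE")) {
      select.setString(1, sequenceName);
      try (ResultSet rs = select.executeQuery()) {
        if (!rs.next()) {
          throw new SQLException("Missing sequence row: " + sequenceName);
        }
        long next = rs.getLong(1);
        try (PreparedStatement update = dbConn.prepareStatement(
            "UPDATE \"SEQUENCE_TABLE\" SET \"NEXT_VAL\" = ? WHERE \"SEQUENCE_NAME\" = ?")) {
          update.setLong(1, next + count);
          update.setString(2, sequenceName);
          update.executeUpdate();
        }
        return next;
      }
    }
  }
}
{code}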
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599891)
Time Spent: 0.5h  (was: 20m)

> Batch process in ColStatsProcessor for partitions.
> --
>
> Key: HIVE-24663
> URL: https://issues.apache.org/jira/browse/HIVE-24663
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a large number of partitions (>20K) is processed, ColStatsProcessor runs 
> into DB issues. 
> {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours, and in 
> some cases Postgres stops processing. 
> It would be good to introduce small batches for stats gathering in 
> ColStatsProcessor instead of a single bulk update.
> Ref: 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=599888&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599888
 ]

ASF GitHub Bot logged work on HIVE-24663:
-

Author: ASF GitHub Bot
Created on: 20/May/21 15:48
Start Date: 20/May/21 15:48
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2266:
URL: https://github.com/apache/hive/pull/2266#discussion_r636181745



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -8994,10 +9028,15 @@ public boolean 
set_aggr_stats_for(SetPartitionsStatsRequest request) throws TExc
 colNames, newStatsMap, request);
   } else { // No merge.
 Table t = getTable(catName, dbName, tableName);
-for (Map.Entry entry : 
newStatsMap.entrySet()) {
-  // We don't short-circuit on errors here anymore. That can leave 
acid stats invalid.
-  ret = updatePartitonColStatsInternal(t, entry.getValue(),
-  request.getValidWriteIdList(), request.getWriteId()) && ret;
+// We don't short-circuit on errors here anymore. That can leave acid 
stats invalid.
+if (newStatsMap.size() > 1) {
+  LOG.info("ETL_PERF started updatePartitionColStatsInBatch");
+  ret = updatePartitionColStatsInBatch(t, newStatsMap,
+  request.getValidWriteIdList(), request.getWriteId());
+  LOG.info("ETL_PERF done updatePartitionColStatsInBatch");
+} else {
+  ret = updatePartitonColStatsInternal(t, 
newStatsMap.values().iterator().next(),

Review comment:
   what's the reason for keeping the old, non-batched implementation? 

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -7040,6 +7040,40 @@ private boolean updatePartitonColStatsInternal(Table 
tbl, ColumnStatistics colSt
 return parameters != null;
   }
 
+  private boolean updatePartitionColStatsInBatch(Table tbl, Map statsMap,
+ String validWriteIds, long 
writeId)
+  throws MetaException, InvalidObjectException, NoSuchObjectException, 
InvalidInputException {
+
+if (statsMap.size() == 0) {
+  return false;
+}
+
+String catalogName = tbl.getCatName();
+String dbName = tbl.getDbName();
+String tableName = tbl.getTableName();
+
+startFunction("updatePartitionColStatsInBatch", ":  db=" + dbName  + " 
table=" + tableName);
+
+Map newStatsMap = new HashMap<>();
+for (Map.Entry entry : statsMap.entrySet()) {
+  ColumnStatistics colStats = (ColumnStatistics) entry.getValue();
+  normalizeColStatsInput(colStats);
+  assert 
catalogName.equalsIgnoreCase(colStats.getStatsDesc().getCatName());
+  assert dbName.equalsIgnoreCase(colStats.getStatsDesc().getDbName());
+  assert 
tableName.equalsIgnoreCase(colStats.getStatsDesc().getTableName());
+  newStatsMap.put((String) entry.getKey(), colStats);
+}
+
+boolean ret = false;
+try {
+  ret = getTxnHandler().updatePartitionColumnStatistics(newStatsMap, this,

Review comment:
   Do you need to pass the reference back to HMSHandler inside of 
TxnHandler? Could this be refactored? This could cause multiple loopholes.
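
One possible shape of that refactor (names are purely illustrative): pass TxnHandler a narrow callback instead of the whole HMSHandler, so the dependency stays one-way:

{code:java}
// Hypothetical narrow interface; TxnHandler would depend on this instead of
// holding a back-reference to the entire HMSHandler.
interface StatsUpdateListener {
  void onPartitionStatsUpdated(String dbName, String tableName, String partName);
}
{code}

The call site would then pass a lambda (or a small adapter) rather than `this`.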

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -5390,6 +5406,493 @@ public void countOpenTxns() throws MetaException {
 }
   }
 
+  private void cleanOldStatsFromPartColStatTable(Map 
statsPartInfoMap,
+ Map 
newStatsMap,
+ Connection dbConn) throws 
SQLException {
+PreparedStatement statementDelete = null;
+int numRows = 0;
+int maxNumRows = MetastoreConf.getIntVar(conf, 
ConfVars.DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE);
+String delete = "DELETE FROM \"PART_COL_STATS\" where \"PART_ID\" = ? AND 
\"COLUMN_NAME\" = ?";
+
+try {
+  statementDelete = dbConn.prepareStatement(delete);
+  for (Map.Entry entry : newStatsMap.entrySet()) {
+// If the partition does not exist (deleted/removed by some other 
task), no need to update the stats.
+if (!statsPartInfoMap.containsKey(entry.getKey())) {
+  continue;
+}
+
+ColumnStatistics colStats = (ColumnStatistics) entry.getValue();
+for (ColumnStatisticsObj statisticsObj : colStats.getStatsObj()) {
+  statementDelete.setLong(1, 
statsPartInfoMap.get(entry.getKey()).partitionId);
+  statementDelete.setString(2, statisticsObj.getColName());
+  numRows++;
+  statementDelete.addBatch();
+  if (numRows == maxNumRows) {
+statementDelete.executeBatch();
+

[jira] [Updated] (HIVE-25090) Join condition parsing error in subquery

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25090:
--
Labels: pull-request-available  (was: )

> Join condition parsing error in subquery
> 
>
> Key: HIVE-25090
> URL: https://issues.apache.org/jira/browse/HIVE-25090
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> The following query fails
> {code:java}
> select *
> from alltypesagg t1
> where t1.id not in
> (select tt1.id
>  from alltypesagg tt1 LEFT JOIN alltypestiny tt2
>  on t1.int_col = tt2.int_col){code}
> Stack trace:
> {code:java}
>  
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
> Line 8:8 Invalid table alias or column reference 't1': (possible column names 
> are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
> tt2.bool_col) 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
> Line 8:8 Invalid table alias or column reference 't1': (possible column names 
> are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
> tt2.bool_col) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSubQueryRelNode(CalcitePlanner.java:3886)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3899)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3927)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5489)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:2018)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1964)
>  at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
>  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1725)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:565)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12486)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:458)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at 
> org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at 
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) 
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) 
> at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>  at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 

[jira] [Work logged] (HIVE-25090) Join condition parsing error in subquery

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25090?focusedWorklogId=599885&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599885
 ]

ASF GitHub Bot logged work on HIVE-25090:
-

Author: ASF GitHub Bot
Created on: 20/May/21 15:46
Start Date: 20/May/21 15:46
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 opened a new pull request #2302:
URL: https://github.com/apache/hive/pull/2302


   This is a draft PR to resolve parsing errors in subqueries with correlated 
join conditions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599885)
Remaining Estimate: 0h
Time Spent: 10m

> Join condition parsing error in subquery
> 
>
> Key: HIVE-25090
> URL: https://issues.apache.org/jira/browse/HIVE-25090
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> The following query fails
> {code:java}
> select *
> from alltypesagg t1
> where t1.id not in
> (select tt1.id
>  from alltypesagg tt1 LEFT JOIN alltypestiny tt2
>  on t1.int_col = tt2.int_col){code}
> Stack trace:
> {code:java}
>  
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
> Line 8:8 Invalid table alias or column reference 't1': (possible column names 
> are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
> tt2.bool_col) 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
> Line 8:8 Invalid table alias or column reference 't1': (possible column names 
> are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
> tt2.bool_col) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSubQueryRelNode(CalcitePlanner.java:3886)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3899)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3927)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5489)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:2018)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1964)
>  at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
>  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1725)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:565)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12486)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:458)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at 
> org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at 
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) 
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) 
> at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>  at 
> org.apache.hadoop.hive.cli.control

[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599871
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 15:25
Start Date: 20/May/21 15:25
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636211834



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.mr.Catalogs;
+
+public class IcebergTableUtil {
+
+  private IcebergTableUtil() {
+
+  }
+
+  /**
+   * Load the iceberg table either from the {@link QueryState} or through the 
configured catalog.
+   * @param configuration a Hadoop configuration
+   * @param properties controlling properties
+   * @return
+   */
+  static Table getTable(Configuration configuration, Properties properties) {
+// look for the table object stored in the query state. If it's null, it 
means the table was not loaded yet
+// within the same query therefore we claim it through the Catalogs API 
and then store it in query state.
+QueryState queryState = SessionState.get()
+
.getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname));
+Table table = null;
+if (queryState != null) {

Review comment:
   Right now we don't expect it to be null, but I'm not sure about the 
availability of the SessionState object. I noticed this comment in the class:  
`SessionState is not available in runtime and Hive.get().getConf() is not safe 
to call`.
   So I put this safeguard there to be more robust in case the SessionState is 
lost, but I agree some additional logging doesn't hurt. 
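
Spelled out, the load-through-cache pattern under discussion looks roughly like this (a sketch only: the quoted class above is truncated, getResource/addResource are assumed accessors standing in for whatever the final QueryState API is named, and Catalogs.NAME is assumed to key the table identifier):

{code:java}
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.QueryState;
import org.apache.hadoop.hive.ql.session.SessionState;
import org.apache.iceberg.Table;
import org.apache.iceberg.mr.Catalogs;

final class IcebergTableUtilSketch {
  private IcebergTableUtilSketch() {
  }

  // Consult the per-query state first, fall back to the catalog, and remember
  // the loaded table so every later call in the same query sees the same
  // Table object (and therefore the same snapshot).
  static Table getTable(Configuration configuration, Properties properties) {
    QueryState queryState = (QueryState) SessionState.get()
        .getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname));
    if (queryState == null) {
      // No query context available (see the SessionState caveat above): load directly.
      return Catalogs.loadTable(configuration, properties);
    }
    String key = properties.getProperty(Catalogs.NAME); // assumed key
    Table table = (Table) queryState.getResource(key);  // assumed accessor
    if (table == null) {
      table = Catalogs.loadTable(configuration, properties);
      queryState.addResource(key, table);               // assumed accessor
    }
    return table;
  }
}
{code}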
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599871)
Time Spent: 5h 50m  (was: 5h 40m)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it maybe even globally based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599866
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 15:09
Start Date: 20/May/21 15:09
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636196743



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.mr.Catalogs;
+
+public class IcebergTableUtil {
+
+  private IcebergTableUtil() {
+
+  }
+
+  /**
+   * Load the iceberg table either from the {@link QueryState} or through the 
configured catalog.
+   * @param configuration a Hadoop configuration
+   * @param properties controlling properties
+   * @return
+   */
+  static Table getTable(Configuration configuration, Properties properties) {
+// look for the table object stored in the query state. If it's null, it 
means the table was not loaded yet
+// within the same query therefore we claim it through the Catalogs API 
and then store it in query state.
+QueryState queryState = SessionState.get()
+
.getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname));
+Table table = null;
+if (queryState != null) {

Review comment:
   Do we expect the query state to be null at any point during compilation? 
If so, that would result in constant reloading of the table. So maybe we should 
either remove the null checks to simplify the code, or at least log it if it's 
not there? What do you think?

##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.mr.Catalogs;
+
+public class IcebergTableUtil {
+
+  private IcebergTableUtil() {
+
+  }
+
+  /**
+   * Load the iceberg table either from the {@link QueryState} or through the 
configured catalog.
+   * @param configuration a Hadoop configuration
+   * @param properties controlling properties
+   * @return
+   */
+  static Table getTable(Configuration configuration, Properties properties) {
+// look for the table object stored in the query state. If it's null, it 
means the table was not loaded yet

Review comment:
   nit: I think this comment belongs more naturally in the Javadoc, but 
it's up to you; I don't feel strongly about it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHu

[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599848
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:48
Start Date: 20/May/21 14:48
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636177564



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -349,6 +350,20 @@
 
   private final AtomicLong sparkSessionId = new AtomicLong();
 
+  private final Map queryStateMap = new HashMap<>();
+
+  public Object getQueryState(String queryId) {

Review comment:
   Sure!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599848)
Time Spent: 5.5h  (was: 5h 20m)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it maybe even globally based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599842
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:44
Start Date: 20/May/21 14:44
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636173569



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -349,6 +350,20 @@
 
   private final AtomicLong sparkSessionId = new AtomicLong();
 
+  private final Map queryStateMap = new HashMap<>();
+
+  public Object getQueryState(String queryId) {

Review comment:
   Can we update the return type too so we don't need any casting?
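
A sketch of the typed version being suggested (field and method shapes are inferred from the diff above; this is not the final patch):

{code:java}
import java.util.HashMap;
import java.util.Map;

final class SessionStateSketch {
  // Typed map: callers get a QueryState back without casting.
  private final Map<String, QueryState> queryStateMap = new HashMap<>();

  public QueryState getQueryState(String queryId) {
    return queryStateMap.get(queryId);
  }

  public void addQueryState(String queryId, QueryState queryState) {
    queryStateMap.put(queryId, queryState);
  }

  public void removeQueryState(String queryId) {
    queryStateMap.remove(queryId);
  }

  // Placeholder so the sketch compiles stand-alone; the real type lives in
  // org.apache.hadoop.hive.ql.QueryState.
  static final class QueryState {
  }
}
{code}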




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599842)
Time Spent: 5h 20m  (was: 5h 10m)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it maybe even globally based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599837
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:39
Start Date: 20/May/21 14:39
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636168550



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -349,6 +349,20 @@
 
   private final AtomicLong sparkSessionId = new AtomicLong();
 
+  private final Map queryStateMap = new HashMap<>();
+
+  public Object getQueryState(String queryId) {
+return queryStateMap.get(queryId);
+  }
+
+  public void addQueryState(String queryId, Object queryState) {

Review comment:
   Right. Changed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599837)
Time Spent: 4h 50m  (was: 4h 40m)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it maybe even globally based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599839&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599839
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:39
Start Date: 20/May/21 14:39
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636169311



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/Catalogs.java
##
@@ -97,13 +98,15 @@ public static Table loadTable(Configuration conf, 
Properties props) {
 props.getProperty(InputFormatConfig.CATALOG_NAME));
   }
 
-  private static Table loadTable(Configuration conf, String tableIdentifier, 
String tableLocation,
- String catalogName) {
+  private static Table loadTable(Configuration conf, String tableIdentifier, 
String tableLocation, String catalogName) {
 Optional catalog = loadCatalog(conf, catalogName);
 
+Table cachedTable = null;

Review comment:
   Not at all. The revert was not 100% successful. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599839)
Time Spent: 5h 10m  (was: 5h)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it maybe even globally based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599838
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:39
Start Date: 20/May/21 14:39
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636168810



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryState.java
##
@@ -59,6 +60,11 @@
 
   static public final String USERID_TAG = "userid";
 
+  /**
+   * map of tables involved in the query.
+   */
+  private final Map tableMap = new HashMap<>();

Review comment:
   Changed the naming of the map.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599838)
Time Spent: 5h  (was: 4h 50m)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it maybe even globally based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24911) Metastore: Create index on SDS.CD_ID for Postgres

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24911?focusedWorklogId=599832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599832
 ]

ASF GitHub Bot logged work on HIVE-24911:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:31
Start Date: 20/May/21 14:31
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2090:
URL: https://github.com/apache/hive/pull/2090


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599832)
Time Spent: 0.5h  (was: 20m)

> Metastore: Create index on SDS.CD_ID for Postgres
> -
>
> Key: HIVE-24911
> URL: https://issues.apache.org/jira/browse/HIVE-24911
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: command-output.txt
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While investigating HIVE-24870, we found that during a long incremental 
> replication, an index on SDS.CD_ID can improve the performance.
> It was tested on Postgres as shown below:
> {code}
> CREATE INDEX IF NOT EXISTS "SDS_N50" ON "SDS" USING btree ("CD_ID");
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> DROP INDEX IF EXISTS "SDS_N50";
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> {code}
> Further results can be found in:  [^command-output.txt] 
> After some investigation, I found that this index has also been part of the 
> schemas for a very long time:
> oracle: HIVE-2928
> mysql: HIVE-2246
> mssql: HIVE-6862 (or earlier)
> ...except Postgres.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24911) Metastore: Create index on SDS.CD_ID for Postgres

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24911?focusedWorklogId=599833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599833
 ]

ASF GitHub Bot logged work on HIVE-24911:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:31
Start Date: 20/May/21 14:31
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2090:
URL: https://github.com/apache/hive/pull/2090#issuecomment-845175125


   merged, thanks for the review @pgaref and @zeroflag !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599833)
Time Spent: 40m  (was: 0.5h)

> Metastore: Create index on SDS.CD_ID for Postgres
> -
>
> Key: HIVE-24911
> URL: https://issues.apache.org/jira/browse/HIVE-24911
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: command-output.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While investigating HIVE-24870, we found that during a long incremental 
> replication, an index on SDS.CD_ID can improve the performance.
> It was tested on Postgres as shown below:
> {code}
> CREATE INDEX IF NOT EXISTS "SDS_N50" ON "SDS" USING btree ("CD_ID");
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> DROP INDEX IF EXISTS "SDS_N50";
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> {code}
> Further results can be found in:  [^command-output.txt] 
> After some investigation, I found that this index has also been part of the 
> schemas for a very long time:
> oracle: HIVE-2928
> mysql: HIVE-2246
> mssql: HIVE-6862 (or earlier)
> ...except Postgres.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24911) Metastore: Create index on SDS.CD_ID for Postgres

2021-05-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24911:

Fix Version/s: 4.0.0

> Metastore: Create index on SDS.CD_ID for Postgres
> -
>
> Key: HIVE-24911
> URL: https://issues.apache.org/jira/browse/HIVE-24911
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: command-output.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While investigating HIVE-24870, we found that during a long incremental 
> replication, an index on SDS.CD_ID can improve the performance.
> It was tested on Postgres as shown below:
> {code}
> CREATE INDEX IF NOT EXISTS "SDS_N50" ON "SDS" USING btree ("CD_ID");
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> DROP INDEX IF EXISTS "SDS_N50";
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> {code}
> Further results can be found in:  [^command-output.txt] 
> After some investigation, I found that this index has also been part of the 
> schemas for a very long time:
> oracle: HIVE-2928
> mysql: HIVE-2246
> mssql: HIVE-6862 (or earlier)
> ...except Postgres.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24911) Metastore: Create index on SDS.CD_ID for Postgres

2021-05-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-24911.
-
Resolution: Fixed

> Metastore: Create index on SDS.CD_ID for Postgres
> -
>
> Key: HIVE-24911
> URL: https://issues.apache.org/jira/browse/HIVE-24911
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: command-output.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While investigating HIVE-24870, we found that during a long incremental 
> replication, an index on SDS.CD_ID can improve the performance.
> It was tested on Postgres as shown below:
> {code}
> CREATE INDEX IF NOT EXISTS "SDS_N50" ON "SDS" USING btree ("CD_ID");
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> DROP INDEX IF EXISTS "SDS_N50";
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> {code}
> Further results can be found in:  [^command-output.txt] 
> After some investigation, I found that this index has also been part of the 
> schemas for a very long time:
> oracle: HIVE-2928
> mysql: HIVE-2246
> mssql: HIVE-6862 (or earlier)
> ...except Postgres.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25079?focusedWorklogId=599830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599830
 ]

ASF GitHub Bot logged work on HIVE-25079:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:28
Start Date: 20/May/21 14:28
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2281:
URL: https://github.com/apache/hive/pull/2281#discussion_r636158595



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSMetricsListener.java
##
@@ -86,4 +92,24 @@ public void onAddPartition(AddPartitionEvent partitionEvent) 
throws MetaExceptio
 
Metrics.getOrCreateGauge(MetricsConstants.TOTAL_PARTITIONS).incrementAndGet();
 createdParts.inc();
   }
+
+  @Override
+  public void onAllocWriteId(AllocWriteIdEvent allocWriteIdEvent, Connection 
dbConn, SQLGenerator sqlGenerator) throws MetaException {
+Table table = getTable(allocWriteIdEvent);

Review comment:
   Ok. Maybe add some JavaDoc about this so nobody else gets confused... 
but otherwise LGTM then :)
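
For example, the kind of Javadoc being asked for might read like this (wording is only a suggestion, not from the patch):

{code:java}
/**
 * Invoked when write IDs are allocated for a table, i.e. before we know
 * whether the write will commit or abort. The metric for tables with
 * manually disabled compaction is incremented here on purpose: both
 * committed and aborted writes to such tables are a problem.
 */
{code}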




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599830)
Time Spent: 1h 10m  (was: 1h)

> Create new metric about number of writes to tables with manually disabled 
> compaction
> 
>
> Key: HIVE-25079
> URL: https://issues.apache.org/jira/browse/HIVE-25079
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Create a new metric that measures the number of writes to tables that have 
> compaction turned off manually. It does not matter if the write is committed 
> or aborted (both are bad...)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599829
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:23
Start Date: 20/May/21 14:23
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636154183



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/Catalogs.java
##
@@ -97,13 +98,15 @@ public static Table loadTable(Configuration conf, 
Properties props) {
 props.getProperty(InputFormatConfig.CATALOG_NAME));
   }
 
-  private static Table loadTable(Configuration conf, String tableIdentifier, 
String tableLocation,
- String catalogName) {
+  private static Table loadTable(Configuration conf, String tableIdentifier, 
String tableLocation, String catalogName) {
 Optional catalog = loadCatalog(conf, catalogName);
 
+Table cachedTable = null;

Review comment:
   Are these changes still necessary?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599829)
Time Spent: 4h 40m  (was: 4.5h)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it maybe even globally based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query
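For illustration, a minimal sketch of a queryId-keyed cache around table loading. The QueryTableCache class and its wiring are hypothetical, not the actual patch:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

import org.apache.iceberg.Table;

// Hypothetical sketch: one Table instance per (queryId, table name), so
// repeated loads within a query are cheap and the query keeps seeing the
// snapshot captured by the first load.
public final class QueryTableCache {
  private static final Map<String, Map<String, Table>> CACHE = new ConcurrentHashMap<>();

  public static Table get(String queryId, String tableName, Function<String, Table> loader) {
    return CACHE.computeIfAbsent(queryId, id -> new ConcurrentHashMap<>())
        .computeIfAbsent(tableName, loader); // loads at most once per query
  }

  // Must be invoked when the query finishes, or entries leak.
  public static void release(String queryId) {
    CACHE.remove(queryId);
  }
}
{code}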



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599826
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:21
Start Date: 20/May/21 14:21
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636151988



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryState.java
##
@@ -59,6 +60,11 @@
 
   static public final String USERID_TAG = "userid";
 
+  /**
+   * map of tables involved in the query.
+   */
+  private final Map<String, Table> tableMap = new HashMap<>();

Review comment:
   Can we make this a generic map so we can store things other than tables 
here? We will also need this container for other info that, in the absence of 
this new query state feature, we previously had to force into the conf, such 
as write commit info (jobID, vertexID, taskNum), CTAS info (is the query CTAS? 
what's the CTAS target table name), etc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599826)
Time Spent: 4.5h  (was: 4h 20m)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it, maybe even globally, based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=599825&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599825
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 20/May/21 14:18
Start Date: 20/May/21 14:18
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r636149696



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -349,6 +349,20 @@
 
   private final AtomicLong sparkSessionId = new AtomicLong();
 
+  private final Map<String, Object> queryStateMap = new HashMap<>();
+
+  public Object getQueryState(String queryId) {
+return queryStateMap.get(queryId);
+  }
+
+  public void addQueryState(String queryId, Object queryState) {

Review comment:
   Shouldn't we accept only QueryState objects here?
   i.e. `public void addQueryState(String queryId, QueryState queryState) {`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599825)
Time Spent: 4h 20m  (was: 4h 10m)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it, maybe even globally, based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25143) Improve ERROR Logging in QL Package

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25143:
--
Description: 
I went through and reviewed all of the ERROR logging in the HS2 {{ql}} module 
and I removed (most of) the following bad habits:

 
 * Log-and-Throw (log or throw, not both)
 * Pass in the Exception to the logging framework instead of logging its 
toString() : LOG.error("alter table update columns: {}", e);
 * Add additional context instead of copying the message from the wrapped 
Exception : throw new SemanticException(e.getMessage(), e);
 * The wrapped exception is being lost in some cases, though the message 
survives :  throw new HiveException(e.getMessage());
 * Remove new-lines from Exception messages; this is annoying, as log messages 
should all be on a single line for GREP
 * Not logging the Exception stack trace :  LOG.error("Error in close loader: " 
+ ie);
 * Logging information but not passing it into an Exception for bubbling up:  
LOG.error("Failed to return session: {} to pool", session, e); throw e;
 * Other miscellaneous improvements
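
For illustration, a before/after pair for two of the habits above, using standard SLF4J semantics; LOG, doUpdate, and tableName are assumed context, not code from the patch:

{code}
// Bad: log-and-throw reports the failure twice, and the {} placeholder
// consumes `e`, so only e.toString() is logged and the stack trace is lost.
void updateColumnsBad(String tableName) throws SemanticException {
  try {
    doUpdate(tableName);
  } catch (Exception e) {
    LOG.error("alter table update columns: {}", e);
    throw new SemanticException(e.getMessage(), e);
  }
}

// Better: throw once, adding context instead of copying the wrapped message,
// and let the top-level handler log it. Where logging is the right choice,
// pass the Throwable as the final argument so SLF4J keeps the stack trace:
// LOG.error("Failed to return session: {} to pool", session, e);
void updateColumnsGood(String tableName) throws SemanticException {
  try {
    doUpdate(tableName);
  } catch (Exception e) {
    throw new SemanticException("updating columns of " + tableName + " failed", e);
  }
}
{code}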

> Improve ERROR Logging in QL Package
> ---
>
> Key: HIVE-25143
> URL: https://issues.apache.org/jira/browse/HIVE-25143
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I went through and reviewed all of the ERROR logging in the HS2 {{ql}} module 
> and I removed (most of) the following bad habits:
>  
>  * Log-and-Throw (log or throw, not both)
>  * Pass in the Exception to the logging framework instead of logging its 
> toString() : LOG.error("alter table update columns: {}", e);
>  * Add additional context instead of copying the message from the wrapped 
> Exception : throw new SemanticException(e.getMessage(), e);
>  * The wrapped exception is being lost in some cases, though the message 
> survives :  throw new HiveException(e.getMessage());
>  * Remove new-lines from Exception messages; this is annoying, as log messages 
> should all be on a single line for GREP
>  * Not logging the Exception stack trace :  LOG.error("Error in close loader: 
> " + ie);
>  * Logging information but not passing it into an Exception for bubbling up:  
> LOG.error("Failed to return session: {} to pool", session, e); throw e;
>  * Other miscellaneous improvements



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25143) Improve ERROR Logging in QL Package

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25143?focusedWorklogId=599816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599816
 ]

ASF GitHub Bot logged work on HIVE-25143:
-

Author: ASF GitHub Bot
Created on: 20/May/21 13:57
Start Date: 20/May/21 13:57
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2301:
URL: https://github.com/apache/hive/pull/2301


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599816)
Remaining Estimate: 0h
Time Spent: 10m

> Improve ERROR Logging in QL Package
> ---
>
> Key: HIVE-25143
> URL: https://issues.apache.org/jira/browse/HIVE-25143
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25143) Improve ERROR Logging in QL Package

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25143:
--
Labels: pull-request-available  (was: )

> Improve ERROR Logging in QL Package
> ---
>
> Key: HIVE-25143
> URL: https://issues.apache.org/jira/browse/HIVE-25143
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25143) Improve ERROR Logging in QL Package

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25143:
-


> Improve ERROR Logging in QL Package
> ---
>
> Key: HIVE-25143
> URL: https://issues.apache.org/jira/browse/HIVE-25143
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25128?focusedWorklogId=599811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599811
 ]

ASF GitHub Bot logged work on HIVE-25128:
-

Author: ASF GitHub Bot
Created on: 20/May/21 13:46
Start Date: 20/May/21 13:46
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #2291:
URL: https://github.com/apache/hive/pull/2291


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599811)
Time Spent: 50m  (was: 40m)

> Remove Thrift Exceptions From RawStore alterCatalog
> ---
>
> Key: HIVE-25128
> URL: https://issues.apache.org/jira/browse/HIVE-25128
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25046) Log CBO plans right after major transformations

2021-05-20 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-25046.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Log CBO plans right after major transformations
> ---
>
> Key: HIVE-25046
> URL: https://issues.apache.org/jira/browse/HIVE-25046
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the results of various CBO transformations are logged (in DEBUG 
> mode) at the end of the optimization 
> [phase|https://github.com/apache/hive/blob/9f5bd72e908244b2fe915e8dc39f55afa94bbffa/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L2106]
>  and only if we are not in test mode. This has some disadvantages:
> * If there is a failure (exception) in some intermediate step, we will miss 
> all the intermediate plans, possibly losing track of what plan led to the 
> problem.
> * Intermediate logs are very useful for identifying plan problems while 
> working on a patch; unfortunately, the logs are explicitly disabled in test 
> mode, which means that for them to appear the respective code needs to change 
> every time we need to see those logs.
> * Logging at the end necessitates keeping additional local variables that 
> make the code harder to read.
> The goal of this issue is to place DEBUG logging right after major 
> transformations, independently of whether we are running in test mode, to 
> alleviate the shortcomings mentioned above.
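
As a sketch, the placement the issue asks for could look like the following; RelOptUtil.toString is Calcite's standard plan printer, while basePlan, applyMajorTransformation, and LOG are assumed surrounding context:

{code}
// Hypothetical placement: log immediately after a transformation, so the
// intermediate plan survives even if a later step throws.
RelNode newPlan = applyMajorTransformation(basePlan);
if (LOG.isDebugEnabled()) {
  LOG.debug("Plan after transformation:\n{}", RelOptUtil.toString(newPlan));
}
{code}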



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25046) Log CBO plans right after major transformations

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25046?focusedWorklogId=599803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599803
 ]

ASF GitHub Bot logged work on HIVE-25046:
-

Author: ASF GitHub Bot
Created on: 20/May/21 13:26
Start Date: 20/May/21 13:26
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #2205:
URL: https://github.com/apache/hive/pull/2205


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599803)
Time Spent: 1h  (was: 50m)

> Log CBO plans right after major transformations
> ---
>
> Key: HIVE-25046
> URL: https://issues.apache.org/jira/browse/HIVE-25046
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the results of various CBO transformations are logged (in DEBUG 
> mode) at the end of the optimization 
> [phase|https://github.com/apache/hive/blob/9f5bd72e908244b2fe915e8dc39f55afa94bbffa/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L2106]
>  and only if we are not in test mode. This has some disadvantages:
> * If there is a failure (exception) in some intermediate step, we will miss 
> all the intermediate plans, possibly losing track of what plan led to the 
> problem.
> * Intermediate logs are very useful for identifying plan problems while 
> working on a patch; unfortunately, the logs are explicitly disabled in test 
> mode, which means that for them to appear the respective code needs to change 
> every time we need to see those logs.
> * Logging at the end necessitates keeping additional local variables that 
> make the code harder to read.
> The goal of this issue is to place DEBUG logging right after major 
> transformations, independently of whether we are running in test mode, to 
> alleviate the shortcomings mentioned above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?focusedWorklogId=599801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599801
 ]

ASF GitHub Bot logged work on HIVE-25127:
-

Author: ASF GitHub Bot
Created on: 20/May/21 13:25
Start Date: 20/May/21 13:25
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #2283:
URL: https://github.com/apache/hive/pull/2283


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599801)
Time Spent: 50m  (was: 40m)

> Remove Thrift Exceptions From RawStore getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs

2021-05-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25127.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks [~mgergely] for the review!

> Remove Thrift Exceptions From RawStore getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25046) Log CBO plans right after major transformations

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25046?focusedWorklogId=599787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599787
 ]

ASF GitHub Bot logged work on HIVE-25046:
-

Author: ASF GitHub Bot
Created on: 20/May/21 13:00
Start Date: 20/May/21 13:00
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #2205:
URL: https://github.com/apache/hive/pull/2205#issuecomment-845096727


   Tests are green, can we get this in @jcamachor?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599787)
Time Spent: 50m  (was: 40m)

> Log CBO plans right after major transformations
> ---
>
> Key: HIVE-25046
> URL: https://issues.apache.org/jira/browse/HIVE-25046
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently the results of various CBO transformations are logged (in DEBUG 
> mode) at the end of the optimization 
> [phase|https://github.com/apache/hive/blob/9f5bd72e908244b2fe915e8dc39f55afa94bbffa/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L2106]
>  and only if we are not in test mode. This has some disadvantages:
> * If there is a failure (exception) in some intermediate step, we will miss 
> all the intermediate plans, possibly losing track of what plan led to the 
> problem.
> * Intermediate logs are very useful for identifying plan problems while 
> working on a patch; unfortunately, the logs are explicitly disabled in test 
> mode, which means that for them to appear the respective code needs to change 
> every time we need to see those logs.
> * Logging at the end necessitates keeping additional local variables that 
> make the code harder to read.
> The goal of this issue is to place DEBUG logging right after major 
> transformations, independently of whether we are running in test mode, to 
> alleviate the shortcomings mentioned above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599750
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 20/May/21 11:21
Start Date: 20/May/21 11:21
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r636009820



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -361,7 +362,7 @@ private void collectCommitInformation(TezWork work) throws 
IOException, TezExcep
   .filter(name -> 
name.endsWith("HiveIcebergNoJobCommitter")).isPresent();
   // we should only consider jobs with Iceberg output committer and a data 
sink
   if (hasIcebergCommitter && !vertex.getDataSinks().isEmpty()) {
-String tableLocationRoot = jobConf.get("location");
+String tableLocationRoot = jobConf.get(ICEBERG_MR_TABLE_LOCATION);

Review comment:
   My mistake. It was the IntelliJ-generated constant, and I didn't realize 
it had MR in it :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599750)
Time Spent: 5h 10m  (was: 5h)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599745
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 20/May/21 11:20
Start Date: 20/May/21 11:20
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r636009266



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithMultipleCatalogs.java
##
@@ -128,13 +130,55 @@ public void testJoinTablesFromDifferentCatalogs() throws 
IOException {
 
HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
 rows), 0);
   }
 
+  @Test
+  public void testCTASFromOtherCatalog() throws IOException {
+testTables2.createTable(shell, "source", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+fileFormat2, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+shell.executeStatement(String.format(
+"CREATE TABLE target STORED BY '%s' TBLPROPERTIES ('%s'='%s') AS 
SELECT * FROM source",
+HiveIcebergStorageHandler.class.getName(),
+InputFormatConfig.CATALOG_NAME, HIVECATALOGNAME));
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM target");
+Assert.assertEquals(3, objects.size());
+
+Table target = testTables1.loadTable(TableIdentifier.of("default", 
"target"));
+HiveIcebergTestUtils.validateData(target, 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, 0);
+  }
+
+  @Test
+  public void testCTASFromOtherCatalogFailureRollback() throws IOException {
+// force an execution error by passing in a committer class that Tez won't 
be able to load
+shell.setHiveSessionValue("hive.tez.mapreduce.output.committer.class", 
"org.apache.NotExistingClass");
+
+TableIdentifier target = TableIdentifier.of("default", "target");
+testTables2.createTable(shell, "source", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+fileFormat2, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+try {
+  shell.executeStatement(String.format(
+  "CREATE TABLE target STORED BY '%s' TBLPROPERTIES ('%s'='%s') AS 
SELECT * FROM source",
+  HiveIcebergStorageHandler.class.getName(),
+  InputFormatConfig.CATALOG_NAME, HIVECATALOGNAME));
+} catch (Exception e) {
+  // expected error
+}
+
+// CTAS table should have been dropped by the lifecycle hook
+Assert.assertThrows(NoSuchTableException.class, () -> 
testTables1.loadTable(target));
+  }
+
   private void createAndAddRecords(TestTables testTables, FileFormat 
fileFormat, TableIdentifier identifier,

Review comment:
   Agreed. We've had a conversation about this testTables cleanup with 
@lcspinter too. I'll file a ticket for that work.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599745)
Time Spent: 5h  (was: 4h 50m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599741
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 20/May/21 11:15
Start Date: 20/May/21 11:15
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2243:
URL: https://github.com/apache/hive/pull/2243#issuecomment-844993309


   LGTM - a few leftover questions, but nothing crucial


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599741)
Time Spent: 4h 50m  (was: 4h 40m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599740
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 20/May/21 11:13
Start Date: 20/May/21 11:13
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r636004598



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -361,7 +362,7 @@ private void collectCommitInformation(TezWork work) throws 
IOException, TezExcep
   .filter(name -> 
name.endsWith("HiveIcebergNoJobCommitter")).isPresent();
   // we should only consider jobs with Iceberg output committer and a data 
sink
   if (hasIcebergCommitter && !vertex.getDataSinks().isEmpty()) {
-String tableLocationRoot = jobConf.get("location");
+String tableLocationRoot = jobConf.get(ICEBERG_MR_TABLE_LOCATION);

Review comment:
   Why is it `mr`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599740)
Time Spent: 4h 40m  (was: 4.5h)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599739
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 20/May/21 11:12
Start Date: 20/May/21 11:12
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r636004243



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithMultipleCatalogs.java
##
@@ -128,13 +130,55 @@ public void testJoinTablesFromDifferentCatalogs() throws 
IOException {
 
HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
 rows), 0);
   }
 
+  @Test
+  public void testCTASFromOtherCatalog() throws IOException {
+testTables2.createTable(shell, "source", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+fileFormat2, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+shell.executeStatement(String.format(
+"CREATE TABLE target STORED BY '%s' TBLPROPERTIES ('%s'='%s') AS 
SELECT * FROM source",
+HiveIcebergStorageHandler.class.getName(),
+InputFormatConfig.CATALOG_NAME, HIVECATALOGNAME));
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM target");
+Assert.assertEquals(3, objects.size());
+
+Table target = testTables1.loadTable(TableIdentifier.of("default", 
"target"));
+HiveIcebergTestUtils.validateData(target, 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, 0);
+  }
+
+  @Test
+  public void testCTASFromOtherCatalogFailureRollback() throws IOException {
+// force an execution error by passing in a committer class that Tez won't 
be able to load
+shell.setHiveSessionValue("hive.tez.mapreduce.output.committer.class", 
"org.apache.NotExistingClass");
+
+TableIdentifier target = TableIdentifier.of("default", "target");
+testTables2.createTable(shell, "source", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+fileFormat2, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+try {
+  shell.executeStatement(String.format(
+  "CREATE TABLE target STORED BY '%s' TBLPROPERTIES ('%s'='%s') AS 
SELECT * FROM source",
+  HiveIcebergStorageHandler.class.getName(),
+  InputFormatConfig.CATALOG_NAME, HIVECATALOGNAME));
+} catch (Exception e) {
+  // expected error
+}
+
+// CTAS table should have been dropped by the lifecycle hook
+Assert.assertThrows(NoSuchTableException.class, () -> 
testTables1.loadTable(target));
+  }
+
   private void createAndAddRecords(TestTables testTables, FileFormat 
fileFormat, TableIdentifier identifier,

Review comment:
   We might want to check if we can move this method to TestTables. We have 
several variations of `createTable` already there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599739)
Time Spent: 4.5h  (was: 4h 20m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25109) CBO fails when updating table has constraints defined

2021-05-20 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-25109.
---
Resolution: Fixed

Pushed to master. Thanks [~kgyrtkirk] for review.

> CBO fails when updating table has constraints defined
> -
>
> Key: HIVE-25109
> URL: https://issues.apache.org/jira/browse/HIVE-25109
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> create table acid_uami_n0(i int,
>  de decimal(5,2) constraint nn1 not null enforced,
>  vc varchar(128) constraint ch2 CHECK (de >= cast(i as 
> decimal(5,2))) enforced)
>  clustered by (i) into 2 buckets stored as orc TBLPROPERTIES 
> ('transactional'='true');
> -- update
> explain cbo
> update acid_uami_n0 set de = 893.14 where de = 103.00;
> {code}
> hive.log
> {code}
> 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] 
> parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result 
> Schema didn't match Optimized Op Tree Schema
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeUpdate(UpdateDeleteSemanticAnalyzer.java:63)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyze(UpdateDeleteSemanticAnalyzer.java:53)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.had

[jira] [Work logged] (HIVE-25109) CBO fails when updating table has constraints defined

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25109?focusedWorklogId=599729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599729
 ]

ASF GitHub Bot logged work on HIVE-25109:
-

Author: ASF GitHub Bot
Created on: 20/May/21 10:44
Start Date: 20/May/21 10:44
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #2268:
URL: https://github.com/apache/hive/pull/2268


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599729)
Time Spent: 50m  (was: 40m)

> CBO fails when updating table has constraints defined
> -
>
> Key: HIVE-25109
> URL: https://issues.apache.org/jira/browse/HIVE-25109
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> create table acid_uami_n0(i int,
>  de decimal(5,2) constraint nn1 not null enforced,
>  vc varchar(128) constraint ch2 CHECK (de >= cast(i as 
> decimal(5,2))) enforced)
>  clustered by (i) into 2 buckets stored as orc TBLPROPERTIES 
> ('transactional'='true');
> -- update
> explain cbo
> update acid_uami_n0 set de = 893.14 where de = 103.00;
> {code}
> hive.log
> {code}
> 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] 
> parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result 
> Schema didn't match Optimized Op Tree Schema
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeUpdate(UpdateDeleteSemanticAnalyzer.java:63)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyze(UpdateDeleteSemanticAnalyzer.java:53)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(C

[jira] [Work logged] (HIVE-24936) Fix file name parsing and copy file move.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24936?focusedWorklogId=599722&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599722
 ]

ASF GitHub Bot logged work on HIVE-24936:
-

Author: ASF GitHub Bot
Created on: 20/May/21 10:31
Start Date: 20/May/21 10:31
Worklog Time Spent: 10m 
  Work Description: harishjp commented on a change in pull request #2120:
URL: https://github.com/apache/hive/pull/2120#discussion_r635977406



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##
@@ -1093,40 +1093,69 @@ public static void rename(FileSystem fs, Path src, Path 
dst) throws IOException,
 }
   }
 
-  private static void moveFile(FileSystem fs, FileStatus file, Path dst) 
throws IOException,
+  private static void moveFileOrDir(FileSystem fs, FileStatus file, Path dst) 
throws IOException,

Review comment:
   `renameOrMoveFiles` calls this, and that is the public API. Tests for 
that should cover the cases here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599722)
Time Spent: 1h  (was: 50m)

> Fix file name parsing and copy file move.
> -
>
> Key: HIVE-24936
> URL: https://issues.apache.org/jira/browse/HIVE-24936
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The taskId and taskAttemptId are not extracted correctly for copy files 
> (1_02_copy_3), and when moving an incompatible copy file the 
> rename utility generates wrong file names. Ex: 1_02_copy_3 is renamed to 
> 1_02_copy_3_1 if 1_02_copy_3 already exists; ideally it should be 
> 1_02_copy_N.
>  
> Incompatible files should always be renamed using the current task, or they can 
> get deleted if the file name conflicts with another task's output file. Ex: if 
> the input file name for a task is 5_01 and it is incompatible, then if we 
> move this file it will be treated as an output file for task id 5, attempt 1, 
> which, if that task exists, will try to generate the same file, fail, and 
> another attempt will be made. There will then be 2 files, 5_01 and 5_02; the 
> deduping code will remove 5_01, resulting in data loss. There are other 
> scenarios where the same can happen.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24936) Fix file name parsing and copy file move.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24936?focusedWorklogId=599721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599721
 ]

ASF GitHub Bot logged work on HIVE-24936:
-

Author: ASF GitHub Bot
Created on: 20/May/21 10:28
Start Date: 20/May/21 10:28
Worklog Time Spent: 10m 
  Work Description: harishjp commented on a change in pull request #2120:
URL: https://github.com/apache/hive/pull/2120#discussion_r635975315



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ParsedOutputFileName.java
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+
+/**
+ * Helper class to match hive filenames and extract taskId, taskAttemptId, 
copyIndex.
+ *
+ * Matches the following:
+ * 1_02
+ * 1_02.gz
+ * 1_02.zlib.gz
+ * 1_02_copy_1
+ * 1_02_copy_1.gz
+ * 
+ * All the components are here:
+ * tmp_(taskPrefix)1_02_copy_1.zlib.gz
+ */
+public class ParsedOutputFileName {
+  private static final Pattern COPY_FILE_NAME_TO_TASK_ID_REGEX = 
Pattern.compile(
+  "^(.*?)?" + // any prefix
+  "(\\(.*\\))?" + // taskId prefix
+  "([0-9]+)" + // taskId
+  "(?:_([0-9]{1,6}))?" + // _ (limited to 6 digits)

Review comment:
   No idea; this was refactored from the existing pattern, which assumed 6 
digits, so I kept it the same.
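
   For reference, a standalone sketch exercising only the pattern fragments visible in this excerpt (the full pattern in the PR continues beyond them, so this is illustrative only):

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameRegexSketch {
  // Only the fragments quoted above; the real pattern has further groups.
  private static final Pattern P = Pattern.compile(
      "^(.*?)?" +             // any prefix
      "(\\(.*\\))?" +         // taskId prefix
      "([0-9]+)" +            // taskId
      "(?:_([0-9]{1,6}))?");  // attempt id, limited to 6 digits

  public static void main(String[] args) {
    Matcher m = P.matcher("000001_02");
    if (m.matches()) {
      // prints: taskId=000001 attemptId=02
      System.out.println("taskId=" + m.group(3) + " attemptId=" + m.group(4));
    }
  }
}
{code}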




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599721)
Time Spent: 50m  (was: 40m)

> Fix file name parsing and copy file move.
> -
>
> Key: HIVE-24936
> URL: https://issues.apache.org/jira/browse/HIVE-24936
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The taskId and taskAttemptId are not extracted correctly for copy files 
> (1_02_copy_3), and when moving an incompatible copy file the 
> rename utility generates wrong file names. Ex: 1_02_copy_3 is renamed to 
> 1_02_copy_3_1 if 1_02_copy_3 already exists; ideally it should be 
> 1_02_copy_N.
>  
> Incompatible files should always be renamed using the current task, or they can 
> get deleted if the file name conflicts with another task's output file. Ex: if 
> the input file name for a task is 5_01 and it is incompatible, then if we 
> move this file it will be treated as an output file for task id 5, attempt 1, 
> which, if that task exists, will try to generate the same file, fail, and 
> another attempt will be made. There will then be 2 files, 5_01 and 5_02; the 
> deduping code will remove 5_01, resulting in data loss. There are other 
> scenarios where the same can happen.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24936) Fix file name parsing and copy file move.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24936?focusedWorklogId=599720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599720
 ]

ASF GitHub Bot logged work on HIVE-24936:
-

Author: ASF GitHub Bot
Created on: 20/May/21 10:27
Start Date: 20/May/21 10:27
Worklog Time Spent: 10m 
  Work Description: harishjp commented on a change in pull request #2120:
URL: https://github.com/apache/hive/pull/2120#discussion_r635974959



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java
##
@@ -275,39 +275,33 @@ public void closeOp(boolean abort) throws HiveException {
   throw new HiveException("Incompatible files should not happen in MM 
tables.");
 }
 Path destDir = finalPath.getParent();
-Path destPath = destDir;
 // move any incompatible files to final path
 if (incompatFileSet != null && !incompatFileSet.isEmpty()) {
   for (Path incompatFile : incompatFileSet) {
-// check if path conforms to Hive's file name convention. Hive 
expects filenames to be in specific format
+// Hive expects filenames to be in specific format
 // like 00_0, but "LOAD DATA" commands can let you add any 
files to any partitions/tables without
-// renaming. This can cause MoveTask to remove files in some cases 
where MoveTask assumes the files are
-// are generated by speculatively executed tasks.
+// renaming.
+// This can cause a few issues:
+// MoveTask will remove files in some cases where MoveTask assumes 
the files are are generated by
+// speculatively executed tasks.
 // Example: MoveTask thinks the following files are same
 // part-m-0_1417075294718
 // part-m-1_1417075294718
 // Assumes 1417075294718 as taskId and retains only large file 
supposedly generated by speculative execution.
-// This can result in data loss in case of CONCATENATE/merging. 
Filter out files that does not match Hive's
-// filename convention.
-if (!Utilities.isHiveManagedFile(incompatFile)) {
-  // rename un-managed files to conform to Hive's naming standard
-  // Example:
-  // /warehouse/table/part-m-0_1417075294718 will get renamed 
to /warehouse/table/.hive-staging/00_0
-  // If staging directory already contains the file, taskId_copy_N 
naming will be used.
-  final String taskId = Utilities.getTaskId(jc);
-  Path destFilePath = new Path(destDir, new Path(taskId));
-  for (int counter = 1; fs.exists(destFilePath); counter++) {
-destFilePath = new Path(destDir, taskId + 
(Utilities.COPY_KEYWORD + counter));
-  }
-  LOG.warn("Path doesn't conform to Hive's expectation. Renaming 
{} to {}", incompatFile, destFilePath);
-  destPath = destFilePath;
-}
+// This can result in data loss in case of CONCATENATE/merging.
 
+// If filename is consistent with XX_N and another task with 
same task-id runs after this move, then
+// the same file name is used in the other task which will result 
in task failure and retry of task and
+// subsequent removal of this file as duplicate.
+// Example: if the file name is 01_0 and another task runs 
with taskid 01_0, it will fail to create
+// the file and next attempt will create 01_1, both the files 
will be considered as output of same task
+// and only 01_1 will be picked resulting it loss of existing 
file 01_0.
+final String destFileName = Utilities.getTaskId(jc) + 
Utilities.COPY_KEYWORD + 1;

Review comment:
   moveFile does not just move; if the file already exists, it keeps 
incrementing the copy index until it finds one that does not exist. So no data 
loss.
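
   That behavior, probing copy indexes until a free name is found, is essentially the following loop; this is a sketch reusing names from the diff above (fs, destDir, taskId, incompatFile, Utilities.COPY_KEYWORD), not the actual implementation:

{code}
// Sketch: try taskId_copy_1, taskId_copy_2, ... until the name is free,
// then move; an existing file is never overwritten, hence no data loss.
Path dest = new Path(destDir, taskId + Utilities.COPY_KEYWORD + 1);
for (int counter = 2; fs.exists(dest); counter++) {
  dest = new Path(destDir, taskId + Utilities.COPY_KEYWORD + counter);
}
fs.rename(incompatFile, dest);
{code}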




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599720)
Time Spent: 40m  (was: 0.5h)

> Fix file name parsing and copy file move.
> -
>
> Key: HIVE-24936
> URL: https://issues.apache.org/jira/browse/HIVE-24936
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The taskId and

[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=599704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599704
 ]

ASF GitHub Bot logged work on HIVE-24761:
-

Author: ASF GitHub Bot
Created on: 20/May/21 09:55
Start Date: 20/May/21 09:55
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r635952821



##
File path: 
ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt
##
public class <ClassName> extends VectorExpression {
 
   private static final long serialVersionUID = 1L;
 
-  private final int colNum1;
   private final int colNum2;

Review comment:
   okay, I'm trying to proceed with 2) per your advice, because an array 
is nicer than having separate fields, not to mention it helps when adding 
convenience methods for processing input columns (a simple loop vs. mentioning 
all the fields)
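
   A minimal sketch of option 2) with hypothetical names: the per-column fields collapse into one array, and per-column work becomes a loop:

{code}
// Sketch only: a single array replaces colNum1, colNum2, ... and covers
// any arity; convenience methods then loop instead of repeating fields.
public class VectorExpressionSketch {
  private final int[] inputColumnNums;

  VectorExpressionSketch(int... inputColumnNums) {
    this.inputColumnNums = inputColumnNums;
  }

  public String inputColumnsToString() {
    StringBuilder sb = new StringBuilder();
    for (int colNum : inputColumnNums) {
      sb.append(colNum).append(' ');
    }
    return sb.toString().trim();
  }
}
{code}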




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599704)
Time Spent: 3h 20m  (was: 3h 10m)

> Vectorization: Support PTF - bounded start windows
> --
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the compile-time check:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results; that's because the vectorized codepath completely 
> ignores boundaries and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25139) Filter out null table properties in HiveIcebergMetaHook

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25139?focusedWorklogId=599693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599693
 ]

ASF GitHub Bot logged work on HIVE-25139:
-

Author: ASF GitHub Bot
Created on: 20/May/21 09:31
Start Date: 20/May/21 09:31
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2298:
URL: https://github.com/apache/hive/pull/2298#discussion_r635934956



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -297,6 +297,9 @@ private void 
updateHmsTableProperties(org.apache.hadoop.hive.metastore.api.Table
   "Table location not set");
 }
 
+// Remove null values from hms table properties
+hmsTable.getParameters().entrySet().removeIf(e -> e.getValue() == null);

Review comment:
   Do we need to remove null keys here too, just like in 
`getCatalogProperties`?
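
   If null keys need filtering too, the same removeIf can cover both in one pass; a sketch, not the actual patch:

{code}
// Sketch: drop entries with a null key or a null value together.
hmsTable.getParameters().entrySet()
    .removeIf(e -> e.getKey() == null || e.getValue() == null);
{code}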




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599693)
Time Spent: 0.5h  (was: 20m)

> Filter out null table properties in HiveIcebergMetaHook
> ---
>
> Key: HIVE-25139
> URL: https://issues.apache.org/jira/browse/HIVE-25139
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25139) Filter out null table properties in HiveIcebergMetaHook

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25139?focusedWorklogId=599689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599689
 ]

ASF GitHub Bot logged work on HIVE-25139:
-

Author: ASF GitHub Bot
Created on: 20/May/21 09:17
Start Date: 20/May/21 09:17
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #2298:
URL: https://github.com/apache/hive/pull/2298#issuecomment-844899632


   @marton-bod @pvary @szlta Could you please review this PR? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599689)
Time Spent: 20m  (was: 10m)

> Filter out null table properties in HiveIcebergMetaHook
> ---
>
> Key: HIVE-25139
> URL: https://issues.apache.org/jira/browse/HIVE-25139
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25037) Create metric: Number of tables with > x aborts

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25037?focusedWorklogId=599682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599682
 ]

ASF GitHub Bot logged work on HIVE-25037:
-

Author: ASF GitHub Bot
Created on: 20/May/21 08:55
Start Date: 20/May/21 08:55
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #2199:
URL: https://github.com/apache/hive/pull/2199


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599682)
Time Spent: 1h 10m  (was: 1h)

> Create metric: Number of tables with > x aborts
> ---
>
> Key: HIVE-25037
> URL: https://issues.apache.org/jira/browse/HIVE-25037
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Create a metric for the number of tables with > x aborts.
> x should be settable and default to 1500.
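The merged implementation is not quoted in this thread; the following is only a hedged sketch of how such a gauge could be computed. The table and column names (TXN_COMPONENTS, TXNS, TXN_STATE = 'a' for aborted) follow my reading of the metastore ACID schema and should be treated as assumptions.

{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class AbortedTxnTableGauge {
  // Assumed schema: TXN_COMPONENTS rows joined to aborted TXNS carry the
  // database/table the txn touched. The threshold "x" is configurable.
  private static final String QUERY =
      "SELECT COUNT(*) FROM (" +
      "  SELECT TC_DATABASE, TC_TABLE FROM TXN_COMPONENTS" +
      "  JOIN TXNS ON TC_TXNID = TXN_ID WHERE TXN_STATE = 'a'" +
      "  GROUP BY TC_DATABASE, TC_TABLE HAVING COUNT(*) > ?) t";

  public static int tablesOverAbortThreshold(Connection conn, int threshold)
      throws Exception {
    try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
      ps.setInt(1, threshold); // per the ticket, defaults to 1500
      try (ResultSet rs = ps.executeQuery()) {
        return rs.next() ? rs.getInt(1) : 0;
      }
    }
  }
}
{code}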



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25037) Create metric: Number of tables with > x aborts

2021-05-20 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-25037.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master branch. Thanks for your contribution [~asinkovits]!

> Create metric: Number of tables with > x aborts
> ---
>
> Key: HIVE-25037
> URL: https://issues.apache.org/jira/browse/HIVE-25037
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Create a metric for the number of tables with > x aborts.
> x should be settable and default to 1500.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24936) Fix file name parsing and copy file move.

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24936?focusedWorklogId=599680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599680
 ]

ASF GitHub Bot logged work on HIVE-24936:
-

Author: ASF GitHub Bot
Created on: 20/May/21 08:47
Start Date: 20/May/21 08:47
Worklog Time Spent: 10m 
  Work Description: kishendas commented on a change in pull request #2120:
URL: https://github.com/apache/hive/pull/2120#discussion_r635874792



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java
##
@@ -275,39 +275,33 @@ public void closeOp(boolean abort) throws HiveException {
   throw new HiveException("Incompatible files should not happen in MM 
tables.");
 }
 Path destDir = finalPath.getParent();
-Path destPath = destDir;
 // move any incompatible files to final path
 if (incompatFileSet != null && !incompatFileSet.isEmpty()) {
   for (Path incompatFile : incompatFileSet) {
-// check if path conforms to Hive's file name convention. Hive 
expects filenames to be in specific format
+// Hive expects filenames to be in specific format
 // like 00_0, but "LOAD DATA" commands can let you add any 
files to any partitions/tables without
-// renaming. This can cause MoveTask to remove files in some cases 
where MoveTask assumes the files are
-// are generated by speculatively executed tasks.
+// renaming.
+// This can cause a few issues:
+// MoveTask will remove files in some cases where MoveTask assumes 
the files are are generated by

Review comment:
   files are generated by

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java
##
@@ -275,39 +275,33 @@ public void closeOp(boolean abort) throws HiveException {
   throw new HiveException("Incompatible files should not happen in MM 
tables.");
 }
 Path destDir = finalPath.getParent();
-Path destPath = destDir;
 // move any incompatible files to final path
 if (incompatFileSet != null && !incompatFileSet.isEmpty()) {
   for (Path incompatFile : incompatFileSet) {
-// check if path conforms to Hive's file name convention. Hive 
expects filenames to be in specific format
+// Hive expects filenames to be in specific format
 // like 00_0, but "LOAD DATA" commands can let you add any 
files to any partitions/tables without
-// renaming. This can cause MoveTask to remove files in some cases 
where MoveTask assumes the files are
-// are generated by speculatively executed tasks.
+// renaming.
+// This can cause a few issues:
+// MoveTask will remove files in some cases where MoveTask assumes 
the files are are generated by
+// speculatively executed tasks.
 // Example: MoveTask thinks the following files are same
 // part-m-0_1417075294718
 // part-m-1_1417075294718
 // Assumes 1417075294718 as taskId and retains only large file 
supposedly generated by speculative execution.
-// This can result in data loss in case of CONCATENATE/merging. 
Filter out files that does not match Hive's
-// filename convention.
-if (!Utilities.isHiveManagedFile(incompatFile)) {
-  // rename un-managed files to conform to Hive's naming standard
-  // Example:
-  // /warehouse/table/part-m-0_1417075294718 will get renamed 
to /warehouse/table/.hive-staging/00_0
-  // If staging directory already contains the file, taskId_copy_N 
naming will be used.
-  final String taskId = Utilities.getTaskId(jc);
-  Path destFilePath = new Path(destDir, new Path(taskId));
-  for (int counter = 1; fs.exists(destFilePath); counter++) {
-destFilePath = new Path(destDir, taskId + 
(Utilities.COPY_KEYWORD + counter));
-  }
-  LOG.warn("Path doesn't conform to Hive's expectation. Renaming 
{} to {}", incompatFile, destFilePath);
-  destPath = destFilePath;
-}
+// This can result in data loss in case of CONCATENATE/merging.
 
+// If filename is consistent with XX_N and another task with 
same task-id runs after this move, then
+// the same file name is used in the other task which will result 
in task failure and retry of task and
+// subsequent removal of this file as duplicate.
+// Example: if the file name is 01_0 and another task runs 
with taskid 01_0, it will fail to create
+// the file and next attempt will create 01_1, both the files 
will be cons

[jira] [Work logged] (HIVE-25142) Rehashing in map join fast hash table causing corruption for large keys

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25142?focusedWorklogId=599679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599679
 ]

ASF GitHub Bot logged work on HIVE-25142:
-

Author: ASF GitHub Bot
Created on: 20/May/21 08:47
Start Date: 20/May/21 08:47
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2300:
URL: https://github.com/apache/hive/pull/2300#discussion_r635901296



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastBytesHashKeyRef.java
##
@@ -69,12 +69,14 @@ public static int calculateHashCode(long refWord, 
WriteBuffers writeBuffers,
 
   // And, if current value is big we must read it.
   actualKeyLength = writeBuffers.readVInt(readPos);
-  keyAbsoluteOffset = absoluteOffset + actualKeyLength;
+
+  // Now the read position is set to start of the key as readVInt moved the
+  // position by size of key length.
+  return writeBuffers.hashCode(actualKeyLength, readPos);

Review comment:
   I guess the fact that we have to perform a read to get the actual KeyLen 
is making things more complex here. 
   Shall we add some comments at the method level for future reference?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599679)
Time Spent: 0.5h  (was: 20m)

> Rehashing in map join fast hash table  causing corruption for large keys
> 
>
> Key: HIVE-25142
> URL: https://issues.apache.org/jira/browse/HIVE-25142
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In map join the hash table is created using the keys. To support rehashing, 
> the keys are stored in a write buffer. The hash table contains the offset of 
> the keys along with the hash code. When rehashing is done, the offset is 
> extracted from the hash table and then the hash code is generated again. For 
> large keys, of size greater than 255, the key length is also stored along with 
> the key. In the fast hash table implementation the key is not extracted 
> properly: a code bug causes the wrong key to be extracted and hence a wrong 
> hash code to be generated, which corrupts the hash table.
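To make the corruption mechanism concrete, a standalone sketch (layout simplified, not the actual VectorMapJoinFast format): when long keys carry an explicit length header, rehashing from the stored record offset without skipping that header hashes the wrong bytes, so the entry lands in the wrong slot.

{code}
import java.util.Arrays;

public class LargeKeyOffsetDemo {
  public static void main(String[] args) {
    // Simplified layout for keys longer than 255 bytes:
    // [2-byte length][key bytes...]
    byte[] key = new byte[300];
    Arrays.fill(key, (byte) 7);

    byte[] record = new byte[2 + key.length];
    record[0] = (byte) (key.length >> 8);
    record[1] = (byte) (key.length & 0xff);
    System.arraycopy(key, 0, record, 2, key.length);

    // Correct: hash the key bytes that start after the length header.
    int good = Arrays.hashCode(Arrays.copyOfRange(record, 2, 2 + key.length));
    // Buggy: hash from the record start -- the header bytes are folded in
    // and the tail of the key is dropped, so rehashing relocates the entry.
    int bad = Arrays.hashCode(Arrays.copyOfRange(record, 0, key.length));

    System.out.println("correct=" + good + " buggy=" + bad);
  }
}
{code}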



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25142) Rehashing in map join fast hash table causing corruption for large keys

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25142?focusedWorklogId=599676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599676
 ]

ASF GitHub Bot logged work on HIVE-25142:
-

Author: ASF GitHub Bot
Created on: 20/May/21 08:35
Start Date: 20/May/21 08:35
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2300:
URL: https://github.com/apache/hive/pull/2300#discussion_r635892282



##
File path: serde/src/java/org/apache/hadoop/hive/serde2/WriteBuffers.java
##
@@ -149,7 +149,12 @@ public int unsafeHashCode(long offset, int length) {
   }
 
   public int hashCode(long offset, int length, Position readPos) {
+// If caller has not set the read position, then set it.
 setReadPoint(offset, readPos);
+return hashCode(length, readPos);

Review comment:
   Shall we move the positioning code to the unsafeHashCode method and 
remove this one? It looks like that's the only place it's needed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599676)
Time Spent: 20m  (was: 10m)

> Rehashing in map join fast hash table  causing corruption for large keys
> 
>
> Key: HIVE-25142
> URL: https://issues.apache.org/jira/browse/HIVE-25142
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In map join the hash table is created using the keys. To support rehashing, 
> the keys are stored in a write buffer. The hash table contains the offset of 
> the keys along with the hash code. When rehashing is done, the offset is 
> extracted from the hash table and then the hash code is generated again. For 
> large keys, of size greater than 255, the key length is also stored along with 
> the key. In the fast hash table implementation the key is not extracted 
> properly: a code bug causes the wrong key to be extracted and hence a wrong 
> hash code to be generated, which corrupts the hash table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24911) Metastore: Create index on SDS.CD_ID for Postgres

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24911?focusedWorklogId=599672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599672
 ]

ASF GitHub Bot logged work on HIVE-24911:
-

Author: ASF GitHub Bot
Created on: 20/May/21 08:17
Start Date: 20/May/21 08:17
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on pull request #2090:
URL: https://github.com/apache/hive/pull/2090#issuecomment-844848865


   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599672)
Time Spent: 20m  (was: 10m)

> Metastore: Create index on SDS.CD_ID for Postgres
> -
>
> Key: HIVE-24911
> URL: https://issues.apache.org/jira/browse/HIVE-24911
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: command-output.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While investigating HIVE-24870, we found that during a long incremental 
> replication, an index on SDS.CD_ID can improve the performance.
> It was tested on Postgres like below:
> {code}
> CREATE INDEX IF NOT EXISTS "SDS_N50" ON "SDS" USING btree ("CD_ID");
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> DROP INDEX IF EXISTS "SDS_N50";
> EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where 
> "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
> {code}
> Further results can be found in:  [^command-output.txt] 
> After some investigation, I found that this index has also been part of the 
> schemas for a very long time:
> oracle: HIVE-2928
> mysql: HIVE-2246
> mssql: HIVE-6862 (or earlier)
> ...except Postgres.
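For completeness, a hedged sketch of applying that index from JDBC (the CREATE INDEX statement mirrors the test above; connection details are placeholders, and whether the fix lands in the init script or an upgrade script is outside this excerpt):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateSdsIndex {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details; requires the Postgres JDBC driver.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/metastore", "hive", "hive");
         Statement st = conn.createStatement()) {
      // Same statement the ticket used for testing.
      st.execute("CREATE INDEX IF NOT EXISTS \"SDS_N50\" ON \"SDS\" USING btree (\"CD_ID\")");
    }
  }
}
{code}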



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=599660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599660
 ]

ASF GitHub Bot logged work on HIVE-25130:
-

Author: ASF GitHub Bot
Created on: 20/May/21 08:00
Start Date: 20/May/21 08:00
Worklog Time Spent: 10m 
  Work Description: kishendas commented on a change in pull request #2285:
URL: https://github.com/apache/hive/pull/2285#discussion_r635866315



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##
@@ -1252,32 +1253,45 @@ public static String getTaskIdFromFilename(String 
filename) {
* @param filename
*  filename to extract taskid from
*/
-  private static String getPrefixedTaskIdFromFilename(String filename) {
+  static String getPrefixedTaskIdFromFilename(String filename) {
 return getTaskIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX);
   }
 
   private static String getTaskIdFromFilename(String filename, Pattern 
pattern) {
-return getIdFromFilename(filename, pattern, 1);
+return getIdFromFilename(filename, pattern, 1, false);
   }
 
-  private static int getAttemptIdFromFilename(String filename) {
-String attemptStr = getIdFromFilename(filename, 
FILE_NAME_PREFIXED_TASK_ID_REGEX, 3);
+  static int getAttemptIdFromFilename(String filename) {
+String attemptStr = getIdFromFilename(filename, 
FILE_NAME_PREFIXED_TASK_ID_REGEX, 3, true);
 return Integer.parseInt(attemptStr.substring(1));
   }
 
-  private static String getIdFromFilename(String filename, Pattern pattern, 
int group) {
+  private static String getIdFromFilename(String filename, Pattern pattern, 
int group, boolean extractAttemptId) {
 String taskId = filename;
 int dirEnd = filename.lastIndexOf(Path.SEPARATOR);
-if (dirEnd != -1) {
+if (dirEnd!=-1) {
   taskId = filename.substring(dirEnd + 1);
 }
 
-Matcher m = pattern.matcher(taskId);
-if (!m.matches()) {
-  LOG.warn("Unable to get task id from file name: {}. Using last component 
{}"
-  + " as task id.", filename, taskId);
+// Spark emitted files have the format 
part-[number-string]-uuid..
+// Examples: 
part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc, 00026-23003837 
is the taskId
+// and part-4-c6acfdee-0c32-492e-b209-c2f1cf40.c000, 
4-c6acfdee is the taskId
+String strings[] = taskId.split("-");

Review comment:
   @harishjp Please review now. 
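A standalone sketch of the parsing problem discussed in the hunk above (the regex and grouping here are my reading of the intent, not the merged code): Spark writes part files as part-<number>-<uuid>[.suffix], so the task id has a different shape than Hive's 00026_0 convention.

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SparkFileNameDemo {
  // Assumed shape: part-<digits>-<uuid-like segments>[.extensions...]
  private static final Pattern SPARK_PART =
      Pattern.compile("part-(\\d+)-([0-9a-fA-F-]+)(\\..*)?");

  public static void main(String[] args) {
    String name = "part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc";
    Matcher m = SPARK_PART.matcher(name);
    if (m.matches()) {
      // Per the diff's comment, "00026-23003837" is treated as the task id:
      // the part number plus the first segment of the uuid.
      System.out.println(m.group(1) + "-" + m.group(2).split("-")[0]);
    }
  }
}
{code}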




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599660)
Time Spent: 50m  (was: 40m)

> alter table concat gives NullPointerException, when data is inserted from 
> Spark
> ---
>
> Key: HIVE-25130
> URL: https://issues.apache.org/jira/browse/HIVE-25130
> Project: Hive
>  Issue Type: Bug
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is the complete stack trace of the NullPointerException
> 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: 
> [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 
> 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
> at 
> org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129)
> at 
> org.apache.hadoop.hive.ql.ddl.table.storage.concaten

[jira] [Work logged] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25079?focusedWorklogId=599652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599652
 ]

ASF GitHub Bot logged work on HIVE-25079:
-

Author: ASF GitHub Bot
Created on: 20/May/21 07:37
Start Date: 20/May/21 07:37
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2281:
URL: https://github.com/apache/hive/pull/2281#discussion_r635843838



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSMetricsListener.java
##
@@ -86,4 +92,24 @@ public void onAddPartition(AddPartitionEvent partitionEvent) 
throws MetaExceptio
 
Metrics.getOrCreateGauge(MetricsConstants.TOTAL_PARTITIONS).incrementAndGet();
 createdParts.inc();
   }
+
+  @Override
+  public void onAllocWriteId(AllocWriteIdEvent allocWriteIdEvent, Connection 
dbConn, SQLGenerator sqlGenerator) throws MetaException {
+Table table = getTable(allocWriteIdEvent);

Review comment:
   Yeah, if I understand correctly this was already done, in a tricky way.
   
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L482
   
   The listener is only added if the metrics registry is initialized, which is 
only done when metrics are enabled.
   Do you think we should explicitly add the check here as well?
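In miniature, the trade-off being discussed looks like this (a sketch with simplified, hypothetical names, not HMSHandler itself): gate registration on the metrics flag, or check the flag inside every callback.

{code}
import java.util.ArrayList;
import java.util.List;

public class ListenerRegistry {
  interface Listener { void onAllocWriteId(String db, String table); }

  private final List<Listener> listeners = new ArrayList<>();

  // Option A (the "tricky way" above): gate registration, so listeners can
  // assume metrics exist once they run.
  void init(boolean metricsEnabled) {
    if (metricsEnabled) {
      listeners.add((db, table) ->
          System.out.println("counting write to " + db + "." + table));
    }
  }

  // Option B would repeat an explicit metricsEnabled check inside each
  // callback; redundant under A, but more defensive if registration paths
  // multiply.
  void fireAllocWriteId(String db, String table) {
    for (Listener l : listeners) {
      l.onAllocWriteId(db, table);
    }
  }

  public static void main(String[] args) {
    ListenerRegistry r = new ListenerRegistry();
    r.init(true);
    r.fireAllocWriteId("default", "t1");
  }
}
{code}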
   
   
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599652)
Time Spent: 1h  (was: 50m)

> Create new metric about number of writes to tables with manually disabled 
> compaction
> 
>
> Key: HIVE-25079
> URL: https://issues.apache.org/jira/browse/HIVE-25079
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Create a new metric that measures the number of writes to tables that have 
> compaction turned off manually. It does not matter whether the write is 
> committed or aborted (both are bad...)
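A hedged sketch of the check involved: "no_auto_compaction" is the table property conventionally used to disable compaction manually, and the counter wiring here is simplified rather than taken from the PR.

{code}
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class DisabledCompactionWriteCounter {
  private final AtomicLong writesToDisabledCompactionTables = new AtomicLong();

  // Called on write-id allocation; committed vs. aborted is irrelevant here,
  // since either way delta files pile up on a table nobody compacts.
  void onWrite(Map<String, String> tableParameters) {
    if ("true".equalsIgnoreCase(tableParameters.get("no_auto_compaction"))) {
      writesToDisabledCompactionTables.incrementAndGet();
    }
  }

  long value() {
    return writesToDisabledCompactionTables.get();
  }
}
{code}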



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25079?focusedWorklogId=599651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599651
 ]

ASF GitHub Bot logged work on HIVE-25079:
-

Author: ASF GitHub Bot
Created on: 20/May/21 07:36
Start Date: 20/May/21 07:36
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2281:
URL: https://github.com/apache/hive/pull/2281#discussion_r635842924



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java
##
@@ -687,6 +689,26 @@ static boolean equivalent(Map lhs, 
Map rhs) {
 return value.isEmpty()? Collections.emptyMap() : 
Splitter.on(',').withKeyValueSeparator("->").split(value);
   }
 
+  @Test
+  public void textWritesToDisabledCompactionTable() throws Exception {
+MetastoreConf.setVar(conf, 
MetastoreConf.ConfVars.TRANSACTIONAL_EVENT_LISTENERS, 
"org.apache.hadoop.hive.metastore.HMSMetricsListener");

Review comment:
   fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599651)
Time Spent: 50m  (was: 40m)

> Create new metric about number of writes to tables with manually disabled 
> compaction
> 
>
> Key: HIVE-25079
> URL: https://issues.apache.org/jira/browse/HIVE-25079
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Create a new metric that measures the number of writes to tables that have 
> compaction turned off manually. It does not matter whether the write is 
> committed or aborted (both are bad...)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction

2021-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25079?focusedWorklogId=599650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599650
 ]

ASF GitHub Bot logged work on HIVE-25079:
-

Author: ASF GitHub Bot
Created on: 20/May/21 07:36
Start Date: 20/May/21 07:36
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2281:
URL: https://github.com/apache/hive/pull/2281#discussion_r635842723



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java
##
@@ -687,6 +689,26 @@ static boolean equivalent(Map lhs, 
Map rhs) {
 return value.isEmpty()? Collections.emptyMap() : 
Splitter.on(',').withKeyValueSeparator("->").split(value);
   }
 
+  @Test
+  public void textWritesToDisabledCompactionTable() throws Exception {

Review comment:
   nice :D
   fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599650)
Time Spent: 40m  (was: 0.5h)

> Create new metric about number of writes to tables with manually disabled 
> compaction
> 
>
> Key: HIVE-25079
> URL: https://issues.apache.org/jira/browse/HIVE-25079
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Create a new metric that measures the number of writes to tables that have 
> compaction turned off manually. It does not matter whether the write is 
> committed or aborted (both are bad...)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)