[jira] [Work logged] (HIVE-24881) Abort old open replication txns

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24881?focusedWorklogId=568774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568774
 ]

ASF GitHub Bot logged work on HIVE-24881:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 04:22
Start Date: 19/Mar/21 04:22
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2068:
URL: https://github.com/apache/hive/pull/2068#discussion_r597397758



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##
@@ -1031,6 +1031,9 @@ public static ConfVars getMetaConf(String name) {
 REPL_METRICS_MAX_AGE("metastore.repl.metrics.max.age",
   "hive.metastore.repl.metrics.max.age", 7, TimeUnit.DAYS,
   "Maximal age of a replication metrics entry before it is removed."),
+REPL_TXN_TIMEOUT("metastore.repl.txn.timeout", "hive.repl.txn.timeout", 1, 
TimeUnit.DAYS,

Review comment:
   The default value of this should be a very high value.
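
   For illustration, a minimal Java sketch of how a time-based ConfVar like
this is read on the consumer side. MetastoreConf.getTimeVar() is existing API;
the helper method around it is an assumption for this sketch, not code from
the patch:

import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;

public class ReplTxnTimeoutSketch {
  // Hypothetical helper: decides whether an open replication txn has
  // outlived the REPL_TXN_TIMEOUT value added in the diff above.
  static boolean isReplTxnExpired(Configuration conf, long txnStartedMs) {
    long timeoutMs = MetastoreConf.getTimeVar(
        conf, ConfVars.REPL_TXN_TIMEOUT, TimeUnit.MILLISECONDS);
    return System.currentTimeMillis() - txnStartedMs > timeoutMs;
  }
}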




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568774)
Time Spent: 1h 50m  (was: 1h 40m)

> Abort old open replication txns
> ---
>
> Key: HIVE-24881
> URL: https://issues.apache.org/jira/browse/HIVE-24881
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We should auto-abort/remove open replication txns that are older than a time 
> threshold (default: 24h).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568769
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 04:02
Start Date: 19/Mar/21 04:02
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597392445



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/database/desc/DescDatabaseFormatter.java
##
@@ -69,6 +71,12 @@ void showDatabaseDescription(DataOutputStream out, String 
database, String comme
   if (ownerType != null) {
 builder.put("ownerType", ownerType.name());
   }
+  if (null != connectorName) {

Review comment:
   the other style is old code that already existed, whereas I added 2 
clauses with a different style. I will make them consistent.
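
   For reference, a minimal sketch of the consistent style presumably intended
(only connectorName appears in the diff; the second field and its key are
assumptions for illustration):

// Sketch only: aligning the new null checks with the conventional style
// used elsewhere in DescDatabaseFormatter.
if (connectorName != null) {   // instead of: if (null != connectorName)
  builder.put("connectorName", connectorName);
}
if (remoteDbName != null) {    // hypothetical second clause
  builder.put("remoteDbName", remoteDbName);
}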




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568769)
Time Spent: 2.5h  (was: 2h 20m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568768
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 04:00
Start Date: 19/Mar/21 04:00
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597392174



##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/CreateDDLParser.g
##
@@ -108,3 +108,45 @@ createTableStatement
  selectStatementWithCTE?
 )
 ;
+
+createDataConnectorStatement
+@init { gParent.pushMsg("create connector statement", state); }
+@after { gParent.popMsg(state); }
+: KW_CREATE KW_DATACONNECTOR ifNotExists? name=identifier 
dataConnectorType dataConnectorUrl dataConnectorComment? ( KW_WITH 
KW_DCPROPERTIES dcprops=dcProperties)?
+-> ^(TOK_CREATEDATACONNECTOR $name ifNotExists? dataConnectorType 
dataConnectorUrl dataConnectorComment? $dcprops?)
+;
+
+dataConnectorComment
+@init { gParent.pushMsg("dataconnector comment", state); }
+@after { gParent.popMsg(state); }
+: KW_COMMENT comment=StringLiteral
+-> ^(TOK_DATACONNECTORCOMMENT $comment)
+;
+
+dataConnectorUrl
+@init { gParent.pushMsg("dataconnector URL", state); }
+@after { gParent.popMsg(state); }
+: KW_URL url=StringLiteral
+-> ^(TOK_DATACONNECTORURL $url)
+;
+
+dataConnectorType
+@init { gParent.pushMsg("dataconnector type", state); }
+@after { gParent.popMsg(state); }
+: KW_TYPE dcType=StringLiteral
+-> ^(TOK_DATACONNECTORTYPE $dcType)
+;
+
+dcProperties
+@init { gParent.pushMsg("dcproperties", state); }
+@after { gParent.popMsg(state); }
+:
+  LPAREN dbPropertiesList RPAREN -> ^(TOK_DATACONNECTORPROPERTIES 
dbPropertiesList)
+;
+
+dropDataConnectorStatement
+@init { gParent.pushMsg("drop connector statement", state); }
+@after { gParent.popMsg(state); }
+: KW_DROP (KW_DATACONNECTOR) ifExists? identifier

Review comment:
   The () shouldn't be needed. I will make a note of this and remove it in a 
subsequent patch, as I will have to re-run tests.
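
   To make the grammar above concrete, a hedged Java/JDBC sketch of the DDL it
accepts. The statement shapes follow createDataConnectorStatement and
dropDataConnectorStatement above; the connection URL, connector type string,
and property keys are illustrative assumptions, not from the patch:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DataConnectorDdlSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // KW_CREATE KW_DATACONNECTOR ifNotExists? name TYPE ... URL ...
      //   COMMENT ... WITH DCPROPERTIES (...)
      stmt.execute("CREATE DATACONNECTOR IF NOT EXISTS mysql_conn "
          + "TYPE 'mysql' URL 'jdbc:mysql://db-host:3306' "
          + "COMMENT 'demo connector' "
          + "WITH DCPROPERTIES ('user'='hive', 'password'='secret')");
      // KW_DROP KW_DATACONNECTOR ifExists? identifier
      stmt.execute("DROP DATACONNECTOR IF EXISTS mysql_conn");
    }
  }
}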




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568768)
Time Spent: 2h 20m  (was: 2h 10m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568766
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:57
Start Date: 19/Mar/21 03:57
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597391176



##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
##
@@ -1104,14 +1125,16 @@ orReplace
 createDatabaseStatement
 @init { pushMsg("create database statement", state); }
 @after { popMsg(state); }
-: KW_CREATE (KW_DATABASE|KW_SCHEMA)
+: KW_CREATE (remote=KW_REMOTE)? (KW_DATABASE|KW_SCHEMA)
 ifNotExists?
 name=identifier
 databaseComment?
 dbLocation?
 dbManagedLocation?
+dbConnectorName?
 (KW_WITH KW_DBPROPERTIES dbprops=dbProperties)?
--> ^(TOK_CREATEDATABASE $name ifNotExists? dbLocation? dbManagedLocation? 
databaseComment? $dbprops?)
+-> {$remote != null}? ^(TOK_CREATEDATABASE $name ifNotExists? 
databaseComment? $dbprops? dbConnectorName?)

Review comment:
   So the connector encapsulates the properties related to the connection 
(username/password, URLs, connection pool sizes, etc.). But a single connector 
can be used for various REMOTE databases, so anything specific to a given 
database needs to be in the DBPROPERTIES. A DB only uses the connector 
properties to establish the connection; it does not automatically inherit any 
of them.
   
   For a REMOTE database, both location and managedLocation are meaningless 
because they point to the default location where the database's table data is 
to be stored. For such DBs, table data is hosted remotely.
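
   A hedged sketch of that separation, assuming a java.sql.Statement from a
HiveServer2 JDBC session; the database names, connector name, and property
values are illustrative:

// Sketch only: one connector shared by two REMOTE databases. Note there is
// no LOCATION/MANAGEDLOCATION clause, and each database carries its own
// specifics in DBPROPERTIES rather than inheriting them from the connector.
static void createRemoteDbs(java.sql.Statement stmt) throws java.sql.SQLException {
  stmt.execute("CREATE REMOTE DATABASE sales_db USING mysql_conn "
      + "WITH DBPROPERTIES ('connector.remoteDbName'='sales')");
  stmt.execute("CREATE REMOTE DATABASE hr_db USING mysql_conn "
      + "WITH DBPROPERTIES ('connector.remoteDbName'='hr')");
}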




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568766)
Time Spent: 2h 10m  (was: 2h)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568763
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:53
Start Date: 19/Mar/21 03:53
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597390099



##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
##
@@ -1041,6 +1060,8 @@ ddlStatement
 | abortTransactionStatement
 | killQueryStatement
 | resourcePlanDdlStatements
+| createDataConnectorStatement

Review comment:
   alterDataConnectorStatementSuffix is one of the syntax forms of the 
alterStatement DDL statement, which is included in the ddlStatement block. So 
it is indirectly included.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568763)
Time Spent: 2h  (was: 1h 50m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568759
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:47
Start Date: 19/Mar/21 03:47
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597388484



##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
##
@@ -355,6 +364,11 @@ TOK_ALTERDATABASE_PROPERTIES;
 TOK_ALTERDATABASE_OWNER;
 TOK_ALTERDATABASE_LOCATION;
 TOK_ALTERDATABASE_MANAGEDLOCATION;
+TOK_DATACONNECTORPROPERTIES;

Review comment:
   fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568759)
Time Spent: 1h 50m  (was: 1h 40m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568757
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:45
Start Date: 19/Mar/21 03:45
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597387878



##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
##
@@ -1104,14 +1125,16 @@ orReplace
 createDatabaseStatement
 @init { pushMsg("create database statement", state); }
 @after { popMsg(state); }
-: KW_CREATE (KW_DATABASE|KW_SCHEMA)
+: KW_CREATE (remote=KW_REMOTE)? (KW_DATABASE|KW_SCHEMA)
 ifNotExists?
 name=identifier
 databaseComment?
 dbLocation?
 dbManagedLocation?
+dbConnectorName?
 (KW_WITH KW_DBPROPERTIES dbprops=dbProperties)?

Review comment:
   DBPROPS can be used with both NATIVE and REMOTE dbtypes.
   
   I already explained one usecase below, where the name of the remote DB 
could be different from the local DB name. This is where the mapping can be 
specified.
   In general, users can insert any other custom properties they like, 
something like a description for the database, in the DBPROPERTIES.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568757)
Time Spent: 1h 40m  (was: 1.5h)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568754
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:39
Start Date: 19/Mar/21 03:39
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597386228



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/database/create/CreateDatabaseAnalyzer.java
##
@@ -70,19 +73,43 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
 managedLocationUri = 
unescapeSQLString(childNode.getChild(0).getText());
 outputs.add(toWriteEntity(managedLocationUri));
 break;
+  case HiveParser.TOK_DATACONNECTOR:
+type = "REMOTE";
+// locationUri = "REMOTE_DATABASE"; // TODO
+ASTNode nextNode = (ASTNode) root.getChild(i);
+connectorName = ((ASTNode)nextNode).getChild(0).getText();
+outputs.add(toWriteEntity(connectorName));
+// outputs.remove(toWriteEntity(locationUri));
+if (managedLocationUri != null) {
+  outputs.remove(toWriteEntity(managedLocationUri));
+  managedLocationUri = null;
+}
+break;
   default:
 throw new SemanticException("Unrecognized token in CREATE DATABASE 
statement");
   }
 }
 
-CreateDatabaseDesc desc = new CreateDatabaseDesc(databaseName, comment, 
locationUri, managedLocationUri,
-ifNotExists, props);
-rootTasks.add(TaskFactory.get(new DDLWork(getInputs(), getOutputs(), 
desc)));
-
+CreateDatabaseDesc desc = null;
 Database database = new Database(databaseName, comment, locationUri, 
props);
-if (managedLocationUri != null) {
-  database.setManagedLocationUri(managedLocationUri);
+if (type.equalsIgnoreCase("NATIVE")) {
+  desc = new CreateDatabaseDesc(databaseName, comment, locationUri, 
managedLocationUri, ifNotExists, props);
+  database.setType(DatabaseType.NATIVE);
+  // database = new Database(databaseName, comment, locationUri, props);
+  if (managedLocationUri != null) {
+database.setManagedLocationUri(managedLocationUri);
+  }
+} else {
+  String remoteDbName = databaseName;
+  if (props != null && props.get("connector.remoteDbName") != null) // 
TODO finalize the property name
+remoteDbName = props.get("connector.remoteDbName");
+  desc = new CreateDatabaseDesc(databaseName, comment, locationUri, null, 
ifNotExists, props, type,

Review comment:
   I am not sure I understand the question, but I can explain this logic.
   For NATIVE DBs, location and optionally managedLocation make sense.
   For REMOTE DBs, neither of them has any significance, and the data for the 
tables within this DB is in the remote source. So the 4 lines of code above are 
specific to when a REMOTE DB is being created. For such DBs, users can 
optionally include a "connector.remoteDbName" in the DBPROPERTIES to map the 
Hive DB to a remote DB with a different name than the Hive DB.
   For example,
   create remote database mysql_testdb using <connector_name> // maps 
"mysql_testdb" to "mysql_testdb" in the remote datasource as well, because the 
create statement does not include an alternate name in the DBPROPERTIES.
   
   create remote database mysql_testdb using mysql_connector with DBPROPERTIES 
("connector.remoteDbName"="mydb")
   this maps mysql_testdb to the database named "mydb" in the remote datasource.
   
   Hope this helps.
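
   The two examples above as a Java/JDBC sketch (only "connector.remoteDbName"
comes from the patch, and the TODO in the diff notes the property name may
still change; the second database name is altered here only to avoid a
duplicate CREATE):

static void mapRemoteDbExamples(java.sql.Statement stmt) throws java.sql.SQLException {
  // No alternate name: Hive DB "mysql_testdb" maps to a remote DB that is
  // also named "mysql_testdb".
  stmt.execute("CREATE REMOTE DATABASE mysql_testdb USING mysql_connector");
  // Alternate name given: Hive DB "mysql_testdb2" maps to remote DB "mydb".
  stmt.execute("CREATE REMOTE DATABASE mysql_testdb2 USING mysql_connector "
      + "WITH DBPROPERTIES ('connector.remoteDbName'='mydb')");
}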




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568754)
Time Spent: 1.5h  (was: 1h 20m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 

[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568750
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:31
Start Date: 19/Mar/21 03:31
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597384004



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/database/create/CreateDatabaseAnalyzer.java
##
@@ -70,19 +73,43 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
 managedLocationUri = 
unescapeSQLString(childNode.getChild(0).getText());
 outputs.add(toWriteEntity(managedLocationUri));
 break;
+  case HiveParser.TOK_DATACONNECTOR:
+type = "REMOTE";
+// locationUri = "REMOTE_DATABASE"; // TODO
+ASTNode nextNode = (ASTNode) root.getChild(i);
+connectorName = ((ASTNode)nextNode).getChild(0).getText();
+outputs.add(toWriteEntity(connectorName));
+// outputs.remove(toWriteEntity(locationUri));

Review comment:
   This commented LOC has also been deleted locally. 
   Essentially, for a DB of type REMOTE, we do not want to honor any location 
or managedLocation values that could have been used in the DDL statement 
(although it might be valid syntactically, it is incorrect semantically). So I 
remove them both.
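
   A condensed sketch of that cleanup; the managedLocationUri branch mirrors
the diff above, while the locationUri branch is an assumption based on this
comment about the locally-deleted code:

// Sketch: for a REMOTE database, discard any location write-entities that
// earlier tokens may have registered, since table data lives remotely.
if (locationUri != null) {
  outputs.remove(toWriteEntity(locationUri));
  locationUri = null;
}
if (managedLocationUri != null) {
  outputs.remove(toWriteEntity(managedLocationUri));
  managedLocationUri = null;
}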




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568750)
Time Spent: 1h 20m  (was: 1h 10m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24201) WorkloadManager kills query being moved to different pool if destination pool does not have enough sessions

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24201?focusedWorklogId=568748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568748
 ]

ASF GitHub Bot logged work on HIVE-24201:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:28
Start Date: 19/Mar/21 03:28
Worklog Time Spent: 10m 
  Work Description: Dawn2111 commented on pull request #2065:
URL: https://github.com/apache/hive/pull/2065#issuecomment-802520264


   @sankarh, could you please review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568748)
Time Spent: 50m  (was: 40m)

> WorkloadManager kills query being moved to different pool if destination pool 
> does not have enough sessions
> ---
>
> Key: HIVE-24201
> URL: https://issues.apache.org/jira/browse/HIVE-24201
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, llap
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Pritha Dawn
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> To reproduce, create a resource plan with move trigger, like below:
> {code:java}
> ++
> |line|
> ++
> | experiment[status=DISABLED,parallelism=null,defaultPool=default] |
> |  +  default[allocFraction=0.888,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for default |
> |  +  pool2[allocFraction=0.1,schedulingPolicy=fair,parallelism=1] |
> |  |  trigger t1: if (ELAPSED_TIME > 20) { MOVE TO pool1 } |
> |  |  mapped for users: abcd   |
> |  +  pool1[allocFraction=0.012,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for users: efgh   |
>  
> {code}
> Now, run two queries in pool1 and pool2 using different users. The query 
> running in pool2 will try to move to pool1 and will get killed because 
> pool1 will not have a session to handle the query.
> Currently, the Workload Management move trigger kills the query being moved 
> to a different pool if the destination pool does not have enough capacity. We 
> could have a "delayed move" configuration which lets the query run in the 
> source pool as long as possible if the destination pool is full. It would 
> attempt the move to the destination pool only when there is a claim upon the 
> source pool. If the destination pool is not full, delayed move behaves as a 
> normal move, i.e. the move happens immediately.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568747
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:28
Start Date: 19/Mar/21 03:28
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597383142



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/database/create/CreateDatabaseAnalyzer.java
##
@@ -70,19 +73,43 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
 managedLocationUri = 
unescapeSQLString(childNode.getChild(0).getText());
 outputs.add(toWriteEntity(managedLocationUri));
 break;
+  case HiveParser.TOK_DATACONNECTOR:
+type = "REMOTE";
+// locationUri = "REMOTE_DATABASE"; // TODO

Review comment:
   This is a dead comment; I had it deleted in the local set of changes.
   Essentially, the location is a required field on the Database. But this is 
handled in CreateDatabaseOperation.makeQualifiedPath(), where the path of the 
DB is set to what it would be for a NATIVE DB. There is no harm in using that 
path.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568747)
Time Spent: 1h 10m  (was: 1h)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24201) WorkloadManager kills query being moved to different pool if destination pool does not have enough sessions

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24201?focusedWorklogId=568746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568746
 ]

ASF GitHub Bot logged work on HIVE-24201:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:24
Start Date: 19/Mar/21 03:24
Worklog Time Spent: 10m 
  Work Description: Dawn2111 removed a comment on pull request #2065:
URL: https://github.com/apache/hive/pull/2065#issuecomment-800405747


   @guptanikhil007, could you please review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568746)
Time Spent: 40m  (was: 0.5h)

> WorkloadManager kills query being moved to different pool if destination pool 
> does not have enough sessions
> ---
>
> Key: HIVE-24201
> URL: https://issues.apache.org/jira/browse/HIVE-24201
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, llap
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Pritha Dawn
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> To reproduce, create a resource plan with move trigger, like below:
> {code:java}
> ++
> |line|
> ++
> | experiment[status=DISABLED,parallelism=null,defaultPool=default] |
> |  +  default[allocFraction=0.888,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for default |
> |  +  pool2[allocFraction=0.1,schedulingPolicy=fair,parallelism=1] |
> |  |  trigger t1: if (ELAPSED_TIME > 20) { MOVE TO pool1 } |
> |  |  mapped for users: abcd   |
> |  +  pool1[allocFraction=0.012,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for users: efgh   |
>  
> {code}
> Now, run two queries in pool1 and pool2 using different users. The query 
> running in pool2 will try to move to pool1 and will get killed because 
> pool1 will not have a session to handle the query.
> Currently, the Workload Management move trigger kills the query being moved 
> to a different pool if the destination pool does not have enough capacity. We 
> could have a "delayed move" configuration which lets the query run in the 
> source pool as long as possible if the destination pool is full. It would 
> attempt the move to the destination pool only when there is a claim upon the 
> source pool. If the destination pool is not full, delayed move behaves as a 
> normal move, i.e. the move happens immediately.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568742
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 03:08
Start Date: 19/Mar/21 03:08
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597377766



##
File path: 
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
##
@@ -244,6 +245,32 @@ public boolean alterDatabase(String catName, String 
dbName, Database db)
 return objectStore.getAllDatabases(catName);
   }
 
+  @Override
+  public List<String> getAllDataConnectors() throws MetaException {

Review comment:
   Fair point. I was mimicking mostly what was done for Databases. For 
example, RawStore.getDatabases() returns a List<String> but getDatabase() 
returns a Database.
   Two wrongs do not make a right. Will fix.
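
   A hedged sketch of the API shape this converges on; the method names and
exception lists are assumptions modeled on the existing
getDatabases()/getDatabase() precedent, not the actual commit:

// Sketch only: list calls return names, singular calls return the object.
List<String> getAllDataConnectorNames() throws MetaException;

DataConnector getDataConnector(String name)
    throws NoSuchObjectException, MetaException;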




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568742)
Time Spent: 1h  (was: 50m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568740
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 02:59
Start Date: 19/Mar/21 02:59
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r597375097



##
File path: common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
##
@@ -471,6 +471,9 @@
   AMBIGUOUS_STRUCT_ATTRIBUTE(10423, "Attribute \"{0}\" specified more than 
once in structured type.", true),
   OFFSET_NOT_SUPPORTED_IN_SUBQUERY(10424, "OFFSET is not supported in subquery 
of exists", true),
   WITH_COL_LIST_NUM_OVERFLOW(10425, "WITH-clause query {0} returns {1} 
columns, but {2} labels were specified. The number of column labels must be 
smaller or equal to the number of expressions returned by the query.", true),
+  DATACONNECTOR_ALREADY_EXISTS(10426, "Dataconnector {0} already exists", 
true),
+  DATACONNECTOR_NOT_EXISTS(10427, "Dataconnector does not exist:"),

Review comment:
   fixed.
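
   Presumably the fix aligns the second entry with the parameterized format of
the first, roughly as follows (a sketch, not the actual commit):

DATACONNECTOR_ALREADY_EXISTS(10426, "Dataconnector {0} already exists", true),
DATACONNECTOR_NOT_EXISTS(10427, "Dataconnector {0} does not exist", true),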




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568740)
Time Spent: 50m  (was: 40m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This feature adds support in the Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We currently 
> support remote tables via StorageHandlers like JDBCStorageHandler and 
> HBaseStorageHandler.
> Data connectors are a natural extension of this, letting us map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <connector_name> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24901) Re-enable tests in TestBeeLineWithArgs

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24901?focusedWorklogId=568725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568725
 ]

ASF GitHub Bot logged work on HIVE-24901:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 01:41
Start Date: 19/Mar/21 01:41
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #2087:
URL: https://github.com/apache/hive/pull/2087


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568725)
Time Spent: 0.5h  (was: 20m)

> Re-enable tests in TestBeeLineWithArgs
> --
>
> Key: HIVE-24901
> URL: https://issues.apache.org/jira/browse/HIVE-24901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Re-enable the tests in TestBeeLineWithArgs, because they are stable on master 
> now:
> http://ci.hive.apache.org/job/hive-flaky-check/219/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24901) Re-enable tests in TestBeeLineWithArgs

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24901?focusedWorklogId=568721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568721
 ]

ASF GitHub Bot logged work on HIVE-24901:
-

Author: ASF GitHub Bot
Created on: 19/Mar/21 01:36
Start Date: 19/Mar/21 01:36
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #2087:
URL: https://github.com/apache/hive/pull/2087


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568721)
Time Spent: 20m  (was: 10m)

> Re-enable tests in TestBeeLineWithArgs
> --
>
> Key: HIVE-24901
> URL: https://issues.apache.org/jira/browse/HIVE-24901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Re-enable the tests in TestBeeLineWithArgs, because they are stable on master 
> now:
> http://ci.hive.apache.org/job/hive-flaky-check/219/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24902) Incorrect result due to ReduceExpressionsRule

2021-03-18 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304572#comment-17304572
 ] 

Nemon Lou commented on HIVE-24902:
--

Sorry if I offended you. And thanks for your response.
I'm trying to figure out what caused this bug, and the Calcite part is 
difficult for me right now. I will dig more.
If it is a common issue, any help from the community is appreciated.

> Incorrect result due to ReduceExpressionsRule
> -
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> The following SQL returns only one record (20210308), but we expect two 
> (20210308 and 20210309).
> {code:sql}
> select * from (
> select 
>   case when b.a=1
>  then  
>   cast 
> (from_unixtime(unix_timestamp(cast(20210309 as string),'MMdd') - 
> 86400,'MMdd') as bigint)
> else 
> 20210309 
>  end 
> as col
> from 
> (select stack(2,1,2) as (a))
>  as b
> ) t 
> where t.col is not null;
> {code}
> After debugging, I find that the ReduceExpressionsRule changes the expression 
> in the wrong way.
> Original expression:
> {code:sql}
> IS NOT NULL(CASE(=($0, 1), 
> CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT, 
> 20210309))
> {code}
> After reducing expressions:
> {code:sql}
> CASE(=($0, 1), IS NOT 
> NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT), 
> true)
> {code}
> The query plan in main branch:
> {code:sql}
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: _dummy_table
>   Row Limit Per Split: 1
>   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
> stats: COMPLETE
>   Select Operator
> expressions: 2 (type: int), 1 (type: int), 2 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> UDTF Operator
>   Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   function name: stack
>   Filter Operator
> predicate: COALESCE((col0 = 1),false) (type: boolean)
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
> (20210309L) END (type: bigint)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   ListSink
> Time taken: 0.155 seconds, Fetched: 28 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24902) Incorrect result due to ReduceExpressionsRule

2021-03-18 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304552#comment-17304552
 ] 

Julian Hyde commented on HIVE-24902:


Sorry, no.

> Incorrect result due to ReduceExpressionsRule
> -
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> The following SQL returns only one record (20210308), but we expect two 
> (20210308 and 20210309).
> {code:sql}
> select * from (
> select 
>   case when b.a=1
>  then  
>   cast 
> (from_unixtime(unix_timestamp(cast(20210309 as string),'MMdd') - 
> 86400,'MMdd') as bigint)
> else 
> 20210309 
>  end 
> as col
> from 
> (select stack(2,1,2) as (a))
>  as b
> ) t 
> where t.col is not null;
> {code}
> After debugging, I find that the ReduceExpressionsRule changes the expression 
> in the wrong way.
> Original expression:
> {code:sql}
> IS NOT NULL(CASE(=($0, 1), 
> CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT, 
> 20210309))
> {code}
> After reducing expressions:
> {code:sql}
> CASE(=($0, 1), IS NOT 
> NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT), 
> true)
> {code}
> The query plan in main branch:
> {code:sql}
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: _dummy_table
>   Row Limit Per Split: 1
>   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
> stats: COMPLETE
>   Select Operator
> expressions: 2 (type: int), 1 (type: int), 2 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> UDTF Operator
>   Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   function name: stack
>   Filter Operator
> predicate: COALESCE((col0 = 1),false) (type: boolean)
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
> (20210309L) END (type: bigint)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   ListSink
> Time taken: 0.155 seconds, Fetched: 28 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY

2021-03-18 Thread Sungwoo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304540#comment-17304540
 ] 

Sungwoo commented on HIVE-24907:


FYI, when tested with Hive 3.1, the query returns a correct result.

I tested with commit 949ff1c67614d4f50a6231fc0b78ab5d753cbeb9 (Nov 2 13:07:56 
2020) in branch-3.1 using hive.execution.engine=tez and 
hive.execution.mode=container.



> Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
> ---
>
> Key: HIVE-24907
> URL: https://issues.apache.org/jira/browse/HIVE-24907
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.4.0, 3.2.0, 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The following SQL query returns wrong results when run in TEZ/LLAP:
> {code:sql}
> SET hive.auto.convert.sortmerge.join=true;
> CREATE TABLE tbl (key int,value int);
> INSERT INTO tbl VALUES (1, 2000);
> INSERT INTO tbl VALUES (2, 2001);
> INSERT INTO tbl VALUES (3, 2005);
> SELECT sub1.key, sub2.key
> FROM
>   (SELECT a.key FROM tbl a GROUP BY a.key) sub1
> LEFT OUTER JOIN (
>   SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key
>   UNION
>   SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 
> ON sub1.key = sub2.key;
> {code}
> Actual results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|NULL|
> |3|NULL|
> Expected results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|2|
> |3|3|
> The test can be reproduced with {{TestMiniLlapLocalCliDriver}} or 
> {{TestMiniTezCliDriver}} in older versions of Hive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24315) Improve validation and error handling in HPL/SQL

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24315?focusedWorklogId=568673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568673
 ]

ASF GitHub Bot logged work on HIVE-24315:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 23:17
Start Date: 18/Mar/21 23:17
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #2059:
URL: https://github.com/apache/hive/pull/2059#discussion_r597255333



##
File path: hplsql/src/main/java/org/apache/hive/hplsql/Exec.java
##
@@ -845,6 +854,13 @@ public Integer init(String[] args) throws Exception {
 return 0;
   }
 
+  private HplsqlParser newParser(CommonTokenStream tokens) {
+HplsqlParser parser = new HplsqlParser(tokens);
+parser.removeErrorListeners();

Review comment:
   This looks unconventional. Can you add a comment explaining why this is 
necessary?
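
   One possible shape for the requested comment, as a sketch; it assumes the
listener wired in is this patch's SyntaxErrorReporter and that Exec exposes
its Console as a console field:

private HplsqlParser newParser(CommonTokenStream tokens) {
  HplsqlParser parser = new HplsqlParser(tokens);
  // Drop ANTLR's default ConsoleErrorListener so syntax errors are routed
  // through HPL/SQL's own console reporting instead of raw stderr.
  parser.removeErrorListeners();
  parser.addErrorListener(new SyntaxErrorReporter(console)); // assumption
  return parser;
}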

##
File path: hplsql/src/main/java/org/apache/hive/hplsql/Exec.java
##
@@ -2256,24 +2276,26 @@ public Integer visitIdent(HplsqlParser.IdentContext 
ctx) {
   Var var1 = new Var(var);
   var1.negate();
   exec.stackPush(var1);
-}
-else {
+} else {
   exec.stackPush(var);
 }
-  }
-  else {
+  } else {
 exec.stackPush(new Var(ident, Var.Type.STRING, var.toSqlString()));
   }
-}
-else {
-  if (!exec.buildSql && !exec.inCallStmt && 
exec.functions.exec(ident.toUpperCase(), null)) {
-return 0;
+} else {
+  if (exec.buildSql || exec.inCallStmt) {

Review comment:
   I could not understand what is happening here. Can you give some info on 
what this if-else statement is for?

##
File path: hplsql/src/main/java/org/apache/hive/hplsql/Exec.java
##
@@ -974,14 +990,16 @@ void cleanup() {
   void printExceptions() {
 while (!signals.empty()) {
   Signal sig = signals.pop();
-  if (sig.type == Signal.Type.SQLEXCEPTION) {
+  if (sig.type == Signal.Type.VALIDATION) {

Review comment:
   What about "NOTFOUND" and "UNSUPPORTED_OPERATION"? Do we suppress them 
or are they handled somewhere else?

##
File path: hplsql/src/main/java/org/apache/hive/hplsql/Exec.java
##
@@ -501,6 +501,10 @@ public void signal(QueryResult query) {
 signal(Signal.Type.SQLEXCEPTION, query.errorText(), query.exception());
   }
 
+  public void signalHplsql(HplValidationException exception) {

Review comment:
   This looks unused

##
File path: hplsql/src/main/java/org/apache/hive/hplsql/SyntaxErrorReporter.java
##
@@ -0,0 +1,50 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hive.hplsql;
+
+import java.util.BitSet;
+
+import org.antlr.v4.runtime.BaseErrorListener;
+import org.antlr.v4.runtime.Parser;
+import org.antlr.v4.runtime.RecognitionException;
+import org.antlr.v4.runtime.Recognizer;
+import org.antlr.v4.runtime.atn.ATNConfigSet;
+import org.antlr.v4.runtime.dfa.DFA;
+
+public class SyntaxErrorReporter extends BaseErrorListener {
+  private final Console console;
+
+  public SyntaxErrorReporter(Console console) {
+this.console = console;
+  }
+
+  @Override
+  public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, 
int line, int charPositionInLine, String msg, RecognitionException e) {
+console.printError("Syntax error at line " + line + ":" + 
charPositionInLine + " " + msg);
+  }
+
+  public void reportAmbiguity(Parser recognizer, DFA dfa, int startIndex, int 
stopIndex, boolean exact, BitSet ambigAlts, ATNConfigSet configs) {

Review comment:
   We can remove these three empty overrides.
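
   Since BaseErrorListener already supplies no-op implementations of 
reportAmbiguity, reportAttemptingFullContext and reportContextSensitivity, the 
reduced class could be as small as this sketch:

{code:java}
public class SyntaxErrorReporter extends BaseErrorListener {
  private final Console console;

  public SyntaxErrorReporter(Console console) {
    this.console = console;
  }

  @Override
  public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
      int line, int charPositionInLine, String msg, RecognitionException e) {
    // only syntax errors need custom handling; the other callbacks keep their no-op defaults
    console.printError("Syntax error at line " + line + ":" + charPositionInLine + " " + msg);
  }
}
{code}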

##
File path: hplsql/src/test/results/local/func_no_return.out.txt
##
@@ -0,0 +1 @@
+Ln:1 identifier 'CREATE' must be declared.

Review comment:
   This is not the correct error.  We most certainly do not need to declare 
'CREATE'.

##
File path: hplsql/src/main/java/org/apache/hive/hplsql/Expression.java
##
@@ -369,21 +380,32 @@ public void operatorSub(HplsqlParser.ExprContext ctx) {
 Var v2 = 

[jira] [Work logged] (HIVE-24900) Failed compaction does not cleanup the directories

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24900?focusedWorklogId=568623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568623
 ]

ASF GitHub Bot logged work on HIVE-24900:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 20:25
Start Date: 18/Mar/21 20:25
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2086:
URL: https://github.com/apache/hive/pull/2086#discussion_r597217616



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -545,6 +556,8 @@ protected Boolean findNextCompactionAndExecute(boolean 
computeStats) throws Inte
 
 heartbeater.cancel();
 
+failAfterCompactionIfSetForTest();

Review comment:
   +1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568623)
Time Spent: 1h 20m  (was: 1h 10m)

> Failed compaction does not cleanup the directories
> --
>
> Key: HIVE-24900
> URL: https://issues.apache.org/jira/browse/HIVE-24900
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Failed compaction does not cleanup the directories



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24900) Failed compaction does not cleanup the directories

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24900?focusedWorklogId=568621=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568621
 ]

ASF GitHub Bot logged work on HIVE-24900:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 20:16
Start Date: 18/Mar/21 20:16
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2086:
URL: https://github.com/apache/hive/pull/2086#discussion_r597211444



##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java
##
@@ -463,6 +463,48 @@ public void testCompactionAbort() throws Exception {
 runCleaner(hiveConf);
   }
 
+
+  @Test
+  public void testCompactionAbortLeftoverFiles() throws Exception {
+MetastoreConf.setBoolVar(hiveConf, 
MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
+
+dropTable(new String[] {"T"});
+//note: transaction names T1, T2, etc below, are logical, the actual txnid 
will be different
+runStatementOnDriver("create table T (a int, b int) stored as orc");
+runStatementOnDriver("insert into T values(0,2)");//makes delta_1_1 in T1
+runStatementOnDriver("insert into T values(1,4)");//makes delta_2_2 in T2
+
+//create failed compaction attempt so that compactor txn is aborted
+HiveConf.setBoolVar(hiveConf, 
HiveConf.ConfVars.HIVETESTMODEFAILAFTERCOMPACTION, true);
+runStatementOnDriver("alter table T compact 'major'");

Review comment:
   Please add a test to cover the MINOR compaction use case. 
   Note: current implementation doesn't handle delete deltas. Add an update 
statement in your test. You need to handle both delta and delete_delta 
directories. 
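
   A rough sketch of the suggested MINOR variant, reusing statements already 
present in the quoted test (runWorker is assumed to be the harness helper these 
ACID tests use; the assertions are omitted and would need to check both delta 
and delete_delta leftovers):

{code:java}
runStatementOnDriver("update T set b = 5 where a = 1"); // produces a delta and a delete_delta
HiveConf.setBoolVar(hiveConf, HiveConf.ConfVars.HIVETESTMODEFAILAFTERCOMPACTION, true);
runStatementOnDriver("alter table T compact 'minor'");
runWorker(hiveConf); // compaction fails on purpose, leaving result directories behind
{code}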




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568621)
Time Spent: 1h 10m  (was: 1h)

> Failed compaction does not cleanup the directories
> --
>
> Key: HIVE-24900
> URL: https://issues.apache.org/jira/browse/HIVE-24900
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Failed compaction does not cleanup the directories



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24900) Failed compaction does not cleanup the directories

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24900?focusedWorklogId=568620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568620
 ]

ASF GitHub Bot logged work on HIVE-24900:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 20:15
Start Date: 18/Mar/21 20:15
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2086:
URL: https://github.com/apache/hive/pull/2086#discussion_r597211444



##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java
##
@@ -463,6 +463,48 @@ public void testCompactionAbort() throws Exception {
 runCleaner(hiveConf);
   }
 
+
+  @Test
+  public void testCompactionAbortLeftoverFiles() throws Exception {
+MetastoreConf.setBoolVar(hiveConf, 
MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
+
+dropTable(new String[] {"T"});
+//note: transaction names T1, T2, etc below, are logical, the actual txnid 
will be different
+runStatementOnDriver("create table T (a int, b int) stored as orc");
+runStatementOnDriver("insert into T values(0,2)");//makes delta_1_1 in T1
+runStatementOnDriver("insert into T values(1,4)");//makes delta_2_2 in T2
+
+//create failed compaction attempt so that compactor txn is aborted
+HiveConf.setBoolVar(hiveConf, 
HiveConf.ConfVars.HIVETESTMODEFAILAFTERCOMPACTION, true);
+runStatementOnDriver("alter table T compact 'major'");

Review comment:
   Please add a test to cover the MINOR compaction use case. 
   Note: current implementation doesn't handle delete deltas. Add an update 
statement in your test. You need to handle both delta and delete_delta 
directories.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568620)
Time Spent: 1h  (was: 50m)

> Failed compaction does not cleanup the directories
> --
>
> Key: HIVE-24900
> URL: https://issues.apache.org/jira/browse/HIVE-24900
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Failed compaction does not cleanup the directories



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=568613=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568613
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 19:50
Start Date: 18/Mar/21 19:50
Worklog Time Spent: 10m 
  Work Description: vnhive commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r596729081



##
File path: common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
##
@@ -471,6 +471,9 @@
   AMBIGUOUS_STRUCT_ATTRIBUTE(10423, "Attribute \"{0}\" specified more than 
once in structured type.", true),
   OFFSET_NOT_SUPPORTED_IN_SUBQUERY(10424, "OFFSET is not supported in subquery 
of exists", true),
   WITH_COL_LIST_NUM_OVERFLOW(10425, "WITH-clause query {0} returns {1} 
columns, but {2} labels were specified. The number of column labels must be 
smaller or equal to the number of expressions returned by the query.", true),
+  DATACONNECTOR_ALREADY_EXISTS(10426, "Dataconnector {0} already exists", 
true),
+  DATACONNECTOR_NOT_EXISTS(10427, "Dataconnector does not exist:"),

Review comment:
   This should be Dataconnector {0} does not exist
   
   since the error message is being used with the connector name
   
   throw new HiveException(ErrorMsg.DATACONNECTOR_NOT_EXISTS, 
desc.getConnectorName());
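
   The corrected entry would presumably mirror the DATACONNECTOR_ALREADY_EXISTS 
line just above it:

{code:java}
DATACONNECTOR_NOT_EXISTS(10427, "Dataconnector {0} does not exist", true),
{code}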

##
File path: 
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
##
@@ -244,6 +245,32 @@ public boolean alterDatabase(String catName, String 
dbName, Database db)
 return objectStore.getAllDatabases(catName);
   }
 
+  @Override
+  public List<String> getAllDataConnectors() throws MetaException {

Review comment:
   I don't have a strong opinion on this, but just wanted to bring it up:
   
   if a method returns just the list of the names of the connectors, it is 
better to name it getAllDataConnectorNames().
   
   See for example there is a method getDataConnector() that returns a 
DataConnector object. Whenever possible it would be nice to make the semantics 
evident from the nomenclature.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/database/desc/DescDatabaseFormatter.java
##
@@ -104,6 +113,14 @@ void showDatabaseDescription(DataOutputStream out, String 
database, String comme
   out.write(ownerType.name().getBytes(StandardCharsets.UTF_8));
 }
 out.write(Utilities.tabCode);
+if (connectorName != null) {

Review comment:
   nit picking: but why connectorName != null here and null != connectorName 
previously? Can you please consider sticking to one style?
   
   Also, java.util.Objects.isNull can be used to check for null

##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/CreateDDLParser.g
##
@@ -108,3 +108,45 @@ createTableStatement
  selectStatementWithCTE?
 )
 ;
+
+createDataConnectorStatement
+@init { gParent.pushMsg("create connector statement", state); }
+@after { gParent.popMsg(state); }
+: KW_CREATE KW_DATACONNECTOR ifNotExists? name=identifier 
dataConnectorType dataConnectorUrl dataConnectorComment? ( KW_WITH 
KW_DCPROPERTIES dcprops=dcProperties)?
+-> ^(TOK_CREATEDATACONNECTOR $name ifNotExists? dataConnectorType 
dataConnectorUrl dataConnectorComment? $dcprops?)
+;
+
+dataConnectorComment
+@init { gParent.pushMsg("dataconnector comment", state); }
+@after { gParent.popMsg(state); }
+: KW_COMMENT comment=StringLiteral
+-> ^(TOK_DATACONNECTORCOMMENT $comment)
+;
+
+dataConnectorUrl
+@init { gParent.pushMsg("dataconnector URL", state); }
+@after { gParent.popMsg(state); }
+: KW_URL url=StringLiteral
+-> ^(TOK_DATACONNECTORURL $url)
+;
+
+dataConnectorType
+@init { gParent.pushMsg("dataconnector type", state); }
+@after { gParent.popMsg(state); }
+: KW_TYPE dcType=StringLiteral
+-> ^(TOK_DATACONNECTORTYPE $dcType)
+;
+
+dcProperties
+@init { gParent.pushMsg("dcproperties", state); }
+@after { gParent.popMsg(state); }
+:
+  LPAREN dbPropertiesList RPAREN -> ^(TOK_DATACONNECTORPROPERTIES 
dbPropertiesList)
+;
+
+dropDataConnectorStatement
+@init { gParent.pushMsg("drop connector statement", state); }
+@after { gParent.popMsg(state); }
+: KW_DROP (KW_DATACONNECTOR) ifExists? identifier

Review comment:
   Are the parentheses surrounding KW_DATACONNECTOR required?

##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
##
@@ -1104,14 +1125,16 @@ orReplace
 createDatabaseStatement
 @init { pushMsg("create database statement", state); }
 @after { popMsg(state); }
-: KW_CREATE (KW_DATABASE|KW_SCHEMA)
+: KW_CREATE (remote=KW_REMOTE)? (KW_DATABASE|KW_SCHEMA)
 ifNotExists?
 name=identifier
 databaseComment?
 dbLocation?
 dbManagedLocation?
+

[jira] [Comment Edited] (HIVE-18334) Cannot JOIN ON result of COALESCE

2021-03-18 Thread Hendrik Schultze (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304336#comment-17304336
 ] 

Hendrik Schultze edited comment on HIVE-18334 at 3/18/21, 6:50 PM:
---

Can also reproduce on 2.3.3 through TEZ with and without CBO/PDP set.

 

It can temporarily be solved by moving the condition from ON to WHERE, like:
{code:sql}
SELECT *
FROM t5 as t5
INNER JOIN t6 as t6
where (
 t5.eno = t6.eno
  or t5.eno = t6.dno
  or t5.dno = t6.eno
  or t5.dno = t6.dno
);
{code}

A patch is highly appreciated.


was (Author: 0xbadbac0n):
Can also reproduce on 2.3.3 through TEZ with and without cbo/pdp set.

> Cannot JOIN ON result of COALESCE 
> --
>
> Key: HIVE-18334
> URL: https://issues.apache.org/jira/browse/HIVE-18334
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2, 2.3.3
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Critical
>
> A join is returning no results when the ON clause is equating the results of 
> two COALESCE functions. To reproduce:
> {code:SQL}
> CREATE TABLE t5 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> CREATE TABLE t6 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> INSERT INTO t5 VALUES
>   (10, 'FOO', NULL, NULL),
>   (20, 'BAR', NULL, NULL),
>   (NULL, NULL, 7300, 'LARRY'),
>   (NULL, NULL, 7400, 'MOE'),
>   (NULL, NULL, 7500, 'CURLY');
> INSERT INTO t6 VALUES
>   (10, 'LENNON', NULL, NULL),
>   (20, 'MCCARTNEY', NULL, NULL),
>   (NULL, NULL, 7300, 'READY'),
>   (NULL, NULL, 7400, 'WILLING'),
>   (NULL, NULL, 7500, 'ABLE');
> -- Fails with 0 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`dno`) = COALESCE(`t6`.`eno`, `t6`.`dno`)
> -- Full cross with where clause works (in nonstrict mode), returning 5 results
> SELECT *
> FROM t5
> JOIN t6
> WHERE `t5`.`eno` = `t6`.`eno` OR `t5`.`dno` = `t6`.`dno`
> -- Strange that coalescing the same field returns 2 results...
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`dno`, `t5`.`dno`) = COALESCE(`t6`.`dno`, `t6`.`dno`)
> -- ...and coalescing the other field returns 3 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`eno`) = COALESCE(`t6`.`eno`, `t6`.`eno`)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-18334) Cannot JOIN ON result of COALESCE

2021-03-18 Thread Hendrik Schultze (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hendrik Schultze updated HIVE-18334:

Priority: Critical  (was: Minor)

> Cannot JOIN ON result of COALESCE 
> --
>
> Key: HIVE-18334
> URL: https://issues.apache.org/jira/browse/HIVE-18334
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2, 2.3.3
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Critical
>
> A join is returning no results when the ON clause is equating the results of 
> two COALESCE functions. To reproduce:
> {code:SQL}
> CREATE TABLE t5 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> CREATE TABLE t6 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> INSERT INTO t5 VALUES
>   (10, 'FOO', NULL, NULL),
>   (20, 'BAR', NULL, NULL),
>   (NULL, NULL, 7300, 'LARRY'),
>   (NULL, NULL, 7400, 'MOE'),
>   (NULL, NULL, 7500, 'CURLY');
> INSERT INTO t6 VALUES
>   (10, 'LENNON', NULL, NULL),
>   (20, 'MCCARTNEY', NULL, NULL),
>   (NULL, NULL, 7300, 'READY'),
>   (NULL, NULL, 7400, 'WILLING'),
>   (NULL, NULL, 7500, 'ABLE');
> -- Fails with 0 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`dno`) = COALESCE(`t6`.`eno`, `t6`.`dno`)
> -- Full cross with where clause works (in nonstrict mode), returning 5 results
> SELECT *
> FROM t5
> JOIN t6
> WHERE `t5`.`eno` = `t6`.`eno` OR `t5`.`dno` = `t6`.`dno`
> -- Strange that coalescing the same field returns 2 results...
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`dno`, `t5`.`dno`) = COALESCE(`t6`.`dno`, `t6`.`dno`)
> -- ...and coalescing the other field returns 3 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`eno`) = COALESCE(`t6`.`eno`, `t6`.`eno`)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18334) Cannot JOIN ON result of COALESCE

2021-03-18 Thread Hendrik Schultze (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304336#comment-17304336
 ] 

Hendrik Schultze commented on HIVE-18334:
-

Can also reproduce on 2.3.3 through TEZ with and without cbo/pdp set.

> Cannot JOIN ON result of COALESCE 
> --
>
> Key: HIVE-18334
> URL: https://issues.apache.org/jira/browse/HIVE-18334
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2, 2.3.3
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> A join is returning no results when the ON clause is equating the results of 
> two COALESCE functions. To reproduce:
> {code:SQL}
> CREATE TABLE t5 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> CREATE TABLE t6 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> INSERT INTO t5 VALUES
>   (10, 'FOO', NULL, NULL),
>   (20, 'BAR', NULL, NULL),
>   (NULL, NULL, 7300, 'LARRY'),
>   (NULL, NULL, 7400, 'MOE'),
>   (NULL, NULL, 7500, 'CURLY');
> INSERT INTO t6 VALUES
>   (10, 'LENNON', NULL, NULL),
>   (20, 'MCCARTNEY', NULL, NULL),
>   (NULL, NULL, 7300, 'READY'),
>   (NULL, NULL, 7400, 'WILLING'),
>   (NULL, NULL, 7500, 'ABLE');
> -- Fails with 0 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`dno`) = COALESCE(`t6`.`eno`, `t6`.`dno`)
> -- Full cross with where clause works (in nonstrict mode), returning 5 results
> SELECT *
> FROM t5
> JOIN t6
> WHERE `t5`.`eno` = `t6`.`eno` OR `t5`.`dno` = `t6`.`dno`
> -- Strange that coalescing the same field returns 2 results...
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`dno`, `t5`.`dno`) = COALESCE(`t6`.`dno`, `t6`.`dno`)
> -- ...and coalescing the other field returns 3 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`eno`) = COALESCE(`t6`.`eno`, `t6`.`eno`)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-18334) Cannot JOIN ON result of COALESCE

2021-03-18 Thread Hendrik Schultze (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hendrik Schultze updated HIVE-18334:

Affects Version/s: 2.3.3

> Cannot JOIN ON result of COALESCE 
> --
>
> Key: HIVE-18334
> URL: https://issues.apache.org/jira/browse/HIVE-18334
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2, 2.3.3
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> A join is returning no results when the ON clause is equating the results of 
> two COALESCE functions. To reproduce:
> {code:SQL}
> CREATE TABLE t5 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> CREATE TABLE t6 (
>   dno INTEGER,
>   dname VARCHAR(30),
>   eno INTEGER,
>   ename VARCHAR(30));
> INSERT INTO t5 VALUES
>   (10, 'FOO', NULL, NULL),
>   (20, 'BAR', NULL, NULL),
>   (NULL, NULL, 7300, 'LARRY'),
>   (NULL, NULL, 7400, 'MOE'),
>   (NULL, NULL, 7500, 'CURLY');
> INSERT INTO t6 VALUES
>   (10, 'LENNON', NULL, NULL),
>   (20, 'MCCARTNEY', NULL, NULL),
>   (NULL, NULL, 7300, 'READY'),
>   (NULL, NULL, 7400, 'WILLING'),
>   (NULL, NULL, 7500, 'ABLE');
> -- Fails with 0 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`dno`) = COALESCE(`t6`.`eno`, `t6`.`dno`)
> -- Full cross with where clause works (in nonstrict mode), returning 5 results
> SELECT *
> FROM t5
> JOIN t6
> WHERE `t5`.`eno` = `t6`.`eno` OR `t5`.`dno` = `t6`.`dno`
> -- Strange that coalescing the same field returns 2 results...
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`dno`, `t5`.`dno`) = COALESCE(`t6`.`dno`, `t6`.`dno`)
> -- ...and coalescing the other field returns 3 results
> SELECT *
> FROM t5
> INNER JOIN t6
> ON COALESCE(`t5`.`eno`, `t5`.`eno`) = COALESCE(`t6`.`eno`, `t6`.`eno`)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY

2021-03-18 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-24907:
--


> Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
> ---
>
> Key: HIVE-24907
> URL: https://issues.apache.org/jira/browse/HIVE-24907
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.4.0, 3.2.0, 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The following SQL query returns wrong results when run in TEZ/LLAP:
> {code:sql}
> SET hive.auto.convert.sortmerge.join=true;
> CREATE TABLE tbl (key int,value int);
> INSERT INTO tbl VALUES (1, 2000);
> INSERT INTO tbl VALUES (2, 2001);
> INSERT INTO tbl VALUES (3, 2005);
> SELECT sub1.key, sub2.key
> FROM
>   (SELECT a.key FROM tbl a GROUP BY a.key) sub1
> LEFT OUTER JOIN (
>   SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key
>   UNION
>   SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 
> ON sub1.key = sub2.key;
> {code}
> Actual results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|NULL|
> |3|NULL|
> Expected results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|2|
> |3|3|
> The test can be reproduced with {{TestMiniLlapLocalCliDriver}} or 
> {{TestMiniTezCliDriver}} in older versions of Hive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24906) Suffix the table location with UUID/txnId

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24906:
--
Labels: pull-request-available  (was: )

> Suffix the table location with UUID/txnId
> -
>
> Key: HIVE-24906
> URL: https://issues.apache.org/jira/browse/HIVE-24906
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Suffixing the table location during create table with UUID/txnId can help in 
> deleting the data in asynchronous fashion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24906) Suffix the table location with UUID/txnId

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24906?focusedWorklogId=568530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568530
 ]

ASF GitHub Bot logged work on HIVE-24906:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 17:47
Start Date: 18/Mar/21 17:47
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request #2089:
URL: https://github.com/apache/hive/pull/2089


   
   
   ### What changes were proposed in this pull request?
   
   Suffixes the managed table location with the txnId that created the table.
   
   ### Why are the changes needed?
   
   Part of the non-blocking drop table implementation. It could resolve concurrency 
issues between an ongoing compaction and a table re-create operation.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests
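
   For illustration, a rough sketch of the idea; the suffix format and the 
Warehouse helper used here are assumptions, not taken from the patch:

{code:java}
// suffix the default table location with the creating txnId so that a later
// drop + re-create under the same name never reuses a directory that is
// still being compacted or cleaned up asynchronously
String suffix = "_v" + String.format("%010d", txnId); // assumed suffix format
Path tblLocation = new Path(wh.getDatabasePath(db), tableName.toLowerCase() + suffix);
{code}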


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568530)
Remaining Estimate: 0h
Time Spent: 10m

> Suffix the table location with UUID/txnId
> -
>
> Key: HIVE-24906
> URL: https://issues.apache.org/jira/browse/HIVE-24906
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Suffixing the table location during create table with UUID/txnId can help in 
> deleting the data in asynchronous fashion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24794) [HMS] Populate tableId in the response of get_valid_write_ids API

2021-03-18 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das resolved HIVE-24794.
---
Resolution: Won't Fix

> [HMS] Populate tableId in the response of get_valid_write_ids API
> -
>
> Key: HIVE-24794
> URL: https://issues.apache.org/jira/browse/HIVE-24794
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> In HS2, after query compilation phase, we acquire lock in DriverTxnHandler.
> isValidTxnListState and later ensure there are no conflicting transactions by 
> using driverContext.getTxnManager().getLatestTxnIdInConflict(); . This 
> doesn't work well, if there are external entities that drop and recreate the 
> table with the same name. So, we should also make sure the tableId itself is 
> not changed, after lock has been acquired. This Jira is to enhance 
> getValidWriteIdList to return tableId as well. Idea is to cache the tableId 
> in SessionState and later compare it with what getValidWriteIdList returns. 
> If the table was dropped and recreated, the tableId will not match and we 
> have to recompile the query. Caching the tableId in SessionState will be done 
> as part of a separate Jira. 
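
A hypothetical sketch of the proposed check (the names below are illustrative; 
getTableId() on the write-ids response is exactly the field this Jira proposes 
to add):

{code:java}
// at compile time: remember the id of every table the query touches
long compiledTableId = table.getTTable().getId(); // to be cached in SessionState

// after acquiring locks: re-fetch the write ids, now carrying the table id too
TableValidWriteIds current = fetchValidWriteIds(fullTableName); // illustrative helper
if (current.getTableId() != compiledTableId) {
  // the table was dropped and re-created under the same name: recompile the query
  recompile = true;
}
{code}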



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24796) [HS2] Enhance DriverTxnHandler.isValidTxnListState logic to include tableId comparison

2021-03-18 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das resolved HIVE-24796.
---
Resolution: Won't Fix

> [HS2] Enhance DriverTxnHandler.isValidTxnListState logic to include tableId 
> comparison
> --
>
> Key: HIVE-24796
> URL: https://issues.apache.org/jira/browse/HIVE-24796
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> In HS2, after query compilation phase, we acquire lock in DriverTxnHandler.
> isValidTxnListState and later ensure there are no conflicting transactions by 
> using driverContext.getTxnManager().getLatestTxnIdInConflict(); . This 
> doesn't work well, if there are external entities that drop and recreate the 
> table with the same name. So, we should also make sure the tableId itself is 
> not changed, after lock has been acquired. This Jira is to enhance the 
> DriverTxnHandler.isValidTxnListState logic to include tableId comparison. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24794) [HMS] Populate tableId in the response of get_valid_write_ids API

2021-03-18 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304279#comment-17304279
 ] 

Kishen Das commented on HIVE-24794:
---

This would take care of the above issue -> 
https://issues.apache.org/jira/browse/HIVE-24662 . 

> [HMS] Populate tableId in the response of get_valid_write_ids API
> -
>
> Key: HIVE-24794
> URL: https://issues.apache.org/jira/browse/HIVE-24794
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> In HS2, after query compilation phase, we acquire lock in DriverTxnHandler.
> isValidTxnListState and later ensure there are no conflicting transactions by 
> using driverContext.getTxnManager().getLatestTxnIdInConflict(); . This 
> doesn't work well, if there are external entities that drop and recreate the 
> table with the same name. So, we should also make sure the tableId itself is 
> not changed, after lock has been acquired. This Jira is to enhance 
> getValidWriteIdList to return tableId as well. Idea is to cache the tableId 
> in SessionState and later compare it with what getValidWriteIdList returns. 
> If the table was dropped and recreated, the tableId will not match and we 
> have to recompile the query. Caching the tableId in SessionState will be done 
> as part of a separate Jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24795) [HS2] Cache tableId in SessionState

2021-03-18 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304278#comment-17304278
 ] 

Kishen Das commented on HIVE-24795:
---

This would take care of the above issue -> 
https://issues.apache.org/jira/browse/HIVE-24662 . 

> [HS2] Cache tableId in SessionState 
> 
>
> Key: HIVE-24795
> URL: https://issues.apache.org/jira/browse/HIVE-24795
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> Please go through https://issues.apache.org/jira/browse/HIVE-24794 to 
> understand why this is required. It's basically to handle a corner case in 
> which a table gets dropped and recreated with the same name, after we gather 
> information about all the tables and we acquire the lock. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24795) [HS2] Cache tableId in SessionState

2021-03-18 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das resolved HIVE-24795.
---
Resolution: Won't Fix

> [HS2] Cache tableId in SessionState 
> 
>
> Key: HIVE-24795
> URL: https://issues.apache.org/jira/browse/HIVE-24795
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> Please go through https://issues.apache.org/jira/browse/HIVE-24794 to 
> understand why this is required. It's basically to handle a corner case in 
> which a table gets dropped and recreated with the same name, after we gather 
> information about all the tables and we acquire the lock. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24796) [HS2] Enhance DriverTxnHandler.isValidTxnListState logic to include tableId comparison

2021-03-18 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304277#comment-17304277
 ] 

Kishen Das commented on HIVE-24796:
---

This would take care of the above issue -> 
https://issues.apache.org/jira/browse/HIVE-24662 . 

> [HS2] Enhance DriverTxnHandler.isValidTxnListState logic to include tableId 
> comparison
> --
>
> Key: HIVE-24796
> URL: https://issues.apache.org/jira/browse/HIVE-24796
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> In HS2, after query compilation phase, we acquire lock in DriverTxnHandler.
> isValidTxnListState and later ensure there are no conflicting transactions by 
> using driverContext.getTxnManager().getLatestTxnIdInConflict(); . This 
> doesn't work well, if there are external entities that drop and recreate the 
> table with the same name. So, we should also make sure the tableId itself is 
> not changed, after lock has been acquired. This Jira is to enhance the 
> DriverTxnHandler.isValidTxnListState logic to include tableId comparison. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24906) Suffix the table location with UUID/txnId

2021-03-18 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24906:
--
Description: Suffixing the table location during create table with 
UUID/txnId can help in deleting the data in asynchronous fashion.

> Suffix the table location with UUID/txnId
> -
>
> Key: HIVE-24906
> URL: https://issues.apache.org/jira/browse/HIVE-24906
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>
> Suffixing the table location during create table with UUID/txnId can help in 
> deleting the data in asynchronous fashion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24608) Switch back to get_table in HMS client for Hive 2.3.x

2021-03-18 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-24608:

Target Version/s: 2.3.9

> Switch back to get_table in HMS client for Hive 2.3.x
> -
>
> Key: HIVE-24608
> URL: https://issues.apache.org/jira/browse/HIVE-24608
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.7
>Reporter: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-15062 introduced a backward-incompatible change by replacing 
> {{get_table}} with {{get_table_req}}. As a consequence, when an HMS client w/ 
> version > 2.3 talks to an HMS w/ version < 2.3, it will get an error similar to 
> the following:
> {code}
> AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable 
> to fetch table testpartitiondata. Invalid method name: 'get_table_req';
> {code}
> Looking at HIVE-15062, the {{get_table_req}} is to introduce client-side 
> check for capabilities. However in branch-2.3 the check is a no-op since 
> there is no capability yet (it is assigned to null). Therefore, this JIRA 
> proposes to switch back to {{get_table}} in branch-2.3 to fix the 
> compatibility issue.
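
A minimal sketch of the proposed branch-2.3 revert (the body is an assumption; 
fastpath, filterHook and deepCopy mirror the surrounding HiveMetaStoreClient 
code):

{code:java}
@Override
public Table getTable(String dbname, String name) throws TException {
  // use the old Thrift call so that a pre-2.3 metastore can answer;
  // get_table_req would fail there with "Invalid method name: 'get_table_req'"
  Table t = client.get_table(dbname, name);
  return fastpath ? t : deepCopy(filterHook.filterTable(t));
}
{code}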



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24797) Disable validate default values when parsing Avro schemas

2021-03-18 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-24797:

Fix Version/s: 2.3.9

> Disable validate default values when parsing Avro schemas
> -
>
> Key: HIVE-24797
> URL: https://issues.apache.org/jira/browse/HIVE-24797
> Project: Hive
>  Issue Type: Bug
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.9, 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> It will throw exceptions when upgrading Avro to 1.10.1 for this schema:
> {code:json}
> {
> "type": "record",
> "name": "EventData",
> "doc": "event data",
> "fields": [
> {"name": "ARRAY_WITH_DEFAULT", "type": {"type": "array", "items": 
> "string"}, "default": null }
> ]
> }
> {code}
> {noformat}
> org.apache.avro.AvroTypeException: Invalid default for field 
> ARRAY_WITH_DEFAULT: null not a {"type":"array","items":"string"}
>   at org.apache.avro.Schema.validateDefault(Schema.java:1571)
>   at org.apache.avro.Schema.access$500(Schema.java:87)
>   at org.apache.avro.Schema$Field.(Schema.java:544)
>   at org.apache.avro.Schema.parse(Schema.java:1678)
>   at org.apache.avro.Schema$Parser.parse(Schema.java:1425)
>   at org.apache.avro.Schema$Parser.parse(Schema.java:1396)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFor(AvroSerdeUtils.java:287)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFromFS(AvroSerdeUtils.java:170)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:139)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:187)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:107)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:493)
>   at 
> org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:225)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24761) Support vectorization for bounded windows in PTF

2021-03-18 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24761 started by László Bodor.
---
> Support vectorization for bounded windows in PTF
> 
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the compile-time check:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results because the vectorized codepath completely 
> ignores boundaries and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24905) only CURRENT ROW end frame is supported for RANGE

2021-03-18 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24905:

Description: 
This one is about to take care of vectorizing the FOLLOWING rows case:
{code}
avg(p_retailprice) over(partition by p_mfgr order by p_date range between 1 
preceding and 3 following) as avg1,
{code}

{code}
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez, spark] IS true
notVectorizedReason: PTF operator: count only CURRENT ROW end 
frame is supported for RANGE
vectorized: false
{code}

> only CURRENT ROW end frame is supported for RANGE
> -
>
> Key: HIVE-24905
> URL: https://issues.apache.org/jira/browse/HIVE-24905
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> This one is about to take care of vectorizing the FOLLOWING rows case:
> {code}
> avg(p_retailprice) over(partition by p_mfgr order by p_date range between 1 
> preceding and 3 following) as avg1,
> {code}
> {code}
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: PTF operator: count only CURRENT ROW end 
> frame is supported for RANGE
> vectorized: false
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24905) only CURRENT ROW end frame is supported for RANGE

2021-03-18 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-24905:
---

Assignee: László Bodor

> only CURRENT ROW end frame is supported for RANGE
> -
>
> Key: HIVE-24905
> URL: https://issues.apache.org/jira/browse/HIVE-24905
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24905) only CURRENT ROW end frame is supported for RANGE

2021-03-18 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24905:

Parent: HIVE-24872
Issue Type: Sub-task  (was: Improvement)

> only CURRENT ROW end frame is supported for RANGE
> -
>
> Key: HIVE-24905
> URL: https://issues.apache.org/jira/browse/HIVE-24905
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=568372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568372
 ]

ASF GitHub Bot logged work on HIVE-24590:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 14:11
Start Date: 18/Mar/21 14:11
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1849:
URL: https://github.com/apache/hive/pull/1849#issuecomment-801960687


   Thanks for testing this out @EugeneChung !
   
   Hey @prasanthj , can you please review this change? It is very 
similar to what was done as part of HIVE-24569, which you reviewed previously.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568372)
Time Spent: 1h 40m  (was: 1.5h)

> Operation Logging still leaks the log4j Appenders
> -
>
> Key: HIVE-24590
> URL: https://issues.apache.org/jira/browse/HIVE-24590
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Eugene Chung
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
> 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen 
> Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, 
> Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> I'm using Hive 3.1.2 with options below.
>  * hive.server2.logging.operation.enabled=true
>  * hive.server2.logging.operation.level=VERBOSE
>  * hive.async.log.enabled=false
> I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 
> but HS2 still leaks log4j RandomAccessFileManager.
> !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!
> I checked the operation log file which is not closed/deleted properly.
> !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!
> Then there's the log,
> {code:java}
> client.TezClient: Shutting down Tez Session, sessionName= {code}
> !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24348) Beeline: Isolating dependencies and execution with java

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24348?focusedWorklogId=568329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568329
 ]

ASF GitHub Bot logged work on HIVE-24348:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 13:18
Start Date: 18/Mar/21 13:18
Worklog Time Spent: 10m 
  Work Description: nrg4878 closed pull request #1852:
URL: https://github.com/apache/hive/pull/1852


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568329)
Time Spent: 1.5h  (was: 1h 20m)

> Beeline: Isolating dependencies and execution with java
> ---
>
> Key: HIVE-24348
> URL: https://issues.apache.org/jira/browse/HIVE-24348
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 3.1.0
>Reporter: Naveen Gangam
>Assignee: Abhay
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, beeline code, binaries and executables are somewhat tightly 
> coupled with the hive product. It is impossible to execute beeline from a 
> node with just a JRE installed and some jars on the classpath.
> * The beeline.sh/hive scripts rely on HADOOP_HOME being set and are designed 
> to use the "hadoop" executable to run beeline.
> * Ideally, just the hive-beeline.jar and hive-jdbc-standalone jars should be 
> enough, but sadly they aren't. The latter jar adds more problems than it 
> solves because, with all the class files shaded, some dependencies cannot be 
> resolved.
> * Beeline has many other dependencies like hive-exec, hive-common, 
> hadoop-common, supercsv, jline, commons-cli, commons-io, commons-logging etc. 
> While it may not be possible to eliminate some of these, we should at least 
> have a self-contained jar that contains all of these to make it work.
> * The underlying script used to run beeline should use JAVA as an alternate 
> means of execution if HADOOP_HOME is not set



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24348) Beeline: Isolating dependencies and execution with java

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24348?focusedWorklogId=568328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568328
 ]

ASF GitHub Bot logged work on HIVE-24348:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 13:18
Start Date: 18/Mar/21 13:18
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #1852:
URL: https://github.com/apache/hive/pull/1852#issuecomment-801919129


   This change has been re-submitted as PR#1906 and merged. Closing this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568328)
Time Spent: 1h 20m  (was: 1h 10m)

> Beeline: Isolating dependencies and execution with java
> ---
>
> Key: HIVE-24348
> URL: https://issues.apache.org/jira/browse/HIVE-24348
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 3.1.0
>Reporter: Naveen Gangam
>Assignee: Abhay
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, beeline code, binaries and executables are somewhat tightly 
> coupled with the hive product. It is impossible to execute beeline from a 
> node with just a JRE installed and some jars on the classpath.
> * The beeline.sh/hive scripts rely on HADOOP_HOME being set and are designed 
> to use the "hadoop" executable to run beeline.
> * Ideally, just the hive-beeline.jar and hive-jdbc-standalone jars should be 
> enough, but sadly they aren't. The latter jar adds more problems than it 
> solves because, with all the class files shaded, some dependencies cannot be 
> resolved.
> * Beeline has many other dependencies like hive-exec, hive-common, 
> hadoop-common, supercsv, jline, commons-cli, commons-io, commons-logging etc. 
> While it may not be possible to eliminate some of these, we should at least 
> have a self-contained jar that contains all of these to make it work.
> * The underlying script used to run beeline should use JAVA as an alternate 
> means of execution if HADOOP_HOME is not set



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24900) Failed compaction does not cleanup the directories

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24900?focusedWorklogId=568310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568310
 ]

ASF GitHub Bot logged work on HIVE-24900:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 12:39
Start Date: 18/Mar/21 12:39
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2086:
URL: https://github.com/apache/hive/pull/2086#discussion_r596828645



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -523,6 +524,15 @@ protected Boolean findNextCompactionAndExecute(boolean 
computeStats) throws Inte
   final StatsUpdater su = computeStats ? StatsUpdater.init(ci, 
msc.findColumnsWithStats(
   CompactionInfo.compactionInfoToStruct(ci)), conf,
   runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null;
+  // result directory for compactor to write new files
+  Path resultDir = null;
+  if (ci.type == CompactionType.MAJOR) {

Review comment:
   Could this be simplified to just: 
   Path resultDir = QueryCompactor.Util.getCompactionResultDir(sd, 
tblValidWriteIds, conf, 
ci.type == CompactionType.MAJOR, false, false, null);





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568310)
Time Spent: 50m  (was: 40m)

> Failed compaction does not cleanup the directories
> --
>
> Key: HIVE-24900
> URL: https://issues.apache.org/jira/browse/HIVE-24900
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Failed compaction does not cleanup the directories



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24727) Cache hydration api in llap proto

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24727?focusedWorklogId=568282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568282
 ]

ASF GitHub Bot logged work on HIVE-24727:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 11:42
Start Date: 18/Mar/21 11:42
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2066:
URL: https://github.com/apache/hive/pull/2066#discussion_r596786428



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapCacheMetadataSerializer.java
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.io.api.impl;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.protobuf.ByteString;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.CacheTag;
+import org.apache.hadoop.hive.common.io.DataCache;
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.hadoop.hive.common.io.FileMetadataCache;
+import org.apache.hadoop.hive.llap.cache.LlapCacheableBuffer;
+import org.apache.hadoop.hive.llap.cache.LlapDataBuffer;
+import org.apache.hadoop.hive.llap.cache.LowLevelCachePolicy;
+import org.apache.hadoop.hive.llap.cache.PathCache;
+import org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos;
+import org.apache.hadoop.hive.llap.io.encoded.LlapOrcCacheLoader;
+import org.apache.hadoop.hive.ql.io.SyntheticFileId;
+import org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace;
+import org.apache.hive.common.util.FixedSizedObjectPool;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInput;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * Internal helper class for extracting the metadata of the cache content and 
loading data into the cache
+ * based on the provided metadata.
+ */
+final class LlapCacheMetadataSerializer {
+
+  public static final Logger LOG = 
LoggerFactory.getLogger(LlapCacheMetadataSerializer.class);
+
+  private final FileMetadataCache metadataCache;
+  private final DataCache cache;
+  private final Configuration conf;
+  private final PathCache pathCache;
+  private final FixedSizedObjectPool<IoTrace> tracePool;
+  private final LowLevelCachePolicy cachePolicy;
+
+  LlapCacheMetadataSerializer(FileMetadataCache fileMetadataCache, DataCache 
cache, Configuration daemonConf,
+  PathCache pathCache, FixedSizedObjectPool<IoTrace> tracePool, 
LowLevelCachePolicy realCachePolicy) {
+this.metadataCache = fileMetadataCache;
+this.cache = cache;
+this.conf = daemonConf;
+this.pathCache = pathCache;
+this.tracePool = tracePool;
+this.cachePolicy = realCachePolicy;
+
+  }
+
+  public LlapDaemonProtocolProtos.CacheEntryList fetchMetadata() {
+List<LlapCacheableBuffer> buffers = cachePolicy.getHotBuffers();
+List<LlapDaemonProtocolProtos.CacheEntry> entries = 
encodeAndConvertHotBuffers(buffers);
+return 
LlapDaemonProtocolProtos.CacheEntryList.newBuilder().addAllEntries(entries).build();
+  }
+
+  private List<LlapDaemonProtocolProtos.CacheEntry> 
encodeAndConvertHotBuffers(List<LlapCacheableBuffer> buffers) {
+Map<Object, LlapDaemonProtocolProtos.CacheEntry.Builder> entries = 
encodeAndSortHotBuffersByFileKey(buffers);
+return entries.values().stream().map(v -> 
v.build()).collect(Collectors.toList());
+  }
+
+  private Map<Object, LlapDaemonProtocolProtos.CacheEntry.Builder> 
encodeAndSortHotBuffersByFileKey(
+  List<LlapCacheableBuffer> buffers) {
+Map<Object, LlapDaemonProtocolProtos.CacheEntry.Builder> lookupMap = new 
HashMap<>();
+for (LlapCacheableBuffer b : buffers) {
+  if (b instanceof LlapDataBuffer) {
+LlapDataBuffer db = (LlapDataBuffer) b;
+try {
+  Object fileKey = db.getFileKey();
+  String path = pathCache.resolve(db.getFileKey());
+  if (path != null) {
+LlapDaemonProtocolProtos.CacheEntry.Builder builder =
+lookupOrCreateCacheEntryFromDataBuffer(lookupMap, db, fileKey, 
path);
+

[jira] [Work logged] (HIVE-24727) Cache hydration api in llap proto

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24727?focusedWorklogId=568280=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568280
 ]

ASF GitHub Bot logged work on HIVE-24727:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 11:41
Start Date: 18/Mar/21 11:41
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2066:
URL: https://github.com/apache/hive/pull/2066#discussion_r596785277



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapCacheMetadataSerializer.java
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.io.api.impl;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.protobuf.ByteString;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.CacheTag;
+import org.apache.hadoop.hive.common.io.DataCache;
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.hadoop.hive.common.io.FileMetadataCache;
+import org.apache.hadoop.hive.llap.cache.LlapCacheableBuffer;
+import org.apache.hadoop.hive.llap.cache.LlapDataBuffer;
+import org.apache.hadoop.hive.llap.cache.LowLevelCachePolicy;
+import org.apache.hadoop.hive.llap.cache.PathCache;
+import org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos;
+import org.apache.hadoop.hive.llap.io.encoded.LlapOrcCacheLoader;
+import org.apache.hadoop.hive.ql.io.SyntheticFileId;
+import org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace;
+import org.apache.hive.common.util.FixedSizedObjectPool;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInput;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * Internal helper class for extracting the metadata of the cache content and loading data into the cache
+ * based on the provided metadata.
+ */
+final class LlapCacheMetadataSerializer {
+
+  public static final Logger LOG = LoggerFactory.getLogger(LlapCacheMetadataSerializer.class);
+
+  private final FileMetadataCache metadataCache;
+  private final DataCache cache;
+  private final Configuration conf;
+  private final PathCache pathCache;
+  private final FixedSizedObjectPool<IoTrace> tracePool;
+  private final LowLevelCachePolicy cachePolicy;
+
+  LlapCacheMetadataSerializer(FileMetadataCache fileMetadataCache, DataCache cache, Configuration daemonConf,
+      PathCache pathCache, FixedSizedObjectPool<IoTrace> tracePool, LowLevelCachePolicy realCachePolicy) {
+    this.metadataCache = fileMetadataCache;
+    this.cache = cache;
+    this.conf = daemonConf;
+    this.pathCache = pathCache;
+    this.tracePool = tracePool;
+    this.cachePolicy = realCachePolicy;
+  }
+
+  public LlapDaemonProtocolProtos.CacheEntryList fetchMetadata() {
+    List<LlapCacheableBuffer> buffers = cachePolicy.getHotBuffers();
+    List<LlapDaemonProtocolProtos.CacheEntry> entries = encodeAndConvertHotBuffers(buffers);
+    return LlapDaemonProtocolProtos.CacheEntryList.newBuilder().addAllEntries(entries).build();
+  }
+
+  private List<LlapDaemonProtocolProtos.CacheEntry> encodeAndConvertHotBuffers(List<LlapCacheableBuffer> buffers) {
+    Map<Object, LlapDaemonProtocolProtos.CacheEntry.Builder> entries = encodeAndSortHotBuffersByFileKey(buffers);
+    return entries.values().stream().map(v -> v.build()).collect(Collectors.toList());
+  }
+
+  private Map<Object, LlapDaemonProtocolProtos.CacheEntry.Builder> encodeAndSortHotBuffersByFileKey(
+      List<LlapCacheableBuffer> buffers) {
+    Map<Object, LlapDaemonProtocolProtos.CacheEntry.Builder> lookupMap = new HashMap<>();
+    for (LlapCacheableBuffer b : buffers) {
+      if (b instanceof LlapDataBuffer) {
+        LlapDataBuffer db = (LlapDataBuffer) b;
+        try {
+          Object fileKey = db.getFileKey();
+          String path = pathCache.resolve(db.getFileKey());
+          if (path != null) {
+            LlapDaemonProtocolProtos.CacheEntry.Builder builder =
+                lookupOrCreateCacheEntryFromDataBuffer(lookupMap, db, fileKey, path);
+

[jira] [Work logged] (HIVE-24727) Cache hydration api in llap proto

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24727?focusedWorklogId=568274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568274
 ]

ASF GitHub Bot logged work on HIVE-24727:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 11:38
Start Date: 18/Mar/21 11:38
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2066:
URL: https://github.com/apache/hive/pull/2066#discussion_r596781752



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapCacheMetadataSerializer.java
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.io.api.impl;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.protobuf.ByteString;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.CacheTag;
+import org.apache.hadoop.hive.common.io.DataCache;
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.hadoop.hive.common.io.FileMetadataCache;
+import org.apache.hadoop.hive.llap.cache.LlapCacheableBuffer;
+import org.apache.hadoop.hive.llap.cache.LlapDataBuffer;
+import org.apache.hadoop.hive.llap.cache.LowLevelCachePolicy;
+import org.apache.hadoop.hive.llap.cache.PathCache;
+import org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos;
+import org.apache.hadoop.hive.llap.io.encoded.LlapOrcCacheLoader;
+import org.apache.hadoop.hive.ql.io.SyntheticFileId;
+import org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace;
+import org.apache.hive.common.util.FixedSizedObjectPool;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInput;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * Internal helper class for extracting the metadata of the cache content and loading data into the cache
+ * based on the provided metadata.
+ */
+final class LlapCacheMetadataSerializer {
+
+  public static final Logger LOG = LoggerFactory.getLogger(LlapCacheMetadataSerializer.class);
+
+  private final FileMetadataCache metadataCache;
+  private final DataCache cache;
+  private final Configuration conf;
+  private final PathCache pathCache;
+  private final FixedSizedObjectPool<IoTrace> tracePool;
+  private final LowLevelCachePolicy cachePolicy;
+
+  LlapCacheMetadataSerializer(FileMetadataCache fileMetadataCache, DataCache cache, Configuration daemonConf,
+      PathCache pathCache, FixedSizedObjectPool<IoTrace> tracePool, LowLevelCachePolicy realCachePolicy) {
+    this.metadataCache = fileMetadataCache;
+    this.cache = cache;
+    this.conf = daemonConf;
+    this.pathCache = pathCache;
+    this.tracePool = tracePool;
+    this.cachePolicy = realCachePolicy;
+  }
+
+  public LlapDaemonProtocolProtos.CacheEntryList fetchMetadata() {
+    List<LlapCacheableBuffer> buffers = cachePolicy.getHotBuffers();
+    List<LlapDaemonProtocolProtos.CacheEntry> entries = encodeAndConvertHotBuffers(buffers);
+    return LlapDaemonProtocolProtos.CacheEntryList.newBuilder().addAllEntries(entries).build();
+  }
+
+  private List<LlapDaemonProtocolProtos.CacheEntry> encodeAndConvertHotBuffers(List<LlapCacheableBuffer> buffers) {
+    Map<Object, LlapDaemonProtocolProtos.CacheEntry.Builder> entries = encodeAndSortHotBuffersByFileKey(buffers);
+    return entries.values().stream().map(v -> v.build()).collect(Collectors.toList());
+  }
+
+  private Map<Object, LlapDaemonProtocolProtos.CacheEntry.Builder> encodeAndSortHotBuffersByFileKey(
+      List<LlapCacheableBuffer> buffers) {
+    Map<Object, LlapDaemonProtocolProtos.CacheEntry.Builder> lookupMap = new HashMap<>();
+    for (LlapCacheableBuffer b : buffers) {
+      if (b instanceof LlapDataBuffer) {
+        LlapDataBuffer db = (LlapDataBuffer) b;
+        try {
+          Object fileKey = db.getFileKey();
+          String path = pathCache.resolve(db.getFileKey());
+          if (path != null) {
+            LlapDaemonProtocolProtos.CacheEntry.Builder builder =
+                lookupOrCreateCacheEntryFromDataBuffer(lookupMap, db, fileKey, path);
+

[jira] [Assigned] (HIVE-24904) CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar

2021-03-18 Thread Oleksiy Sayankin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Sayankin reassigned HIVE-24904:
---

Assignee: Zoltan Haindrich

> CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar
> --
>
> Key: HIVE-24904
> URL: https://issues.apache.org/jira/browse/HIVE-24904
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Oleksiy Sayankin
>Assignee: Zoltan Haindrich
>Priority: Critical
>  Labels: CVE
>
> CVE list: CVE-2019-10172,CVE-2019-10202
> CVSS score: High
> {code}
> ./packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/lib/jackson-mapper-asl-1.9.13.jar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24904) CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar

2021-03-18 Thread Oleksiy Sayankin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Sayankin updated HIVE-24904:

Labels: CVE  (was: )

> CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar
> --
>
> Key: HIVE-24904
> URL: https://issues.apache.org/jira/browse/HIVE-24904
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Oleksiy Sayankin
>Priority: Critical
>  Labels: CVE
>
> CVE list: CVE-2019-10172,CVE-2019-10202
> CVSS score: High
> {code}
> ./packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/lib/jackson-mapper-asl-1.9.13.jar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24904) CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar

2021-03-18 Thread Oleksiy Sayankin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304055#comment-17304055
 ] 

Oleksiy Sayankin edited comment on HIVE-24904 at 3/18/21, 11:23 AM:


The latest supported release of the lib is 1.9.13 
([https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-mapper-asl]). 
To update the lib to a version with the fix, we have 3 options:
 1. Update to the build bundled by Red Hat: 
[https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-mapper-asl/1.9.14.jdk17-redhat-1]
 2. Build our own lib from the master: [https://github.com/FasterXML/jackson-1]
 3. Move to the new artifact
{panel}
 com.fasterxml.jackson.core » jackson-databind
{panel}

FYI: [~kgyrtkirk], [~jcamachorodriguez], [~pvary]


was (Author: osayankin):
The latest supported release of the lib is 1.9.13 
([https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-mapper-asl])
 for updating the lib to version with fix we have 3 options:
 1. 
[https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-mapper-asl/1.9.14.jdk17-redhat-1]
 update to lib that was bundled by RedHat
 2. Build our own lib from the master: [https://github.com/FasterXML/jackson-1]
 3. Move to new artifact
{panel}
com.fasterxml.jackson.core » jackson-databind{panel}

> CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar
> --
>
> Key: HIVE-24904
> URL: https://issues.apache.org/jira/browse/HIVE-24904
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleksiy Sayankin
>Priority: Critical
>
> CVE list: CVE-2019-10172,CVE-2019-10202
> CVSS score: High
> {code}
> ./packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/lib/jackson-mapper-asl-1.9.13.jar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24904) CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar

2021-03-18 Thread Oleksiy Sayankin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Sayankin updated HIVE-24904:

Component/s: Security

> CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar
> --
>
> Key: HIVE-24904
> URL: https://issues.apache.org/jira/browse/HIVE-24904
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Oleksiy Sayankin
>Priority: Critical
>
> CVE list: CVE-2019-10172,CVE-2019-10202
> CVSS score: High
> {code}
> ./packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/lib/jackson-mapper-asl-1.9.13.jar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24904) CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar

2021-03-18 Thread Oleksiy Sayankin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304055#comment-17304055
 ] 

Oleksiy Sayankin commented on HIVE-24904:
-

The latest supported release of the lib is 1.9.13 
([https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-mapper-asl]). 
To update the lib to a version with the fix, we have 3 options:
 1. Update to the build bundled by Red Hat: 
[https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-mapper-asl/1.9.14.jdk17-redhat-1]
 2. Build our own lib from the master: [https://github.com/FasterXML/jackson-1]
 3. Move to the new artifact
{panel}
com.fasterxml.jackson.core » jackson-databind{panel}
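
For option 3, the package move is mechanical at most call sites; a minimal sketch of the swap (illustrative only, not taken from the Hive codebase):

{code}
// Before: org.codehaus.jackson.map.ObjectMapper (jackson-mapper-asl, vulnerable)
// After: the maintained jackson-databind artifact
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonRoundTrip {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // The codehaus and fasterxml mappers share this core API, so most call
    // sites only need the import changed.
    String json = mapper.writeValueAsString(java.util.Collections.singletonMap("key", "value"));
    System.out.println(mapper.readValue(json, java.util.Map.class));
  }
}
{code}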

> CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar
> --
>
> Key: HIVE-24904
> URL: https://issues.apache.org/jira/browse/HIVE-24904
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleksiy Sayankin
>Priority: Critical
>
> CVE list: CVE-2019-10172,CVE-2019-10202
> CVSS score: High
> {code}
> ./packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/lib/jackson-mapper-asl-1.9.13.jar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=568247=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568247
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 10:54
Start Date: 18/Mar/21 10:54
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r596747337



##
File path: iceberg-handler/pom.xml
##
@@ -0,0 +1,252 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <parent>
+    <groupId>org.apache.hive</groupId>
+    <artifactId>hive</artifactId>
+    <version>4.0.0-SNAPSHOT</version>
+    <relativePath>../pom.xml</relativePath>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>iceberg-handler</artifactId>

Review comment:
   Good question. I've seen two patterns: 
   
   - druid, hbase and kudu had the hive- prefix 
   - kafka-handler did not have the prefix
   
   I went with the second one, but I'm fine with adding the prefix. I guess 
it's the majority :) 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568247)
Time Spent: 4h  (was: 3h 50m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=568245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568245
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 10:49
Start Date: 18/Mar/21 10:49
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r596744016



##
File path: iceberg-handler/pom.xml
##
@@ -0,0 +1,252 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <parent>
+    <groupId>org.apache.hive</groupId>
+    <artifactId>hive</artifactId>
+    <version>4.0.0-SNAPSHOT</version>
+    <relativePath>../pom.xml</relativePath>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>iceberg-handler</artifactId>

Review comment:
   Looks like the other handlers have this name pattern: hive-*-handler. 
Any reason the 'hive' part is omitted here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568245)
Time Spent: 3h 50m  (was: 3h 40m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24853) HMS leaks queries in case of timeout

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24853?focusedWorklogId=568227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568227
 ]

ASF GitHub Bot logged work on HIVE-24853:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 10:13
Start Date: 18/Mar/21 10:13
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #2044:
URL: https://github.com/apache/hive/pull/2044#issuecomment-801798182


   Thanx Zoltan for helping with review and commit. :-)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568227)
Time Spent: 4h  (was: 3h 50m)

> HMS leaks queries in case of timeout
> 
>
> Key: HIVE-24853
> URL: https://issues.apache.org/jira/browse/HIVE-24853
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The queries aren't closed in case of timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24853) HMS leaks queries in case of timeout

2021-03-18 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24853.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Ayush!

> HMS leaks queries in case of timeout
> 
>
> Key: HIVE-24853
> URL: https://issues.apache.org/jira/browse/HIVE-24853
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The queries aren't closed in case of timeout.
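
The usual shape of this kind of fix is a finally-guarded close. A minimal sketch, assuming a JDO-backed metastore query path (the class name and query string are placeholders, not the actual patch):

{code}
import javax.jdo.PersistenceManager;
import javax.jdo.Query;

public class QueryCleanupSketch {
  static Object runWithCleanup(PersistenceManager pm, String jdoql) {
    Query query = pm.newQuery(jdoql);
    try {
      return query.execute();
    } finally {
      // Close result sets and release resources even when execution
      // fails or times out, so the query is never leaked.
      query.closeAll();
    }
  }
}
{code}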



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24853) HMS leaks queries in case of timeout

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24853?focusedWorklogId=568225=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568225
 ]

ASF GitHub Bot logged work on HIVE-24853:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 10:11
Start Date: 18/Mar/21 10:11
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #2044:
URL: https://github.com/apache/hive/pull/2044


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568225)
Time Spent: 3h 50m  (was: 3h 40m)

> HMS leaks queries in case of timeout
> 
>
> Key: HIVE-24853
> URL: https://issues.apache.org/jira/browse/HIVE-24853
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The queries aren't closed in case of timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=568216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568216
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 10:02
Start Date: 18/Mar/21 10:02
Worklog Time Spent: 10m 
  Work Description: marton-bod edited a comment on pull request #2058:
URL: https://github.com/apache/hive/pull/2058#issuecomment-801789947


   @jcamachor Would you like to review as well?
   Most of the changes are copy-paste from the Iceberg repo, but if you have 
any insights regarding the new module structure, pom, packaging, etc. please 
let me know. Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568216)
Time Spent: 3h 40m  (was: 3.5h)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=568215=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568215
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 10:01
Start Date: 18/Mar/21 10:01
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2058:
URL: https://github.com/apache/hive/pull/2058#issuecomment-801789947


   @jcamachor Would you like to review as well?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568215)
Time Spent: 3.5h  (was: 3h 20m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=568214=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568214
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 10:01
Start Date: 18/Mar/21 10:01
Worklog Time Spent: 10m 
  Work Description: marton-bod edited a comment on pull request #2058:
URL: https://github.com/apache/hive/pull/2058#issuecomment-801789445


   > Can we enable the Checkstyle and other code checks for the tests the same 
way as it is done in the Iceberg repo?
   
   Sure, will do



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568214)
Time Spent: 3h 20m  (was: 3h 10m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=568213=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568213
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 10:00
Start Date: 18/Mar/21 10:00
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2058:
URL: https://github.com/apache/hive/pull/2058#issuecomment-801789445


   > Can we enable the Checkstyle and other code checks for the tests the same 
way as it is done in the Iceberg repo?
   Sure, will do



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568213)
Time Spent: 3h 10m  (was: 3h)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24900) Failed compaction does not clean up the directories

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24900?focusedWorklogId=568210=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568210
 ]

ASF GitHub Bot logged work on HIVE-24900:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 09:57
Start Date: 18/Mar/21 09:57
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #2086:
URL: https://github.com/apache/hive/pull/2086#discussion_r596707252



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -545,6 +556,8 @@ protected Boolean findNextCompactionAndExecute(boolean 
computeStats) throws Inte
 
 heartbeater.cancel();
 
+failAfterCompactionIfSetForTest();

Review comment:
   Instead of introducing this test-only code here, you could use 
Mockito.spy in the test and override verifyTableIdHasNotChanged to always throw 
an exception; that would have the same effect (you would probably need to change 
the method's visibility to protected)
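
A minimal sketch of the suggested spy-based approach (the Worker constructor and the exact signature of verifyTableIdHasNotChanged are assumptions for illustration, not the actual Hive API):

{code}
import static org.mockito.Mockito.doThrow;
import static org.mockito.Mockito.spy;

public class TestWorkerFailureSketch {
  @org.junit.Test(expected = RuntimeException.class)
  public void compactionFailsAfterWrite() throws Exception {
    // Spy the real worker instead of shipping test-only hooks in production code.
    Worker worker = spy(new Worker());
    // Make the (now protected) verification step throw, simulating a failure
    // right after the compaction wrote its output.
    doThrow(new RuntimeException("simulated post-compaction failure"))
        .when(worker).verifyTableIdHasNotChanged();
    worker.findNextCompactionAndExecute(true);
  }
}
{code}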





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568210)
Time Spent: 40m  (was: 0.5h)

> Failed compaction does not clean up the directories
> --
>
> Key: HIVE-24900
> URL: https://issues.apache.org/jira/browse/HIVE-24900
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Failed compaction does not clean up the directories



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24880) Add host and version information to compaction queue

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24880?focusedWorklogId=568201=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568201
 ]

ASF GitHub Bot logged work on HIVE-24880:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 09:34
Start Date: 18/Mar/21 09:34
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #2079:
URL: https://github.com/apache/hive/pull/2079#discussion_r596689908



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
##
@@ -4638,8 +4649,8 @@ public GetPartitionsResponse 
getPartitionsWithSpecs(GetPartitionsRequest request
   }
 
   @Override
-  public OptionalCompactionInfoStruct findNextCompact(String workerId) throws 
MetaException, TException {
-return client.find_next_compact(workerId);
+  public OptionalCompactionInfoStruct findNextCompact(String workerId, String 
workerVersion) throws MetaException, TException {

Review comment:
   ok, added back the original method with deprecation.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568201)
Time Spent: 4h  (was: 3h 50m)

> Add host and version information to compaction queue
> 
>
> Key: HIVE-24880
> URL: https://issues.apache.org/jira/browse/HIVE-24880
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The Initiator host and version should be added to compaction and completed 
> compaction queue.
> The worker version should be added to compaction and completed compaction 
> queue.
> They should be available in sys tables and view.
> The version should come from the runtime version (not the schema): 
> Initiator.class.getPackage().getImplementationVersion() works on clusters 
> (hive exec has manifest), might not work in unit tests.
> This would make it possible to create checks on use cases like these:
>  * multiple hosts are running Initiator
> ** in some scenarios with different runtime version
>  * the worker and initiator runtime version are not the same



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24727) Cache hydration api in llap proto

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24727?focusedWorklogId=568187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568187
 ]

ASF GitHub Bot logged work on HIVE-24727:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 09:13
Start Date: 18/Mar/21 09:13
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2066:
URL: https://github.com/apache/hive/pull/2066#discussion_r596673725



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##
@@ -18,24 +18,36 @@
 
 package org.apache.hadoop.hive.llap.io.api.impl;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInput;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
 import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.LinkedBlockingQueue;
 import java.util.concurrent.TimeUnit;
 import java.util.function.Predicate;
+import java.util.stream.Collectors;
 
 import javax.management.ObjectName;
 
+import com.google.protobuf.ByteString;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.common.io.CacheTag;
 import org.apache.hadoop.hive.llap.ProactiveEviction;
 import org.apache.hadoop.hive.llap.cache.MemoryLimitedPathCache;
 import org.apache.hadoop.hive.llap.cache.PathCache;
+import org.apache.hadoop.hive.llap.cache.LlapCacheableBuffer;
 import org.apache.hadoop.hive.llap.cache.ProactiveEvictingCachePolicy;
 import org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool;
+import org.apache.hadoop.hive.llap.io.encoded.LlapOrcCacheLoader;
+import org.apache.hadoop.hive.ql.io.SyntheticFileId;

Review comment:
   Good catch. Missed removing the imports.
   The LlapCacheMetadataSerializer's sole purpose is to decouple the logic and 
make it more testable. It's in the same package as LlapIoImpl, so no import is 
required.
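
A rough sketch of the kind of isolated test this decoupling enables (same-package test with mocked collaborators; passing null for the trace pool assumes fetchMetadata() does not touch it):

{code}
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;

public class TestLlapCacheMetadataSerializerSketch {
  @org.junit.Test
  public void emptyCacheYieldsEmptyEntryList() {
    LowLevelCachePolicy policy = mock(LowLevelCachePolicy.class);
    when(policy.getHotBuffers()).thenReturn(Collections.emptyList());

    LlapCacheMetadataSerializer serializer = new LlapCacheMetadataSerializer(
        mock(FileMetadataCache.class), mock(DataCache.class), new Configuration(),
        mock(PathCache.class), null /* tracePool: assumed unused by fetchMetadata() */, policy);

    // An empty hot-buffer list should serialize to an empty CacheEntryList.
    assertEquals(0, serializer.fetchMetadata().getEntriesCount());
  }
}
{code}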





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568187)
Time Spent: 50m  (was: 40m)

> Cache hydration api in llap proto
> -
>
> Key: HIVE-24727
> URL: https://issues.apache.org/jira/browse/HIVE-24727
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24727) Cache hydration api in llap proto

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24727?focusedWorklogId=568183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568183
 ]

ASF GitHub Bot logged work on HIVE-24727:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 08:58
Start Date: 18/Mar/21 08:58
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2066:
URL: https://github.com/apache/hive/pull/2066#discussion_r596663012



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java
##
@@ -369,6 +372,13 @@ public GetTokenResponseProto 
getDelegationToken(RpcController controller,
 return responseProtoBuilder.build();
   }
 
+  @Override
+  public GetCacheContentResponseProto getCacheContent(RpcController controller,
+  GetCacheContentRequestProto request) {
+CacheEntryList entries = LlapProxy.getIo().fetchCachedMetadata();

Review comment:
   For the sake of consistency, I think it's better to return an empty 
response rather than null. All RPC calls in the API do this.
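
A sketch of that convention (the response field name here is an assumption for illustration; getDefaultInstance() is protobuf's canonical empty message):

{code}
// Sketch only: never hand a null message to the RPC layer.
CacheEntryList entries = LlapProxy.getIo().fetchCachedMetadata();
if (entries == null) {
  entries = CacheEntryList.getDefaultInstance();
}
// setResult is an assumed field name, not necessarily the real proto field.
return GetCacheContentResponseProto.newBuilder().setResult(entries).build();
{code}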





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568183)
Time Spent: 40m  (was: 0.5h)

> Cache hydration api in llap proto
> -
>
> Key: HIVE-24727
> URL: https://issues.apache.org/jira/browse/HIVE-24727
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24727) Cache hydration api in llap proto

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24727?focusedWorklogId=568181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568181
 ]

ASF GitHub Bot logged work on HIVE-24727:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 08:57
Start Date: 18/Mar/21 08:57
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2066:
URL: https://github.com/apache/hive/pull/2066#discussion_r596662014



##
File path: llap-client/src/java/org/apache/hadoop/hive/llap/io/api/LlapIo.java
##
@@ -87,4 +87,15 @@
*/
  RecordReader<NullWritable, VectorizedRowBatch> llapVectorizedOrcReaderForPath(Object fileKey, Path path,
      CacheTag tag, List<Integer> tableIncludedCols, JobConf conf, long offset, long length) throws IOException;
+
+  /**
+   * Extract and return the cache content metadata.
+   */
+  LlapDaemonProtocolProtos.CacheEntryList fetchCachedMetadata();

Review comment:
   Sounds good.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568181)
Time Spent: 0.5h  (was: 20m)

> Cache hydration api in llap proto
> -
>
> Key: HIVE-24727
> URL: https://issues.apache.org/jira/browse/HIVE-24727
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24880) Add host and version information to compaction queue

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24880?focusedWorklogId=568179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568179
 ]

ASF GitHub Bot logged work on HIVE-24880:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 08:52
Start Date: 18/Mar/21 08:52
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2079:
URL: https://github.com/apache/hive/pull/2079#discussion_r596659034



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
##
@@ -3890,9 +3890,20 @@ public CompactionResponse compact2(String dbname, String 
tableName, String parti
 }
 cr.setType(type);
 cr.setProperties(tblproperties);
+cr.setInitiatorId(hostname() + "-manual");

Review comment:
   gotcha! 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568179)
Time Spent: 3h 50m  (was: 3h 40m)

> Add host and version information to compaction queue
> 
>
> Key: HIVE-24880
> URL: https://issues.apache.org/jira/browse/HIVE-24880
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The Initiator host and version should be added to compaction and completed 
> compaction queue.
> The worker version should be added to compaction and completed compaction 
> queue.
> They should be available in sys tables and view.
> The version should come from the runtime version (not the schema): 
> Initiator.class.getPackage().getImplementationVersion() works on clusters 
> (hive exec has manifest), might not work in unit tests.
> This would make it possible to create checks on use cases like these:
>  * multiple hosts are running Initiator
> ** in some scenarios with different runtime version
>  * the worker and initiator runtime version are not the same



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24880) Add host and version information to compaction queue

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24880?focusedWorklogId=568178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568178
 ]

ASF GitHub Bot logged work on HIVE-24880:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 08:50
Start Date: 18/Mar/21 08:50
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2079:
URL: https://github.com/apache/hive/pull/2079#discussion_r596657330



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
##
@@ -4638,8 +4649,8 @@ public GetPartitionsResponse 
getPartitionsWithSpecs(GetPartitionsRequest request
   }
 
   @Override
-  public OptionalCompactionInfoStruct findNextCompact(String workerId) throws 
MetaException, TException {
-return client.find_next_compact(workerId);
+  public OptionalCompactionInfoStruct findNextCompact(String workerId, String 
workerVersion) throws MetaException, TException {

Review comment:
   I am not sure we can retire a public API like this upstream. Not 100% 
sure, but I think we should keep the previous method signature as well and mark 
it as deprecated.
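
A minimal sketch of that pattern (the delegating body is illustrative; the real implementation forwards to the thrift client):

{code}
/** Kept for compatibility; prefer the two-argument overload. */
@Deprecated
@Override
public OptionalCompactionInfoStruct findNextCompact(String workerId)
    throws MetaException, TException {
  // Delegate to the new overload so existing callers keep compiling and working.
  return findNextCompact(workerId, null);
}
{code}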





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568178)
Time Spent: 3h 40m  (was: 3.5h)

> Add host and version information to compaction queue
> 
>
> Key: HIVE-24880
> URL: https://issues.apache.org/jira/browse/HIVE-24880
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The Initiator host and version should be added to compaction and completed 
> compaction queue.
> The worker version should be added to compaction and completed compaction 
> queue.
> They should be available in sys tables and view.
> The version should come from the runtime version (not the schema): 
> Initiator.class.getPackage().getImplementationVersion() works on clusters 
> (hive exec has manifest), might not work in unit tests.
> This would make it possible to create checks on use cases like these:
>  * multiple hosts are running Initiator
> ** in some scenarios with different runtime version
>  * the worker and initiator runtime version are not the same



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24840) Materialized View incremental rebuild produces wrong result set after compaction

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24840?focusedWorklogId=568163=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568163
 ]

ASF GitHub Bot logged work on HIVE-24840:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 08:04
Start Date: 18/Mar/21 08:04
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2088:
URL: https://github.com/apache/hive/pull/2088


   ### What changes were proposed in this pull request?
   When checking a materialized view's validity, also check whether any of the source 
tables were compacted since the last materialized view rebuild.
   
   ### Why are the changes needed?
   During materialized view rebuild we choose between an incremental and a full rebuild. 
   To make this choice, the existing implementation searches the `COMPLETED_TXN_COMPONENTS` 
table (Metastore) for delete transactions that affected the source tables of the MV 
since the last rebuild. However, these records are deleted during compaction. This 
leads to corrupted materialized view datasets, since the incremental rebuild, which 
does not handle deleted records, will be used.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Queries on the materialized view, and queries whose plans are rewritten to scan 
the materialized view, may produce different results.
   Only transactional materialized views are affected.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=materialized_view_create_rewrite_4.q -pl itests/qtest -Pitests
   mvn test -Dtest=TestMaterializedViewRebuild -pl itests/hive-unit -Pitests
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568163)
Remaining Estimate: 0h
Time Spent: 10m

> Materialized View incremental rebuild produces wrong result set after 
> compaction
> 
>
> Key: HIVE-24840
> URL: https://issues.apache.org/jira/browse/HIVE-24840
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, 
> NULL);
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as 
> select a,b,c from t1 where a > 0 or a is null;
> delete from t1 where a = 1;
> alter table t1 compact 'major';
> -- Wait until compaction finished.
> alter materialized view mat1 rebuild;
> {code}
> Expected result of query
> {code}
> select * from mat1;
> {code}
> {code}
> 2 two 2
> NULL NULL NULL
> {code}
> but if incremental rebuild is enabled the result is
> {code}
> 1 one 1
> 2 two 2
> NULL NULL NULL
> {code}
> Cause: The incremental rebuild queries the metastore's COMPLETED_TXN_COMPONENTS 
> table to determine whether the source tables of a materialized view have had a 
> delete or update transaction since the last rebuild. However, when a major 
> compaction is performed on the source tables, the records related to these 
> tables are deleted from COMPLETED_TXN_COMPONENTS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24840) Materialized View incremental rebuild produces wrong result set after compaction

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24840:
--
Labels: pull-request-available  (was: )

> Materialized View incremental rebuild produces wrong result set after 
> compaction
> 
>
> Key: HIVE-24840
> URL: https://issues.apache.org/jira/browse/HIVE-24840
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, 
> NULL);
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as 
> select a,b,c from t1 where a > 0 or a is null;
> delete from t1 where a = 1;
> alter table t1 compact 'major';
> -- Wait until compaction finished.
> alter materialized view mat1 rebuild;
> {code}
> Expected result of query
> {code}
> select * from mat1;
> {code}
> {code}
> 2 two 2
> NULL NULL NULL
> {code}
> but if incremental rebuild is enabled the result is
> {code}
> 1 one 1
> 2 two 2
> NULL NULL NULL
> {code}
> Cause: The incremental rebuild queries the metastore's COMPLETED_TXN_COMPONENTS 
> table to determine whether the source tables of a materialized view have had a 
> delete or update transaction since the last rebuild. However, when a major 
> compaction is performed on the source tables, the records related to these 
> tables are deleted from COMPLETED_TXN_COMPONENTS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24718) Moving to file based iteration for copying data

2021-03-18 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha resolved HIVE-24718.
-
Resolution: Fixed

Committed to master

Thank you for the patch, [~^sharma]

> Moving to file based iteration for copying data
> ---
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24718.01.patch, HIVE-24718.02.patch, 
> HIVE-24718.04.patch, HIVE-24718.05.patch, HIVE-24718.06.patch
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24718) Moving to file based iteration for copying data

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?focusedWorklogId=568132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568132
 ]

ASF GitHub Bot logged work on HIVE-24718:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 07:05
Start Date: 18/Mar/21 07:05
Worklog Time Spent: 10m 
  Work Description: pkumarsinha merged pull request #1936:
URL: https://github.com/apache/hive/pull/1936


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568132)
Time Spent: 6h 40m  (was: 6.5h)

> Moving to file based iteration for copying data
> ---
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24718.01.patch, HIVE-24718.02.patch, 
> HIVE-24718.04.patch, HIVE-24718.05.patch, HIVE-24718.06.patch
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24882) Compaction task reattempt fails with FileAlreadyExistsException for DeleteEventWriter

2021-03-18 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24882:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Compaction task reattempt fails with FileAlreadyExistsException for 
> DeleteEventWriter
> -
>
> Key: HIVE-24882
> URL: https://issues.apache.org/jira/browse/HIVE-24882
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If the first attempt of a compaction task is pre-empted by YARN, or its execution 
> fails because of environmental issues, re-attempted tasks will fail with 
> FileAlreadyExistsException
> {noformat}
> Error: org.apache.hadoop.fs.FileAlreadyExistsException: 
> /warehouse/tablespace/managed/hive/test.db/acid_table/dept=cse/_tmp_xxx/delete_delta_001_010/bucket_0
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:380)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2453)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:774)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:462)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) 
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) 
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) 
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>  
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>  
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:278)
>  
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1211) 
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1190) 
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128) 
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:531)
>  
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:528)
>  
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:542)
>  
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:469)
>  
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118) 
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098) 
> at org.apache.orc.impl.PhysicalFsWriter.<init>(PhysicalFsWriter.java:95) 
> at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:177) 
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:94) 
> at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:378) 
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRawRecordWriter(OrcOutputFormat.java:299)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.getDeleteEventWriter(CompactorMR.java:1084)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:995)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:958){noformat}
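
For context, a minimal reproduction outline as a sketch (the table test.acid_table and partition dept='cse' come from the path in the trace above; the pending delete deltas and the killed first attempt are assumptions, not details from the report):

{code:sql}
-- Sketch: trigger a compaction on the affected partition of a transactional
-- (ACID) table. The bug only surfaces when the first compactor task attempt
-- is pre-empted or dies mid-write, leaving a partial bucket file behind.
ALTER TABLE test.acid_table PARTITION (dept = 'cse') COMPACT 'major';

-- Monitor progress; with this bug, the re-attempted task fails with the
-- FileAlreadyExistsException above instead of completing the compaction.
SHOW COMPACTIONS;
{code}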



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24882) Compaction task reattempt fails with FileAlreadyExistsException for DeleteEventWriter

2021-03-18 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303891#comment-17303891
 ] 

Denys Kuzmenko commented on HIVE-24882:
---

Merged to master.
Thank you for the patch [~nareshpr]!

> Compaction task reattempt fails with FileAlreadyExistsException for 
> DeleteEventWriter
> -
>
> Key: HIVE-24882
> URL: https://issues.apache.org/jira/browse/HIVE-24882
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If the first attempt of a compaction task is pre-empted by YARN or its 
> execution fails because of environmental issues, the re-attempted task will 
> fail with a FileAlreadyExistsException:
> {noformat}
> Error: org.apache.hadoop.fs.FileAlreadyExistsException: 
> /warehouse/tablespace/managed/hive/test.db/acid_table/dept=cse/_tmp_xxx/delete_delta_001_010/bucket_0
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:380)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2453)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:774)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:462)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) 
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) 
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) 
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>  
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>  
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:278)
>  
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1211) 
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1190) 
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128) 
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:531)
>  
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:528)
>  
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:542)
>  
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:469)
>  
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118) 
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098) 
> at org.apache.orc.impl.PhysicalFsWriter.<init>(PhysicalFsWriter.java:95) 
> at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:177) 
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:94) 
> at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:378) 
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRawRecordWriter(OrcOutputFormat.java:299)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.getDeleteEventWriter(CompactorMR.java:1084)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:995)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:958){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24882) Compaction task reattempt fails with FileAlreadyExistsException for DeleteEventWriter

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24882?focusedWorklogId=568131&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568131
 ]

ASF GitHub Bot logged work on HIVE-24882:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 07:02
Start Date: 18/Mar/21 07:02
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #2069:
URL: https://github.com/apache/hive/pull/2069


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568131)
Time Spent: 20m  (was: 10m)

> Compaction task reattempt fails with FileAlreadyExistsException for 
> DeleteEventWriter
> -
>
> Key: HIVE-24882
> URL: https://issues.apache.org/jira/browse/HIVE-24882
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If the first attempt of a compaction task is pre-empted by YARN or its 
> execution fails because of environmental issues, the re-attempted task will 
> fail with a FileAlreadyExistsException:
> {noformat}
> Error: org.apache.hadoop.fs.FileAlreadyExistsException: 
> /warehouse/tablespace/managed/hive/test.db/acid_table/dept=cse/_tmp_xxx/delete_delta_001_010/bucket_0
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:380)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2453)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:774)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:462)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) 
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) 
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) 
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>  
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>  
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:278)
>  
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1211) 
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1190) 
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128) 
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:531)
>  
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:528)
>  
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:542)
>  
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:469)
>  
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118) 
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098) 
> at org.apache.orc.impl.PhysicalFsWriter.<init>(PhysicalFsWriter.java:95) 
> at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:177) 
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:94) 
> at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:378) 
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRawRecordWriter(OrcOutputFormat.java:299)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.getDeleteEventWriter(CompactorMR.java:1084)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:995)
>  
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:958){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Resolved] (HIVE-24842) SHOW CREATE TABLE on a VIEW with partition returns wrong sql.

2021-03-18 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha resolved HIVE-24842.
-
Resolution: Fixed

Committed to master.
Thank you for the patch, [~anuragshekhar]!

> SHOW CREATE TABLE on a VIEW with partition returns wrong sql. 
> --
>
> Key: HIVE-24842
> URL: https://issues.apache.org/jira/browse/HIVE-24842
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Anurag Shekhar
>Assignee: Anurag Shekhar
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> Create a view with partitions.
> Execute "SHOW CREATE TABLE " on the above view.
> The returned SQL will not have the PARTITIONED ON clause in it.
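
A concrete reproduction sketch (the names t and v are illustrative, not from the report):

{code:sql}
-- Hypothetical base table and partitioned view.
CREATE TABLE t (a INT, b STRING);
CREATE VIEW v PARTITIONED ON (b) AS SELECT a, b FROM t;

-- Before the fix, the generated statement omits PARTITIONED ON (b), so
-- replaying it would recreate v as an unpartitioned view.
SHOW CREATE TABLE v;
{code}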



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24842) SHOW CREATE TABLE on a VIEW with partition returns wrong sql.

2021-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24842?focusedWorklogId=568115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-568115
 ]

ASF GitHub Bot logged work on HIVE-24842:
-

Author: ASF GitHub Bot
Created on: 18/Mar/21 06:24
Start Date: 18/Mar/21 06:24
Worklog Time Spent: 10m 
  Work Description: pkumarsinha merged pull request #2036:
URL: https://github.com/apache/hive/pull/2036


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 568115)
Time Spent: 20m  (was: 10m)

> SHOW CREATE TABLE on a VIEW with partition returns wrong sql. 
> --
>
> Key: HIVE-24842
> URL: https://issues.apache.org/jira/browse/HIVE-24842
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Anurag Shekhar
>Assignee: Anurag Shekhar
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> Create a view with partitions.
> Execute "SHOW CREATE TABLE " on the above view.
> The returned SQL will not have the PARTITIONED ON clause in it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24902) Incorrect result due to ReduceExpressionsRule

2021-03-18 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303874#comment-17303874
 ] 

Nemon Lou commented on HIVE-24902:
--

[~julianhyde] Would you mind taking a look? The Calcite version is 1.21.0.

> Incorrect result due to ReduceExpressionsRule
> -
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> The following SQL returns only one record (20210308), but we expect two 
> (20210308 and 20210309): for a=1 the CASE evaluates to 20210308, for a=2 it 
> evaluates to 20210309, and neither value is null.
> {code:sql}
> select * from (
> select 
>   case when b.a=1
>  then  
>   cast 
> (from_unixtime(unix_timestamp(cast(20210309 as string),'yyyyMMdd') - 
> 86400,'yyyyMMdd') as bigint)
> else 
> 20210309 
>  end 
> as col
> from 
> (select stack(2,1,2) as (a))
>  as b
> ) t 
> where t.col is not null;
> {code}
> After debugging, I found that ReduceExpressionsRule rewrites the expression 
> incorrectly.
> Original expression:
> {code:sql}
> IS NOT NULL(CASE(=($0, 1), 
> CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT, 
> 20210309))
> {code}
> After reducing expressions:
> {code:sql}
> CASE(=($0, 1), IS NOT 
> NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT), 
> true)
> {code}
> The query plan in main branch:
> {code:sql}
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: _dummy_table
>   Row Limit Per Split: 1
>   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
> stats: COMPLETE
>   Select Operator
> expressions: 2 (type: int), 1 (type: int), 2 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> UDTF Operator
>   Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   function name: stack
>   Filter Operator
> predicate: COALESCE((col0 = 1),false) (type: boolean)
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
> (20210309L) END (type: bigint)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   ListSink
> Time taken: 0.155 seconds, Fetched: 28 row(s)
> {code}
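
Until the rule is fixed, a session-level workaround sketch is to disable cost-based optimization so the reduction never runs (this assumes the bad rewrite only fires under CBO, and it sacrifices CBO for every query in the session):

{code:sql}
-- Workaround sketch, not a fix: bypass the Calcite-based optimizer.
set hive.cbo.enable=false;

select * from (
  select case when b.a = 1
              then cast(from_unixtime(unix_timestamp(cast(20210309 as string),
                     'yyyyMMdd') - 86400, 'yyyyMMdd') as bigint)
              else 20210309
         end as col
  from (select stack(2, 1, 2) as (a)) as b
) t
where t.col is not null;
{code}

With CBO disabled, both rows (20210308 and 20210309) should be returned.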



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24842) SHOW CREATE TABLE on a VIEW with partition returns wrong sql.

2021-03-18 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303872#comment-17303872
 ] 

Pravin Sinha commented on HIVE-24842:
-

+1

> SHOW CREATE TABLE on a VIEW with partition returns wrong sql. 
> --
>
> Key: HIVE-24842
> URL: https://issues.apache.org/jira/browse/HIVE-24842
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Anurag Shekhar
>Assignee: Anurag Shekhar
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> Create a view with partitions.
> Execute "SHOW CREATE TABLE " on the above view.
> The returned SQL will not have the PARTITIONED ON clause in it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)