[jira] [Work started] (HIVE-25653) Precision problem in STD, STDDEV_SAMP, STDDEV_POP

2021-10-28 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25653 started by Ashish Sharma.

> Precision problem in STD, STDDEV_SAMP, STDDEV_POP
> -------------------------------------------------
>
> Key: HIVE-25653
> URL: https://issues.apache.org/jira/browse/HIVE-25653
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>
> Description
> *Script*- 
> create table test ( col1 double );
> insert into test values 
> ('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72');
> select STDDEV_SAMP(col1) AS STDDEV_SAMP, STDDEV(col1) as STDDEV, 
> STDDEV_POP(col1) as STDDEV_POP from test;
> *Result*- 
> STDDEV_SAMP            STDDEV                 STDDEV_POP
> 5.940794514955821E-13  5.42317860890711E-13   5.42317860890711E-13
> *Expected*- 
> STDDEV_SAMP  STDDEV  STDDEV_POP
> 0            0       0
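The discrepancy above is the classic catastrophic-cancellation problem in the one-pass "sum of squares" variance formula. The following sketch (plain Python, not Hive code) contrasts the naive E[x²] − E[x]² formula with Welford's streaming update, which returns exactly 0 for a constant column; the function names are mine, for illustration only.

```python
def naive_pop_variance(xs):
    # Textbook one-pass formula: E[x^2] - E[x]^2. For values whose mean is
    # large relative to their spread, the two terms are nearly equal and the
    # subtraction cancels almost all significant digits.
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    return sq / n - (s / n) ** 2

def welford_pop_variance(xs):
    # Welford's streaming update: numerically stable, and when every input
    # equals the running mean the delta is exactly 0, so the result is 0.
    mean, m2 = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / len(xs)

xs = [10230.72] * 7
print(naive_pop_variance(xs))    # typically a tiny nonzero residue, not 0
print(welford_pop_variance(xs))  # 0.0 exactly
```

The tiny `5.42E-13` values in the report are exactly this residue surfacing through the naive formula.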



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25626) JDBCStorageHandler CBO fails when JDBC_PASSWORD_URI is used

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25626?focusedWorklogId=671667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671667
 ]

ASF GitHub Bot logged work on HIVE-25626:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 21:31
Start Date: 28/Oct/21 21:31
Worklog Time Spent: 10m 
  Work Description: cravani commented on a change in pull request #2734:
URL: https://github.com/apache/hive/pull/2734#discussion_r738789344



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -3011,9 +3011,14 @@ private RelNode genTableLogicalPlan(String tableAlias, QB qb) throws SemanticException
       final String user = tabMetaData.getProperty(Constants.JDBC_USERNAME);
       String pswd = tabMetaData.getProperty(Constants.JDBC_PASSWORD);
       if (pswd == null) {
-        String keystore = tabMetaData.getProperty(Constants.JDBC_KEYSTORE);
-        String key = tabMetaData.getProperty(Constants.JDBC_KEY);
-        pswd = Utilities.getPasswdFromKeystore(keystore, key);
+        if (tabMetaData.getProperty(Constants.JDBC_PASSWORD_URI) != null) {
+          pswd = Utilities.getPasswdFromUri(tabMetaData.getProperty(Constants.JDBC_PASSWORD_URI));
+        } else {
+          String keystore = tabMetaData.getProperty(Constants.JDBC_KEYSTORE);
+          String key = tabMetaData.getProperty(Constants.JDBC_KEY);
+          pswd = Utilities.getPasswdFromKeystore(keystore, key);
+        }

Review comment:
   @zabetak Thank you for the comments. The problem with a final String is 
that it would lead to a compilation error if it were assigned in the else 
block.
   Modified the patch a bit and submitted the PR again.
   
   The test case is still pending; would it be OK if I write the test case 
after HIVE-25594 gets pushed, maybe in another Jira?
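The lookup order the patch introduces can be sketched as follows (plain Python, not the Hive implementation): an explicit password wins, then a password URI, then the keystore/key pair. The `hive.sql.dbcp.*` property names follow Hive's convention but should be treated as assumptions here, as should the two reader callbacks standing in for `Utilities.getPasswdFromUri` and `Utilities.getPasswdFromKeystore`.

```python
def resolve_password(props, read_uri, read_keystore):
    # 1. An explicit plain-text password takes precedence.
    pswd = props.get("hive.sql.dbcp.password")
    if pswd is None:
        # 2. Next, a password URI, if one is configured.
        uri = props.get("hive.sql.dbcp.password.uri")
        if uri is not None:
            pswd = read_uri(uri)
        else:
            # 3. Finally, fall back to the keystore/key pair.
            pswd = read_keystore(
                props.get("hive.sql.dbcp.password.keystore"),
                props.get("hive.sql.dbcp.password.key"))
    return pswd
```

With this ordering, a table configured only with a password URI no longer falls through to the keystore lookup with null arguments, which is what broke CBO before the patch.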




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 671667)
Time Spent: 0.5h  (was: 20m)

> JDBCStorageHandler CBO fails when JDBC_PASSWORD_URI is used
> ---
>
> Key: HIVE-25626
> URL: https://issues.apache.org/jira/browse/HIVE-25626
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, JDBC storage handler
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a table is created with JDBCStorageHandler and JDBC_PASSWORD_URI is used 
> as the password mechanism, CBO fails, causing all the data to be fetched from 
> the DB and then processed in Hive.





[jira] [Work logged] (HIVE-25591) CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25591?focusedWorklogId=671550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671550
 ]

ASF GitHub Bot logged work on HIVE-25591:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 16:59
Start Date: 28/Oct/21 16:59
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #2759:
URL: https://github.com/apache/hive/pull/2759#issuecomment-954030659


   Hey @cravani please have a look as well and let me know what you think.




Issue Time Tracking
---

Worklog Id: (was: 671550)
Time Spent: 20m  (was: 10m)

> CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema
> 
>
> Key: HIVE-25591
> URL: https://issues.apache.org/jira/browse/HIVE-25591
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Consider the following use case where tables reside in some user-defined 
> schema in some JDBC compliant database:
> +Postgres+
> {code:sql}
> create schema world;
> create table if not exists world.country (name varchar(80) not null);
> insert into world.country (name) values ('India');
> insert into world.country (name) values ('Russia');
> insert into world.country (name) values ('USA');
> {code}
> The following DDL statement in Hive fails:
> +Hive+
> {code:sql}
> CREATE EXTERNAL TABLE country (name varchar(80))
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
> "hive.sql.database.type" = "POSTGRES",
> "hive.sql.jdbc.driver" = "org.postgresql.Driver",
> "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/test",
> "hive.sql.dbcp.username" = "user",
> "hive.sql.dbcp.password" = "pwd",
> "hive.sql.schema" = "world",
> "hive.sql.table" = "country");
> {code}
> The exception is the following:
> {noformat}
> org.postgresql.util.PSQLException: ERROR: relation "country" does not exist
>   Position: 15
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:312) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:153)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:103)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
>  ~[commons-dbcp2-2.7.0.jar:2.7.0]
>   at 
> org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
>  ~[commons-dbcp2-2.7.0.jar:2.7.0]
>   at 
> org.apache.hive.storage.jdbc.dao.GenericJdbcDatabaseAccessor.getColumnNames(GenericJdbcDatabaseAccessor.java:83)
>  [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hive.storage.jdbc.JdbcSerDe.initialize(JdbcSerDe.java:98) 
> [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:95)
>  [hive-metastore-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:78)
>  [hive-metastore-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:342)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:324) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:734) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 

[jira] [Updated] (HIVE-25591) CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25591:
--
Labels: pull-request-available  (was: )

> CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema
> 
>
> Key: HIVE-25591
> URL: https://issues.apache.org/jira/browse/HIVE-25591
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Consider the following use case where tables reside in some user-defined 
> schema in some JDBC compliant database:
> +Postgres+
> {code:sql}
> create schema world;
> create table if not exists world.country (name varchar(80) not null);
> insert into world.country (name) values ('India');
> insert into world.country (name) values ('Russia');
> insert into world.country (name) values ('USA');
> {code}
> The following DDL statement in Hive fails:
> +Hive+
> {code:sql}
> CREATE EXTERNAL TABLE country (name varchar(80))
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
> "hive.sql.database.type" = "POSTGRES",
> "hive.sql.jdbc.driver" = "org.postgresql.Driver",
> "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/test",
> "hive.sql.dbcp.username" = "user",
> "hive.sql.dbcp.password" = "pwd",
> "hive.sql.schema" = "world",
> "hive.sql.table" = "country");
> {code}
> The exception is the following:
> {noformat}
> org.postgresql.util.PSQLException: ERROR: relation "country" does not exist
>   Position: 15
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:312) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:153)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:103)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
>  ~[commons-dbcp2-2.7.0.jar:2.7.0]
>   at 
> org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
>  ~[commons-dbcp2-2.7.0.jar:2.7.0]
>   at 
> org.apache.hive.storage.jdbc.dao.GenericJdbcDatabaseAccessor.getColumnNames(GenericJdbcDatabaseAccessor.java:83)
>  [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hive.storage.jdbc.JdbcSerDe.initialize(JdbcSerDe.java:98) 
> [hive-jdbc-handler-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:95)
>  [hive-metastore-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:78)
>  [hive-metastore-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:342)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:324) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:734) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:717) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableDesc.toTable(CreateTableDesc.java:933)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:59)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 

[jira] [Work logged] (HIVE-25591) CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25591?focusedWorklogId=671549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671549
 ]

ASF GitHub Bot logged work on HIVE-25591:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 16:58
Start Date: 28/Oct/21 16:58
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #2759:
URL: https://github.com/apache/hive/pull/2759


   The tests rely on HIVE-25594 for which there is a separate pull request 
(https://github.com/apache/hive/pull/2742). Please do not review 
https://github.com/apache/hive/commit/cb3026b4db9454c12d5376c71a28eb34b35d783d 
here. If there are remarks please comment on 
https://github.com/apache/hive/pull/2742 instead. 
   
   ### What changes were proposed in this pull request?
   1. Remove getOriQueryToExecute method in favor of getQueryToExecute
   2. Move getQueryToExecute method into GenericJdbcDatabaseAccessor to improve 
encapsulation since the method is only used in this class.
   3. Include hive.sql.schema if available when generating the SQL query.
   4. Add tests/usage samples of hive.sql.schema property in different DBMS.
   
   ### Why are the changes needed?
   1. Avoid failures when the table is in non-default schema.
   2. Demonstrate how hive.sql.schema can be used in different DBMS.
   3. Minor encapsulation improvement.
   
   ### Does this PR introduce _any_ user-facing change?
   Fixes a failure.
   
   ### How was this patch tested?
   `mvn test -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile_regex="jdbc_table_with_schema.*" -Dtest.output.overwrite`
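Point 3 above is the heart of the fix. A minimal sketch of the idea (plain Python, not the actual `GenericJdbcDatabaseAccessor` code): when `hive.sql.schema` is present, the generated query should target `schema.table` rather than the bare table name, which Postgres would otherwise resolve against its default search_path and fail with `relation "country" does not exist`.

```python
def qualified_table_name(props):
    # hive.sql.table is mandatory for the JDBC storage handler;
    # hive.sql.schema is optional and, when set, must qualify the name.
    table = props["hive.sql.table"]
    schema = props.get("hive.sql.schema")
    return f"{schema}.{table}" if schema else table

print(qualified_table_name({"hive.sql.table": "country",
                            "hive.sql.schema": "world"}))  # world.country
```

Real implementations also need to quote the identifiers per the target DBMS's dialect; that detail is omitted here.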




Issue Time Tracking
---

Worklog Id: (was: 671549)
Remaining Estimate: 0h
Time Spent: 10m

> CREATE EXTERNAL TABLE fails for JDBC tables stored in non-default schema
> 
>
> Key: HIVE-25591
> URL: https://issues.apache.org/jira/browse/HIVE-25591
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Consider the following use case where tables reside in some user-defined 
> schema in some JDBC compliant database:
> +Postgres+
> {code:sql}
> create schema world;
> create table if not exists world.country (name varchar(80) not null);
> insert into world.country (name) values ('India');
> insert into world.country (name) values ('Russia');
> insert into world.country (name) values ('USA');
> {code}
> The following DDL statement in Hive fails:
> +Hive+
> {code:sql}
> CREATE EXTERNAL TABLE country (name varchar(80))
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
> "hive.sql.database.type" = "POSTGRES",
> "hive.sql.jdbc.driver" = "org.postgresql.Driver",
> "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/test",
> "hive.sql.dbcp.username" = "user",
> "hive.sql.dbcp.password" = "pwd",
> "hive.sql.schema" = "world",
> "hive.sql.table" = "country");
> {code}
> The exception is the following:
> {noformat}
> org.postgresql.util.PSQLException: ERROR: relation "country" does not exist
>   Position: 15
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:312) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:153)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:103)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
>  ~[commons-dbcp2-2.7.0.jar:2.7.0]
>   at 
> org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
>  ~[commons-dbcp2-2.7.0.jar:2.7.0]
>   at 
> 

[jira] [Updated] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25658:
--
Labels: pull-request-available  (was: )

> Fix regex for masking totalSize table properties in Iceberg q-tests
> ---
>
> Key: HIVE-25658
> URL: https://issues.apache.org/jira/browse/HIVE-25658
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25607 introduced a text-replacement regex for masking out the totalSize 
> table property values in Iceberg q.out files. However, the regex did not cover 
> all occurrences of the property in the q.out files; this patch fixes the regex.
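The failure mode being fixed can be illustrated as follows (illustrative sketch only: the real masking lives in Hive's q-test infrastructure, and the exact q.out layout below is an assumption). The key point is that the replacement must hit every `totalSize` occurrence, not just some of them, or the q.out files churn on every run as file sizes change.

```python
import re

# Hypothetical fragment of DESCRIBE-style q.out output with two totalSize
# entries; only a global substitution keeps both stable across test runs.
q_out = """'totalSize'='12345'
'numRows'='3'
'totalSize'='678'"""

# re.sub replaces all non-overlapping matches, so both values are masked.
masked = re.sub(r"('totalSize'=')\d+(')", r"\1#Masked#\2", q_out)
print(masked)
```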





[jira] [Work logged] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25658?focusedWorklogId=671495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671495
 ]

ASF GitHub Bot logged work on HIVE-25658:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 15:11
Start Date: 28/Oct/21 15:11
Worklog Time Spent: 10m 
  Work Description: marton-bod merged pull request #2757:
URL: https://github.com/apache/hive/pull/2757


   




Issue Time Tracking
---

Worklog Id: (was: 671495)
Remaining Estimate: 0h
Time Spent: 10m

> Fix regex for masking totalSize table properties in Iceberg q-tests
> ---
>
> Key: HIVE-25658
> URL: https://issues.apache.org/jira/browse/HIVE-25658
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25607 introduced a text-replacement regex for masking out the totalSize 
> table property values in Iceberg q.out files. However, the regex did not cover 
> all occurrences of the property in the q.out files; this patch fixes the regex.





[jira] [Resolved] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests

2021-10-28 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25658.
---
Resolution: Fixed

> Fix regex for masking totalSize table properties in Iceberg q-tests
> ---
>
> Key: HIVE-25658
> URL: https://issues.apache.org/jira/browse/HIVE-25658
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25607 introduced a text-replacement regex for masking out the totalSize 
> table property values in Iceberg q.out files. However, the regex did not cover 
> all occurrences of the property in the q.out files; this patch fixes the regex.





[jira] [Commented] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests

2021-10-28 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435485#comment-17435485
 ] 

Marton Bod commented on HIVE-25658:
---

Committed to master. Thanks [~szita] for the review!

> Fix regex for masking totalSize table properties in Iceberg q-tests
> ---
>
> Key: HIVE-25658
> URL: https://issues.apache.org/jira/browse/HIVE-25658
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25607 introduced a text-replacement regex for masking out the totalSize 
> table property values in Iceberg q.out files. However, the regex did not cover 
> all occurrences of the property in the q.out files; this patch fixes the regex.





[jira] [Updated] (HIVE-25660) File Format (ORC/AVRO/TextFile...) available in information schema for bulk query

2021-10-28 Thread Simon AUBERT (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon AUBERT updated HIVE-25660:

Description: 
Hello all,

As of today, when you want to know the file format of every table, you have, as 
far as I know, two solutions:
 - a loop in shell
 - a loop in the tool you use for HQL queries, then parsing the answer, etc.

This is far too complicated for such a basic need. A table_file_format (or 
partition_file_format) column in the information_schema would be a very 
valuable help for monitoring: it could be read directly by a reporting tool 
(Superset, Tableau, Power BI, Qlik, whatever you want).

Best regards,

Simon

  was:
Hello all,

As of today, when you want to know the file format of every table, you have, as 
far I know, two solutions :
 -a loop in shell
 -a loop in the tool you use for HQL queries, and then parse the answer, etc..

I think this is way too complicated for such a very basic need. So a 
table_file_format (or partition_file_format, I don't know) in the 
information_schema would be a very precious help for monitoring.

Best regards,

Simon


> File Format (ORC/AVRO/TextFile...) available in information schema for bulk 
> query
> -
>
> Key: HIVE-25660
> URL: https://issues.apache.org/jira/browse/HIVE-25660
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Metastore
>Reporter: Simon AUBERT
>Priority: Major
>
> Hello all,
> As of today, when you want to know the file format of every table, you have, 
> as far as I know, two solutions:
>  - a loop in shell
>  - a loop in the tool you use for HQL queries, then parsing the answer, etc.
> This is far too complicated for such a basic need. A table_file_format (or 
> partition_file_format) column in the information_schema would be a very 
> valuable help for monitoring: it could be read directly by a reporting tool 
> (Superset, Tableau, Power BI, Qlik, whatever you want).
> Best regards,
> Simon





[jira] [Updated] (HIVE-25660) File Format (ORC/AVRO/TextFile...) available in information schema for bulk query

2021-10-28 Thread Simon AUBERT (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon AUBERT updated HIVE-25660:

Description: 
Hello all,

As of today, when you want to know the file format of every table, you have, as 
far as I know, two solutions:
 - a loop in shell
 - a loop in the tool you use for HQL queries, then parsing the answer, etc.

This is far too complicated for such a basic need. A table_file_format (or 
partition_file_format) column in the information_schema would be a very 
valuable help for monitoring.

Best regards,

Simon

  was:
Hello all,

As of today, when you want to know the file format of every table, you have, as 
far I know, two solutions :
-a loop in shell
-a loop in the tool you use for HQL queries, and then parse the answer, etc..

I think this is way too complicated for such a very basic need. So a 
table_file_format (or partion_file_format, I don't know) in the 
information_schema would be a very precious help for monitoring.

Best regards,

Simon


> File Format (ORC/AVRO/TextFile...) available in information schema for bulk 
> query
> -
>
> Key: HIVE-25660
> URL: https://issues.apache.org/jira/browse/HIVE-25660
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Metastore
>Reporter: Simon AUBERT
>Priority: Major
>
> Hello all,
> As of today, when you want to know the file format of every table, you have, 
> as far as I know, two solutions:
>  - a loop in shell
>  - a loop in the tool you use for HQL queries, then parsing the answer, etc.
> This is far too complicated for such a basic need. A table_file_format (or 
> partition_file_format) column in the information_schema would be a very 
> valuable help for monitoring.
> Best regards,
> Simon





[jira] [Work logged] (HIVE-25659) Divide IN/(NOT IN) queries based on number of max parameters SQL engine can support

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25659?focusedWorklogId=671480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671480
 ]

ASF GitHub Bot logged work on HIVE-25659:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 14:50
Start Date: 28/Oct/21 14:50
Worklog Time Spent: 10m 
  Work Description: guptanikhil007 opened a new pull request #2758:
URL: https://github.com/apache/hive/pull/2758


   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 671480)
Remaining Estimate: 0h
Time Spent: 10m

> Divide IN/(NOT IN) queries based on number of max parameters SQL engine can 
> support
> ---
>
> Key: HIVE-25659
> URL: https://issues.apache.org/jira/browse/HIVE-25659
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Minor
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The function 
> org.apache.hadoop.hive.metastore.txn.TxnUtils#buildQueryWithINClauseStrings 
> can generate queries with a huge number of parameters even when 
> DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE and DIRECT_SQL_MAX_QUERY_LENGTH are set to 
> very small values, e.g. while generating the delete query for the 
> COMPLETED_COMPACTIONS table.
> Example:
> {code:java}
> DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE = 100
> DIRECT_SQL_MAX_QUERY_LENGTH = 10 (10 KB)
> Number of parameters in a single query = 4759
> {code}





[jira] [Updated] (HIVE-25659) Divide IN/(NOT IN) queries based on number of max parameters SQL engine can support

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25659:
--
Labels: pull-request-available  (was: )

> Divide IN/(NOT IN) queries based on number of max parameters SQL engine can 
> support
> ---
>
> Key: HIVE-25659
> URL: https://issues.apache.org/jira/browse/HIVE-25659
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The function 
> org.apache.hadoop.hive.metastore.txn.TxnUtils#buildQueryWithINClauseStrings 
> can generate queries with a huge number of parameters even when 
> DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE and DIRECT_SQL_MAX_QUERY_LENGTH are set to 
> very small values, e.g. while generating the delete query for the 
> COMPLETED_COMPACTIONS table.
> Example:
> {code:java}
> DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE = 100
> DIRECT_SQL_MAX_QUERY_LENGTH = 10 (10 KB)
> Number of parameters in a single query = 4759
> {code}





[jira] [Commented] (HIVE-25659) Divide IN/(NOT IN) queries based on number of max parameters SQL engine can support

2021-10-28 Thread Nikhil Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435458#comment-17435458
 ] 

Nikhil Gupta commented on HIVE-25659:
-

Code to find number of parameters
{noformat}
public static void main(String[] args) {
  Configuration conf = MetastoreConf.newMetastoreConf();
  conf.set(ConfVars.DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE.getVarname(), "100");
  conf.set(ConfVars.DIRECT_SQL_MAX_QUERY_LENGTH.getVarname(), "10");
  List<String> queries = new ArrayList<>();
  List<Long> deleteSet = new ArrayList<>();
  for (long i = 0; i < 1; i++) {
    deleteSet.add(i + 1);
  }
  StringBuilder prefix = new StringBuilder();
  StringBuilder suffix = new StringBuilder();

  prefix.append("delete from COMPLETED_COMPACTIONS where ");
  suffix.append("");

  List<String> questions = new ArrayList<>(deleteSet.size());
  for (int i = 0; i < deleteSet.size(); i++) {
    questions.add("?");
  }
  List<Integer> counts = TxnUtils.buildQueryWithINClauseStrings(conf, queries, 
prefix, suffix, questions, "cc_id", false, false);
  System.out.println(queries.get(0).chars().filter(ch -> ch == '?').count());
}{noformat}
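The batching the issue title asks for can be sketched like this (plain Python, not the `TxnUtils` implementation): cap the number of bind parameters per generated IN clause at whatever the backing RDBMS supports, splitting the id set into as many statements as needed. The 2100 cap below reflects SQL Server's per-statement parameter limit; other engines have different ceilings.

```python
def build_in_clause_queries(prefix, column, ids, max_params):
    # Emit one statement per batch of at most max_params ids, each with
    # a matching number of '?' placeholders in its IN clause.
    queries = []
    for i in range(0, len(ids), max_params):
        batch = ids[i:i + max_params]
        placeholders = ",".join("?" * len(batch))
        queries.append(f"{prefix} {column} IN ({placeholders})")
    return queries

# The 4759-parameter query from the description splits into three statements.
qs = build_in_clause_queries("delete from COMPLETED_COMPACTIONS where",
                             "cc_id", list(range(4759)), 2100)
print([q.count("?") for q in qs])  # [2100, 2100, 559]
```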

> Divide IN/(NOT IN) queries based on number of max parameters SQL engine can 
> support
> ---
>
> Key: HIVE-25659
> URL: https://issues.apache.org/jira/browse/HIVE-25659
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Minor
> Fix For: 4.0.0
>
>
> Function 
> org.apache.hadoop.hive.metastore.txn.TxnUtils#buildQueryWithINClauseStrings 
> can generate queries with a huge number of parameters even with very small 
> values of DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE and DIRECT_SQL_MAX_QUERY_LENGTH 
> while generating the delete query for the COMPLETED_COMPACTIONS table.
> Example:
> {code:java}
> DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE = 100
> DIRECT_SQL_MAX_QUERY_LENGTH = 10 (10 KB)
> Number of parameters in a single query = 4759
> {code}





[jira] [Updated] (HIVE-25659) Divide IN/(NOT IN) queries based on number of max parameters SQL engine can support

2021-10-28 Thread Nikhil Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil Gupta updated HIVE-25659:

Description: 
Function 
org.apache.hadoop.hive.metastore.txn.TxnUtils#buildQueryWithINClauseStrings can 
generate queries with a huge number of parameters even with very small values 
of DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE and DIRECT_SQL_MAX_QUERY_LENGTH while 
generating the delete query for the COMPLETED_COMPACTIONS table.

Example:
{code:java}
DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE = 100
DIRECT_SQL_MAX_QUERY_LENGTH = 10 (10 KB)
Number of parameters in a single query = 4759
{code}

  was:
Function 
org.apache.hadoop.hive.metastore.txn.TxnUtils#buildQueryWithINClauseStrings can 
generate queries with huge number of parameters with very small value of 
DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE and DIRECT_SQL_MAX_QUERY_LENGTH while 
generating delete query for completed_compactions table


Example:
 DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE = 100

DIRECT_SQL_MAX_QUERY_LENGTH = 10 (10 KB)

Number of parameters in a single query = 4759


> Divide IN/(NOT IN) queries based on number of max parameters SQL engine can 
> support
> ---
>
> Key: HIVE-25659
> URL: https://issues.apache.org/jira/browse/HIVE-25659
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Minor
> Fix For: 4.0.0
>
>
> Function 
> org.apache.hadoop.hive.metastore.txn.TxnUtils#buildQueryWithINClauseStrings 
> can generate queries with a huge number of parameters even with very small 
> values of DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE and DIRECT_SQL_MAX_QUERY_LENGTH 
> while generating the delete query for the COMPLETED_COMPACTIONS table.
> Example:
> {code:java}
> DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE = 100
> DIRECT_SQL_MAX_QUERY_LENGTH = 10 (10 KB)
> Number of parameters in a single query = 4759
> {code}





[jira] [Updated] (HIVE-25659) Divide IN/(NOT IN) queries based on number of max parameters SQL engine can support

2021-10-28 Thread Nikhil Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil Gupta updated HIVE-25659:

Description: 
Function 
org.apache.hadoop.hive.metastore.txn.TxnUtils#buildQueryWithINClauseStrings can 
generate queries with huge number of parameters with very small value of 
DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE and DIRECT_SQL_MAX_QUERY_LENGTH while 
generating delete query for completed_compactions table


Example:
 DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE = 100

DIRECT_SQL_MAX_QUERY_LENGTH = 10 (10 KB)

Number of parameters in a single query = 4759

  was:
 

 


> Divide IN/(NOT IN) queries based on number of max parameters SQL engine can 
> support
> ---
>
> Key: HIVE-25659
> URL: https://issues.apache.org/jira/browse/HIVE-25659
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Minor
> Fix For: 4.0.0
>
>
> Function 
> org.apache.hadoop.hive.metastore.txn.TxnUtils#buildQueryWithINClauseStrings 
> can generate queries with a huge number of parameters even with very small 
> values of DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE and DIRECT_SQL_MAX_QUERY_LENGTH 
> while generating the delete query for the COMPLETED_COMPACTIONS table.
> Example:
>  DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE = 100
> DIRECT_SQL_MAX_QUERY_LENGTH = 10 (10 KB)
> Number of parameters in a single query = 4759





[jira] [Assigned] (HIVE-25659) Divide IN/(NOT IN) queries based on number of max parameters SQL engine can support

2021-10-28 Thread Nikhil Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil Gupta reassigned HIVE-25659:
---

Assignee: Nikhil Gupta

> Divide IN/(NOT IN) queries based on number of max parameters SQL engine can 
> support
> ---
>
> Key: HIVE-25659
> URL: https://issues.apache.org/jira/browse/HIVE-25659
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Minor
> Fix For: 4.0.0
>
>
>  
>  





[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=671426&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671426
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 13:15
Start Date: 28/Oct/21 13:15
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r73835



##
File path: metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql
##
@@ -1466,7 +1466,8 @@ CREATE EXTERNAL TABLE IF NOT EXISTS `REPLICATION_METRICS` 
(
 `POLICY_NAME` string,
 `DUMP_EXECUTION_ID` bigint,
 `METADATA` string,
-`PROGRESS` string
+`PROGRESS` string,
+`MESSAGE_FORMAT` varchar(16)

Review comment:
   Yes, varchar is supported in hive-schema files. We're already using the 
same syntax for the NOTIFICATION_LOG table, so I kept it the same for both 
tables.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 671426)
Time Spent: 1h 50m  (was: 1h 40m)

> Compress Hive Replication Metrics while storing
> ---
>
> Key: HIVE-25596
> URL: https://issues.apache.org/jira/browse/HIVE-25596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Compress the json fields of sys.replication_metrics table to optimise RDBMS 
> space usage.
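The ticket does not specify a codec; a minimal sketch of how such compression could work, assuming plain JDK GZIP plus Base64 so the result still fits in a string column (class and method names are illustrative, not Hive's actual implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class JsonCompression {

    // Gzip the JSON text and Base64-encode the bytes so the compressed
    // value can still be stored in a textual RDBMS column.
    static String compress(String json) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Reverse: Base64-decode, then gunzip back to the original JSON text.
    static String decompress(String encoded) throws IOException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(raw))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String progress = "{\"status\":\"SUCCESS\",\"events\":12345}";
        String stored = compress(progress);
        System.out.println(decompress(stored).equals(progress)); // prints "true"
    }
}
```

Repetitive JSON progress payloads compress well under gzip, which is what makes this worthwhile for RDBMS space usage; the trade-off is that the column is no longer directly queryable as text.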





[jira] [Work logged] (HIVE-25628) Avoid unnecessary file ops if Iceberg table is LLAP cached

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25628?focusedWorklogId=671424&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671424
 ]

ASF GitHub Bot logged work on HIVE-25628:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 13:12
Start Date: 28/Oct/21 13:12
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #2748:
URL: https://github.com/apache/hive/pull/2748


   




Issue Time Tracking
---

Worklog Id: (was: 671424)
Time Spent: 0.5h  (was: 20m)

> Avoid unnecessary file ops if Iceberg table is LLAP cached
> --
>
> Key: HIVE-25628
> URL: https://issues.apache.org/jira/browse/HIVE-25628
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In case the query execution is vectorized for an Iceberg table, we need to 
> make an extra file open operation on the ORC file to learn what the file 
> schema is (to be matched later with the logical schema).
> In LLAP configuration the file schema could be retrieved through LLAP cache 
> as ORC metadata is cached, so we should avoid the file operation when 
> possible.
> Also: LLAP relies on cache keys that are usually triplets of file information 
> and is constructed by an FS.listStatus call. For iceberg tables we should 
> rely on such file information provided by Iceberg's metadata to spare this 
> call too.





[jira] [Resolved] (HIVE-25628) Avoid unnecessary file ops if Iceberg table is LLAP cached

2021-10-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-25628.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master. Thanks for the review [~mbod]!

> Avoid unnecessary file ops if Iceberg table is LLAP cached
> --
>
> Key: HIVE-25628
> URL: https://issues.apache.org/jira/browse/HIVE-25628
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In case the query execution is vectorized for an Iceberg table, we need to 
> make an extra file open operation on the ORC file to learn what the file 
> schema is (to be matched later with the logical schema).
> In LLAP configuration the file schema could be retrieved through LLAP cache 
> as ORC metadata is cached, so we should avoid the file operation when 
> possible.
> Also: LLAP relies on cache keys that are usually triplets of file information 
> and is constructed by an FS.listStatus call. For iceberg tables we should 
> rely on such file information provided by Iceberg's metadata to spare this 
> call too.





[jira] [Assigned] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests

2021-10-28 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25658:
-


> Fix regex for masking totalSize table properties in Iceberg q-tests
> ---
>
> Key: HIVE-25658
> URL: https://issues.apache.org/jira/browse/HIVE-25658
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> HIVE-25607 introduced a text-replace regex for masking out the totalSize 
> table property values in Iceberg q.out files. However, the regex did not 
> cover all of the properties in the q.out files; this issue fixes the regex.





[jira] [Work logged] (HIVE-25596) Compress Hive Replication Metrics while storing

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25596?focusedWorklogId=671407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671407
 ]

ASF GitHub Bot logged work on HIVE-25596:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 12:26
Start Date: 28/Oct/21 12:26
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2724:
URL: https://github.com/apache/hive/pull/2724#discussion_r738331249



##
File path: metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql
##
@@ -527,7 +527,8 @@ CREATE EXTERNAL TABLE IF NOT EXISTS `REPLICATION_METRICS` (
 `POLICY_NAME` string,
 `DUMP_EXECUTION_ID` bigint,
 `METADATA` string,
-`PROGRESS` string
+`PROGRESS` string,
+`MESSAGE_FORMAT` varchar(16)

Review comment:
   varchar or string?

##
File path: 
standalone-metastore/metastore-server/src/main/sql/mssql/hive-schema-4.0.0.mssql.sql
##
@@ -1367,7 +1367,8 @@ CREATE TABLE "REPLICATION_METRICS" (
   "RM_DUMP_EXECUTION_ID" bigint NOT NULL,
   "RM_METADATA" varchar(max),
   "RM_PROGRESS" varchar(max),
-  "RM_START_TIME" integer NOT NULL
+  "RM_START_TIME" integer NOT NULL,
+  MESSAGE_FORMAT nvarchar(16),

Review comment:
   typo nvarchar

##
File path: metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql
##
@@ -1466,7 +1466,8 @@ CREATE EXTERNAL TABLE IF NOT EXISTS `REPLICATION_METRICS` 
(
 `POLICY_NAME` string,
 `DUMP_EXECUTION_ID` bigint,
 `METADATA` string,
-`PROGRESS` string
+`PROGRESS` string,
+`MESSAGE_FORMAT` varchar(16)

Review comment:
   is varchar supported? can this be string?






Issue Time Tracking
---

Worklog Id: (was: 671407)
Time Spent: 1h 40m  (was: 1.5h)

> Compress Hive Replication Metrics while storing
> ---
>
> Key: HIVE-25596
> URL: https://issues.apache.org/jira/browse/HIVE-25596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Compress the json fields of sys.replication_metrics table to optimise RDBMS 
> space usage.





[jira] [Resolved] (HIVE-25650) Make workerId and workerVersionId optional in the FindNextCompactRequest

2021-10-28 Thread Viktor Csomor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Csomor resolved HIVE-25650.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

> Make workerId and workerVersionId optional in the FindNextCompactRequest
> 
>
> Key: HIVE-25650
> URL: https://issues.apache.org/jira/browse/HIVE-25650
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In hive_metastore.thrift the FindNextCompactRequest struct's fields are 
> required:
> {code}
> struct FindNextCompactRequest {
> 1: required string workerId,
> 2: required string workerVersion
> }{code}
> These should probably be made optional, to avoid breaking compaction if 
> they're not available.
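A sketch of what the proposed IDL change might look like (the actual committed change may differ):

```thrift
struct FindNextCompactRequest {
    1: optional string workerId,
    2: optional string workerVersion
}
```

Thrift simply skips optional fields on the wire when they are unset, so a client that does not populate them can still issue a valid request instead of failing serialization.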





[jira] [Work logged] (HIVE-25650) Make workerId and workerVersionId optional in the FindNextCompactRequest

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25650?focusedWorklogId=671302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671302
 ]

ASF GitHub Bot logged work on HIVE-25650:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 08:58
Start Date: 28/Oct/21 08:58
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #2749:
URL: https://github.com/apache/hive/pull/2749


   




Issue Time Tracking
---

Worklog Id: (was: 671302)
Time Spent: 0.5h  (was: 20m)

> Make workerId and workerVersionId optional in the FindNextCompactRequest
> 
>
> Key: HIVE-25650
> URL: https://issues.apache.org/jira/browse/HIVE-25650
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Viktor Csomor
>Assignee: Viktor Csomor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In hive_metastore.thrift the FindNextCompactRequest struct's fields are 
> required:
> {code}
> struct FindNextCompactRequest {
> 1: required string workerId,
> 2: required string workerVersion
> }{code}
> These should probably be made optional, to avoid breaking compaction if 
> they're not available.





[jira] [Work logged] (HIVE-25656) Get materialized view state based on number of affected rows by transactions

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25656?focusedWorklogId=671293&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671293
 ]

ASF GitHub Bot logged work on HIVE-25656:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 08:38
Start Date: 28/Oct/21 08:38
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2756:
URL: https://github.com/apache/hive/pull/2756


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 671293)
Remaining Estimate: 0h
Time Spent: 10m

> Get materialized view state based on number of affected rows by transactions
> 
>
> Key: HIVE-25656
> URL: https://issues.apache.org/jira/browse/HIVE-25656
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views, Transactions
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To enable faster incremental rebuild of materialized views, the presence of 
> update/delete operations on the source tables of the view since the last 
> rebuild must be checked. Based on the outcome, different plans are generated 
> for the update/delete and insert-only scenarios.
> Currently this is done by querying the COMPLETED_TXN_COMPONENTS table; 
> however, the records in this table are cleaned up when the MV source tables 
> are compacted, which reduces the chances of an incremental MV rebuild.
> The goal of this patch is to find an alternative way to store and retrieve 
> this information.





[jira] [Updated] (HIVE-25656) Get materialized view state based on number of affected rows by transactions

2021-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25656:
--
Labels: pull-request-available  (was: )

> Get materialized view state based on number of affected rows by transactions
> 
>
> Key: HIVE-25656
> URL: https://issues.apache.org/jira/browse/HIVE-25656
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views, Transactions
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To enable faster incremental rebuild of materialized views, the presence of 
> update/delete operations on the source tables of the view since the last 
> rebuild must be checked. Based on the outcome, different plans are generated 
> for the update/delete and insert-only scenarios.
> Currently this is done by querying the COMPLETED_TXN_COMPONENTS table; 
> however, the records in this table are cleaned up when the MV source tables 
> are compacted, which reduces the chances of an incremental MV rebuild.
> The goal of this patch is to find an alternative way to store and retrieve 
> this information.





[jira] [Assigned] (HIVE-25656) Get materialized view state based on number of affected rows by transactions

2021-10-28 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-25656:
-


> Get materialized view state based on number of affected rows by transactions
> 
>
> Key: HIVE-25656
> URL: https://issues.apache.org/jira/browse/HIVE-25656
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views, Transactions
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>
> To enable faster incremental rebuild of materialized views, the presence of 
> update/delete operations on the source tables of the view since the last 
> rebuild must be checked. Based on the outcome, different plans are generated 
> for the update/delete and insert-only scenarios.
> Currently this is done by querying the COMPLETED_TXN_COMPONENTS table; 
> however, the records in this table are cleaned up when the MV source tables 
> are compacted, which reduces the chances of an incremental MV rebuild.
> The goal of this patch is to find an alternative way to store and retrieve 
> this information.





[jira] [Updated] (HIVE-25647) hadoop memo

2021-10-28 Thread St Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

St Li updated HIVE-25647:
-
Description: 
master, slave1, slave2   //opt represents the wechat hadoop bigdata dev //2019: bigdata competition
hadoop 50070, hbase 16010, storm 8080

#hostname
hostnamectl set-hostname master && bash
hostname master && bash
hostname slave1/slave2 && bash
vim /etc/hostname   // master/slave1/slave2
vim /etc/hosts      // ip master, ip slave1, ip slave2

#yum
cd /etc/yum.repos.d && rm -rf *
wget http://172.16.47.240/bigdata/repofile/bigdata.repo
yum clean all

#firewall
systemctl stop firewalld
systemctl status firewalld

#timezone
tzselect   // 5-9-1-1
echo "TZ='Asia/Shanghai'; export TZ" >> /etc/profile && source /etc/profile

#ntp
yum install -y ntp
vim /etc/ntp.conf
  //#server 0~3.centos.pool.ntp.org iburst
  server 127.127.1.0
  fudge 127.127.1.0 stratum 10
/bin/systemctl restart ntpd.service
ntpdate master   // on slave1, slave2

#crontab
service crond status
/sbin/service crond start
crontab -e
  */30 8-17 * * * /usr/sbin/ntpdate master
crontab -l

#ssh password
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat /root/.ssh/id_dsa.pub >> /root/.ssh/authorized_keys
scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2

#install jdk
mkdir -p /usr/java
tar -zxvf jdk-8u171-linux-x64.tar.gz -C /usr/java/
vim /etc/profile
  export JAVA_HOME=/usr/java/jdk1.8.0_171
  export CLASSPATH=$JAVA_HOME/lib/
  export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile && java -version
scp -r /usr/java root@slave1:/usr/
scp -r /usr/java root@slave2:/usr/

#install hadoop
mkdir -p /usr/hadoop && cd /usr/hadoop
tar -zxvf /usr/hadoop/hadoop-2.7.3.tar.gz -C /usr/hadoop/
rm -rf /usr/hadoop/hadoop-2.7.3.tar.gz
vim /etc/profile
  export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
hadoop   //test

hadoop-env.sh / mapred-env.sh / yarn-env.sh
  export JAVA_HOME=/usr/java/jdk1.8.0_171

##vim core-site.xml
fs.default.name {hdfs://master:9000}
hadoop.tmp.dir {/usr/hadoop/hadoop-2.7.3/hdfs/tmp}
io.file.buffer.size {131072}
fs.checkpoint.period {60}
fs.checkpoint.size {67108864}

##hdfs-site.xml
dfs.replication {2}
dfs.namenode.name.dir {file:/usr/hadoop/hadoop-2.7.3/hdfs/name}
dfs.datanode.data.dir {file:/usr/hadoop/hadoop-2.7.3/hdfs/data}

##vim yarn-site.xml
yarn.resourcemanager.address {master:18040}
yarn.resourcemanager.scheduler.address {master:18030}
yarn.resourcemanager.webapp.address {master:18088}
yarn.resourcemanager.resource-tracker.address {18025}
yarn.resourcemanager.admin.address {master:18141}
yarn.nodemanager.aux-services {mapreduce_shuffle}
yarn.nodemanager.auxservices.mapreduce.shuffle.class {org.apache.hadoop.mapred.ShuffleHandler}

#vim mapred-site.xml
mapreduce.framework.name {yarn}

#slaves file
echo master > master && echo slave1 > slaves && echo slave2 >> slaves

#hadoop format
hadoop namenode -format   // on master; look for "has been successfully formatted"
#start hadoop
start-all.sh
  master: NameNode, SecondaryNameNode, ResourceManager
  slave1~2: DataNode, NodeManager
start-dfs.sh
start-yarn.sh
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
hadoop-daemon.sh start secondarynamenode
hadoop-daemon.sh start resourcemanager
hadoop-daemon.sh start nodemanager

#test hdfs & mapreduce
hadoop fs -mkdir /input
hadoop fs -put $HADOOP_HOME/README.txt /input
http://master:50070
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar

#install hive
yum -y install mysql-community-server   // slave2: mysql server, slave1: hive server, master: hive client
systemctl daemon-reload
systemctl start mysqld
cat /var/log/mysqld.log | grep "temporary password"
mysql -uroot -p
  set global validate_password_policy=0;
  set global validate_password_length=4;
  alter user 'root'@'localhost' identified by '123456';
mysql -uroot -p123456
  create user 'root'@'%' identified by '123456';
  grant all privileges on *.* to 'root'@'%' with grant option;
  flush privileges;

mkdir -p /usr/hive
tar -zxvf /usr/hive/apache-hive-2.1.1-bin.tar.gz -C /usr/hive/
vim /etc/profile   //for hive
  export HIVE_HOME=/usr/hive/apache-hive-2.1.1-bin
  export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile

cd $HIVE_HOME/conf && vim hive-env.sh
  export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
  export HIVE_CONF_DIR=/usr/hive/apache-hive-2.1.1-bin/conf
  export HIVE_AUX_JARS_PATH=/usr/hive/apache-hive-2.1.1-bin/lib
cp $HIVE_HOME/lib/jline-2.12.jar $HADOOP_HOME/share/hadoop/yarn/lib/

##slave1 hive-server
cd $HIVE_HOME/lib && wget (or cp) mysql-connector-java-5.1.47-bin.jar

hive-site.xml (hive-server)
hive.metastore.warehouse.dir {/user/hive_remote/warehouse}
javax.jdo.option.ConnectionDriverName {com.mysql.jdbc.Driver}
javax.jdo.option.ConnectionURL {jdbc:mysql://slave2:3306/hive?createDatabaseIfNotExist=true&useSSL=false}
javax.jdo.option.ConnectionUserName {root}
javax.jdo.option.ConnectionPassword {123456}
hive-site.xml (hive