[jira] [Updated] (HIVE-24064) Disable Materialized View Replication
[ https://issues.apache.org/jira/browse/HIVE-24064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arko Sharma updated HIVE-24064:
-------------------------------
    Attachment: HIVE-24064.02.patch

> Disable Materialized View Replication
> -------------------------------------
>
>          Key: HIVE-24064
>          URL: https://issues.apache.org/jira/browse/HIVE-24064
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Arko Sharma
>     Assignee: Arko Sharma
>     Priority: Major
>       Labels: pull-request-available
>  Attachments: HIVE-24064.01.patch, HIVE-24064.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-20817) Reading Timestamp datatype via HiveServer2 gives errors
[ https://issues.apache.org/jira/browse/HIVE-20817?focusedWorklogId=474106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474106 ]

ASF GitHub Bot logged work on HIVE-20817:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 25/Aug/20 00:40
    Start Date: 25/Aug/20 00:40
    Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #1179:
URL: https://github.com/apache/hive/pull/1179#issuecomment-679436133

    This pull request has been automatically marked as stale because it has not
    had recent activity. It will be closed if no further activity occurs. Feel
    free to reach out on the d...@hive.apache.org list if the patch is in need
    of reviews.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go
to the specific comment. For queries about this service, please contact
Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 474106)
    Time Spent: 40m (was: 0.5h)

> Reading Timestamp datatype via HiveServer2 gives errors
> -------------------------------------------------------
>
>          Key: HIVE-20817
>          URL: https://issues.apache.org/jira/browse/HIVE-20817
>      Project: Hive
>   Issue Type: Bug
>   Components: Hive
> Affects Versions: 4.0.0
>     Reporter: mahesh kumar behera
>     Assignee: mahesh kumar behera
>     Priority: Major
>       Labels: pull-request-available
>      Fix For: 4.0.0
>  Attachments: HIVE-20817.01.patch, HIVE-20817.02.patch
>
> CREATE TABLE JdbcBasicRead (empno int, desg string, empname string,
> doj timestamp, Salary float, mgrid smallint, deptno tinyint)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH '/tmp/art_jdbc/hive/input/input_7columns.txt'
> OVERWRITE INTO TABLE JdbcBasicRead;
>
> Sample data:
> —
> 7369,M,SMITH,1980-12-17 17:07:29.234234,5000.00,7902,20
> 7499,X,ALLEN,1981-02-20 17:07:29.234234,1250.00,7698,30
> 7521,X,WARD,1981-02-22 17:07:29.234234,01600.57,7698,40
> 7566,M,JONES,1981-04-02 17:07:29.234234,02975.65,7839,10
> 7654,X,MARTIN,1981-09-28 17:07:29.234234,01250.00,7698,20
> 7698,M,BLAKE,1981-05-01 17:07:29.234234,2850.98,7839,30
> 7782,M,CLARK,1981-06-09 17:07:29.234234,02450.00,7839,20
> —
> Select statement: SELECT empno, desg, empname, doj, salary, mgrid, deptno FROM JdbcBasicWrite
> {code}
> 2018-09-25T07:11:03,222 WARN [HiveServer2-Handler-Pool: Thread-83]: thrift.ThriftCLIService (:()) - Error fetching results:
> org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.Timestamp cannot be cast to java.sql.Timestamp
> at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:469) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
> at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
> at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:910) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
> at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) ~[?:?]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
> at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
> at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
> at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
> at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.1.1.3.0.1.0-187.jar:?]
> at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
> at com.sun.proxy.$Proxy46.fetchResults(Unknown Source) ~[?:?]
> at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:564) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
> at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:786) >
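The failure above is a boundary mismatch: the execution side produces `org.apache.hadoop.hive.common.type.Timestamp` while the result-serialization side expects `java.sql.Timestamp`, and a blind cast between the two unrelated types throws `ClassCastException`. A minimal Python sketch of the general remedy, converting explicitly at the boundary instead of casting (class and function names are illustrative, not Hive's actual API):

```python
from datetime import datetime, timezone

class InternalTimestamp:
    """Stand-in for an engine-internal timestamp: epoch seconds plus a
    separate nanosecond-of-second field (illustrative, not Hive's class)."""
    def __init__(self, epoch_seconds, nanos):
        self.epoch_seconds = epoch_seconds
        self.nanos = nanos

def to_jdbc_timestamp(ts):
    """Convert explicitly at the serialization boundary; truncates the
    nanosecond fraction to microseconds, the finest datetime supports."""
    return datetime.fromtimestamp(ts.epoch_seconds, tz=timezone.utc).replace(
        microsecond=ts.nanos // 1000)
```

The point is that the two timestamp representations must be bridged by an explicit conversion step wherever rows cross from the engine into the Thrift layer.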
[jira] [Work logged] (HIVE-23546) Skip authorization when user is a superuser
[ https://issues.apache.org/jira/browse/HIVE-23546?focusedWorklogId=474096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474096 ]

ASF GitHub Bot logged work on HIVE-23546:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 24/Aug/20 23:57
    Start Date: 24/Aug/20 23:57
    Worklog Time Spent: 10m

Work Description: dengzhhu653 opened a new pull request #1033:
URL: https://github.com/apache/hive/pull/1033

Issue Time Tracking
-------------------
    Worklog Id: (was: 474096)
    Time Spent: 1h 10m (was: 1h)

> Skip authorization when user is a superuser
> -------------------------------------------
>
>          Key: HIVE-23546
>          URL: https://issues.apache.org/jira/browse/HIVE-23546
>      Project: Hive
>   Issue Type: Improvement
>     Reporter: Zhihua Deng
>     Assignee: Zhihua Deng
>     Priority: Minor
>       Labels: pull-request-available
>  Attachments: HIVE-23546.patch
>
> If the current user is a superuser, there is no need to do authorization.
> This can speed up queries, especially DDL queries. For example, the
> superuser adds partitions when the external data is ready, or shows
> partitions to check whether it is OK to take the workflow one step further
> in a busy Hive cluster.
[jira] [Work logged] (HIVE-23546) Skip authorization when user is a superuser
[ https://issues.apache.org/jira/browse/HIVE-23546?focusedWorklogId=474093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474093 ]

ASF GitHub Bot logged work on HIVE-23546:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 24/Aug/20 23:51
    Start Date: 24/Aug/20 23:51
    Worklog Time Spent: 10m

Work Description: dengzhhu653 closed pull request #1033:
URL: https://github.com/apache/hive/pull/1033

Issue Time Tracking
-------------------
    Worklog Id: (was: 474093)
    Time Spent: 1h (was: 50m)

> Skip authorization when user is a superuser
> -------------------------------------------
>
>          Key: HIVE-23546
>          URL: https://issues.apache.org/jira/browse/HIVE-23546
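The improvement HIVE-23546 proposes is a short-circuit before the authorization call. A minimal sketch of the idea (the superuser set and the authorizer callback are illustrative stand-ins, not Hive's actual configuration or `HiveAuthorizer` interface):

```python
SUPERUSERS = {"hive"}  # in practice this would come from configuration

def check_privileges(user, operation, authorizer):
    """Skip the full authorization round-trip for superusers; everyone
    else goes through the configured authorizer as before."""
    if user in SUPERUSERS:
        return True  # short-circuit: no authorization call at all
    return authorizer(user, operation)
```

For metadata-heavy DDL workloads (add/show partitions in a busy cluster), skipping the authorizer call for trusted users removes a per-query round-trip.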
[jira] [Updated] (HIVE-24069) HiveHistory should log the task that ends abnormally
[ https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24069:
----------------------------------
    Labels: pull-request-available (was: )

> HiveHistory should log the task that ends abnormally
> ----------------------------------------------------
>
>          Key: HIVE-24069
>          URL: https://issues.apache.org/jira/browse/HIVE-24069
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Zhihua Deng
>     Priority: Minor
>       Labels: pull-request-available
>
> When a task returns with an exitVal not equal to 0, the Executor skips
> marking the task's return code and calling endTask. This can leave the
> history log incomplete for such tasks.
[jira] [Work logged] (HIVE-24069) HiveHistory should log the task that ends abnormally
[ https://issues.apache.org/jira/browse/HIVE-24069?focusedWorklogId=474092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474092 ]

ASF GitHub Bot logged work on HIVE-24069:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 24/Aug/20 23:49
    Start Date: 24/Aug/20 23:49
    Worklog Time Spent: 10m

Work Description: dengzhhu653 opened a new pull request #1429:
URL: https://github.com/apache/hive/pull/1429

    ### What changes were proposed in this pull request?
    HiveHistory logs the task that ends abnormally.

    ### Why are the changes needed?
    When a task returns with an exitVal not equal to 0, the Executor skips
    marking the task's return code and calling endTask. This can leave the
    history log incomplete for such tasks.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?

Issue Time Tracking
-------------------
    Worklog Id: (was: 474092)
    Remaining Estimate: 0h
    Time Spent: 10m

> HiveHistory should log the task that ends abnormally
> ----------------------------------------------------
>
>          Key: HIVE-24069
>          URL: https://issues.apache.org/jira/browse/HIVE-24069
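The fix described above boils down to recording the outcome on every exit path, not only on success. A sketch of that control flow (names are illustrative, not HiveHistory's actual API):

```python
def execute_task(task, history):
    """Run a task and always record its outcome in the history log,
    even when it ends abnormally (non-zero exit value)."""
    exit_val = task()
    # Before the fix, these two history events were skipped whenever
    # exit_val != 0, leaving the history log incomplete for failed tasks.
    history.append(("RETURN_CODE", exit_val))
    history.append(("TASK_END",))
    return exit_val
```

With the events emitted unconditionally, a failed task leaves the same audit trail shape as a successful one, just with a non-zero return code.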
[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission failures
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24068:
----------------------------------
    Labels: pull-request-available (was: )

> Add re-execution plugin for handling DAG submission failures
> ------------------------------------------------------------
>
>          Key: HIVE-24068
>          URL: https://issues.apache.org/jira/browse/HIVE-24068
>      Project: Hive
>   Issue Type: Bug
> Affects Versions: 4.0.0
>     Reporter: Prasanth Jayachandran
>     Assignee: Prasanth Jayachandran
>     Priority: Major
>       Labels: pull-request-available
>
> DAG submission failures can also happen in environments where the AM
> container died owing to DNS issues. DAG submissions are safe to retry
> because the DAG has not started executing yet. There are individual
> retries at the getSession and submitDAG levels, but some submitDAG
> failures also have to retry getSession, since the AM may be unreachable;
> this can be handled in a re-execution plugin.
[jira] [Work logged] (HIVE-24068) Add re-execution plugin for handling DAG submission failures
[ https://issues.apache.org/jira/browse/HIVE-24068?focusedWorklogId=474081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474081 ]

ASF GitHub Bot logged work on HIVE-24068:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 24/Aug/20 23:23
    Start Date: 24/Aug/20 23:23
    Worklog Time Spent: 10m

Work Description: prasanthj opened a new pull request #1428:
URL: https://github.com/apache/hive/pull/1428

    ### What changes were proposed in this pull request?
    DAG submission failures can also happen in environments where the AM
    container died owing to DNS issues. DAG submissions are safe to retry
    because the DAG has not started executing yet. There are individual
    retries at the getSession and submitDAG levels, but some submitDAG
    failures also have to retry getSession, since the AM may be unreachable;
    this can be handled in a re-execution plugin. This PR adds a new
    re-execution plugin for intermittent DAG submission failures.

    ### Why are the changes needed?
    To make Hive resilient to environments with network/DNS issues.

    ### Does this PR introduce _any_ user-facing change?
    Yes. Adds the re-exec plugin as a default option.

    ### How was this patch tested?
    Manually. Tez code was changed to explicitly throw UnknownHostException
    to simulate a DNS/network issue, and the retry was verified to happen.

Issue Time Tracking
-------------------
    Worklog Id: (was: 474081)
    Remaining Estimate: 0h
    Time Spent: 10m

> Add re-execution plugin for handling DAG submission failures
> ------------------------------------------------------------
>
>          Key: HIVE-24068
>          URL: https://issues.apache.org/jira/browse/HIVE-24068
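The retry shape the issue describes, restarting from session acquisition rather than only re-submitting the DAG, can be sketched as follows (a simplified illustration; the real plugin hooks into Hive's re-execution framework, and `ConnectionError` stands in for Tez's `UnknownHostException`):

```python
def submit_with_retry(get_session, submit_dag, max_attempts=3):
    """On a submission failure, retry from get_session onwards: the AM
    behind a cached session may itself be unreachable, so retrying
    submit_dag alone is not always enough."""
    last_error = None
    for _ in range(max_attempts):
        try:
            session = get_session()       # may hand back a fresh session/AM
            return submit_dag(session)    # safe to retry: DAG has not started
        except ConnectionError as e:
            last_error = e
    raise last_error
```

This is safe precisely because a failed submission means the DAG never began executing, so re-running it cannot duplicate work.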
[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission failures
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-24068:
-----------------------------------------
    Description: DAG submission failures can also happen in environments
    where the AM container died owing to DNS issues. DAG submissions are
    safe to retry because the DAG has not started executing yet. There are
    individual retries at the getSession and submitDAG levels, but some
    submitDAG failures also have to retry getSession, since the AM may be
    unreachable; this can be handled in a re-execution plugin.
    (was: ReExecutionOverlayPlugin handles cases where there is a vertex
    failure. DAG submission failure can also happen in environments where
    AM container died causing DNS issues. DAG submissions are safe to retry
    as the DAG hasn't started execution yet.)

> Add re-execution plugin for handling DAG submission failures
> ------------------------------------------------------------
>
>          Key: HIVE-24068
>          URL: https://issues.apache.org/jira/browse/HIVE-24068
[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission failures
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-24068:
-----------------------------------------
    Summary: Add re-execution plugin for handling DAG submission failures
    (was: ReExecutionOverlayPlugin can handle DAG submission failures as well)

> Add re-execution plugin for handling DAG submission failures
> ------------------------------------------------------------
>
>          Key: HIVE-24068
>          URL: https://issues.apache.org/jira/browse/HIVE-24068
[jira] [Work started] (HIVE-23649) Fix FindBug issues in hive-service-rpc
[ https://issues.apache.org/jira/browse/HIVE-23649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-23649 started by Mustafa Iman.
-------------------------------------------

> Fix FindBug issues in hive-service-rpc
> --------------------------------------
>
>          Key: HIVE-23649
>          URL: https://issues.apache.org/jira/browse/HIVE-23649
>      Project: Hive
>   Issue Type: Sub-task
>     Reporter: Panagiotis Garefalakis
>     Assignee: Mustafa Iman
>     Priority: Major
>       Labels: pull-request-available
>  Attachments: spotbugsXml.xml
[jira] [Updated] (HIVE-23649) Fix FindBug issues in hive-service-rpc
[ https://issues.apache.org/jira/browse/HIVE-23649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-23649:
--------------------------------
    Status: Patch Available (was: In Progress)

> Fix FindBug issues in hive-service-rpc
> --------------------------------------
>
>          Key: HIVE-23649
>          URL: https://issues.apache.org/jira/browse/HIVE-23649
[jira] [Updated] (HIVE-23302) Create HiveJdbcDatabaseAccessor for JDBC storage handler
[ https://issues.apache.org/jira/browse/HIVE-23302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-23302:
----------------------------------
    Labels: pull-request-available (was: )

> Create HiveJdbcDatabaseAccessor for JDBC storage handler
> --------------------------------------------------------
>
>          Key: HIVE-23302
>          URL: https://issues.apache.org/jira/browse/HIVE-23302
>      Project: Hive
>   Issue Type: Bug
>   Components: StorageHandler
>     Reporter: Jesus Camacho Rodriguez
>     Assignee: Jesus Camacho Rodriguez
>     Priority: Major
>       Labels: pull-request-available
>
> The {{JdbcDatabaseAccessor}} associated with the storage handler issues
> SQL calls to the RDBMS through the JDBC connection. There is a
> {{GenericJdbcDatabaseAccessor}} with a generic implementation that the
> storage handler uses when there is no implementation specific to a given
> RDBMS.
> Currently, Hive uses the {{GenericJdbcDatabaseAccessor}}. As far as we
> know, the only generic query that will not work is the one that splits
> the query based on offset and limit, since its syntax differs from the
> one accepted by Hive. We should create a {{HiveJdbcDatabaseAccessor}} to
> override that query and fix any other existing incompatibilities.
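The offset/limit incompatibility mentioned above can be illustrated with a small sketch. This is a simplification under stated assumptions: the generic accessor is taken to emit the ANSI-style `LIMIT n OFFSET m` form, while the Hive-specific override emits a `LIMIT offset,rows` form; the real accessors build their split queries differently and Hive's exact LIMIT/OFFSET support varies by version:

```python
def bounded_query(sql, limit, offset, dialect="generic"):
    """Build the split query used to read a bounded slice of rows.
    A dialect-specific accessor overrides only this piece of SQL."""
    if dialect == "hive":
        # Hypothetical Hive-flavored form; the generic syntax below is
        # the one the description says Hive does not accept.
        return f"{sql} LIMIT {offset},{limit}"
    return f"{sql} LIMIT {limit} OFFSET {offset}"
```

This is exactly the kind of narrow override a `HiveJdbcDatabaseAccessor` subclass would provide: inherit everything generic, replace only the dialect-sensitive query.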
[jira] [Work logged] (HIVE-23302) Create HiveJdbcDatabaseAccessor for JDBC storage handler
[ https://issues.apache.org/jira/browse/HIVE-23302?focusedWorklogId=474061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474061 ]

ASF GitHub Bot logged work on HIVE-23302:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 24/Aug/20 22:04
    Start Date: 24/Aug/20 22:04
    Worklog Time Spent: 10m

Work Description: jcamachor opened a new pull request #1427:
URL: https://github.com/apache/hive/pull/1427

Issue Time Tracking
-------------------
    Worklog Id: (was: 474061)
    Remaining Estimate: 0h
    Time Spent: 10m

> Create HiveJdbcDatabaseAccessor for JDBC storage handler
> --------------------------------------------------------
>
>          Key: HIVE-23302
>          URL: https://issues.apache.org/jira/browse/HIVE-23302
[jira] [Updated] (HIVE-23302) Create HiveJdbcDatabaseAccessor for JDBC storage handler
[ https://issues.apache.org/jira/browse/HIVE-23302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-23302:
-------------------------------------------
    Status: Patch Available (was: In Progress)

> Create HiveJdbcDatabaseAccessor for JDBC storage handler
> --------------------------------------------------------
>
>          Key: HIVE-23302
>          URL: https://issues.apache.org/jira/browse/HIVE-23302
[jira] [Work started] (HIVE-23302) Create HiveJdbcDatabaseAccessor for JDBC storage handler
[ https://issues.apache.org/jira/browse/HIVE-23302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-23302 started by Jesus Camacho Rodriguez.
------------------------------------------------------

> Create HiveJdbcDatabaseAccessor for JDBC storage handler
> --------------------------------------------------------
>
>          Key: HIVE-23302
>          URL: https://issues.apache.org/jira/browse/HIVE-23302
[jira] [Assigned] (HIVE-23302) Create HiveJdbcDatabaseAccessor for JDBC storage handler
[ https://issues.apache.org/jira/browse/HIVE-23302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez reassigned HIVE-23302:
----------------------------------------------
    Assignee: Jesus Camacho Rodriguez

> Create HiveJdbcDatabaseAccessor for JDBC storage handler
> --------------------------------------------------------
>
>          Key: HIVE-23302
>          URL: https://issues.apache.org/jira/browse/HIVE-23302
[jira] [Work logged] (HIVE-23649) Fix FindBug issues in hive-service-rpc
[ https://issues.apache.org/jira/browse/HIVE-23649?focusedWorklogId=474059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474059 ]

ASF GitHub Bot logged work on HIVE-23649:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 24/Aug/20 21:56
    Start Date: 24/Aug/20 21:56
    Worklog Time Spent: 10m

Work Description: mustafaiman opened a new pull request #1426:
URL: https://github.com/apache/hive/pull/1426

    The entire org.apache.hive.service.rpc.thrift package consists of
    generated files. We should ignore these when running spotbugs.

    Change-Id: I0ec78853b50e3720976daf52a2efbc200047b281

Issue Time Tracking
-------------------
    Worklog Id: (was: 474059)
    Remaining Estimate: 0h
    Time Spent: 10m

> Fix FindBug issues in hive-service-rpc
> --------------------------------------
>
>          Key: HIVE-23649
>          URL: https://issues.apache.org/jira/browse/HIVE-23649
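Excluding a generated package from SpotBugs is normally done with an exclude filter. A sketch of what that filter could look like (the file name and how it is wired into the module's build are assumptions; `FindBugsFilter`/`Match`/`Package` are the standard exclude-filter elements):

```xml
<!-- spotbugs-exclude.xml (sketch): skip the generated Thrift package -->
<FindBugsFilter>
  <Match>
    <Package name="org.apache.hive.service.rpc.thrift"/>
  </Match>
</FindBugsFilter>
```

Filtering out machine-generated code keeps the report focused on warnings that can actually be acted on.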
[jira] [Updated] (HIVE-23649) Fix FindBug issues in hive-service-rpc
[ https://issues.apache.org/jira/browse/HIVE-23649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-23649:
----------------------------------
    Labels: pull-request-available (was: )

> Fix FindBug issues in hive-service-rpc
> --------------------------------------
>
>          Key: HIVE-23649
>          URL: https://issues.apache.org/jira/browse/HIVE-23649
[jira] [Assigned] (HIVE-23649) Fix FindBug issues in hive-service-rpc
[ https://issues.apache.org/jira/browse/HIVE-23649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman reassigned HIVE-23649:
-----------------------------------
    Assignee: Mustafa Iman

> Fix FindBug issues in hive-service-rpc
> --------------------------------------
>
>          Key: HIVE-23649
>          URL: https://issues.apache.org/jira/browse/HIVE-23649
[jira] [Assigned] (HIVE-24068) ReExecutionOverlayPlugin can handle DAG submission failures as well
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran reassigned HIVE-24068:
--------------------------------------------

> ReExecutionOverlayPlugin can handle DAG submission failures as well
> -------------------------------------------------------------------
>
>          Key: HIVE-24068
>          URL: https://issues.apache.org/jira/browse/HIVE-24068
>      Project: Hive
>   Issue Type: Bug
> Affects Versions: 4.0.0
>     Reporter: Prasanth Jayachandran
>     Assignee: Prasanth Jayachandran
>     Priority: Major
>
> ReExecutionOverlayPlugin handles cases where there is a vertex failure.
> DAG submission failure can also happen in environments where the AM
> container died owing to DNS issues. DAG submissions are safe to retry
> because the DAG has not started executing yet.
[jira] [Assigned] (HIVE-21025) LLAP IO fails on read if partition column is included in the table and the query has a predicate on the partition column
[ https://issues.apache.org/jira/browse/HIVE-21025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa Iman reassigned HIVE-21025: --- Assignee: (was: Mustafa Iman) > LLAP IO fails on read if partition column is included in the table and the > query has a predicate on the partition column > > > Key: HIVE-21025 > URL: https://issues.apache.org/jira/browse/HIVE-21025 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.3.4 >Reporter: Eugene Koifman >Priority: Major > > Hive doesn't officially support the case when a partitioning column is also > included in the data itself, though it works in some cases. Hive would never > write a data file with partition column in it but this can happen for > external tables where data is added by the end user. > Consider improving validation (at least for schema-aware files) on read to > produce a better error than {{ArrayIndexOutOfBoundsException}} > {code:java} > Caused by: java.lang.ArrayIndexOutOfBoundsException > ], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : > attempt_1539023000868_24675_3_01_07_3:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218) > > at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) > > at > 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: java.io.IOException: > java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80) > > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) > > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189) > > ... 
15 more > Caused by: java.io.IOException: java.io.IOException: > java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) > > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) > > at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > > ... 17 more > Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.rethrowErrorIfAny(LlapRecordReader.java:355) > > at > org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.nextCvb(LlapRecordReader.java:310) > > at > org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:250) > > at >
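The ticket above asks for better validation on read rather than the raw ArrayIndexOutOfBoundsException. A minimal sketch of that idea follows; the class and method names are hypothetical, not the actual LLAP reader code:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical validation sketch: reject a data file whose schema repeats a
// partition column, with a descriptive message, before the vectorized reader
// can fail deep inside with an ArrayIndexOutOfBoundsException.
public class PartitionColumnCheck {

    static void validateFileSchema(List<String> fileColumns, List<String> partitionColumns) {
        for (String part : partitionColumns) {
            if (fileColumns.contains(part)) {
                throw new IllegalArgumentException(
                    "Partition column '" + part + "' also appears in the data file schema; "
                    + "partition columns must not be stored inside data files");
            }
        }
    }

    public static void main(String[] args) {
        // Well-formed file: the partition column 'ds' lives only in the directory layout.
        validateFileSchema(Arrays.asList("id", "name"), Arrays.asList("ds"));
        try {
            // External-table file that (incorrectly) also stores the partition column.
            validateFileSchema(Arrays.asList("id", "name", "ds"), Arrays.asList("ds"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A check like this, run once per split for schema-aware formats, would fail fast with an actionable message instead of the stack trace above.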
[jira] [Updated] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
[ https://issues.apache.org/jira/browse/HIVE-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha updated HIVE-24067: Status: Patch Available (was: Open) > TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop > > > Key: HIVE-24067 > URL: https://issues.apache.org/jira/browse/HIVE-24067 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24067.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In TestReplicationScenariosExclusiveReplica during drop database operation > for primary db, it leads to wrong FS error as the ReplChangeManager is > associated with replica FS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
[ https://issues.apache.org/jira/browse/HIVE-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha updated HIVE-24067: Attachment: HIVE-24067.01.patch > TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop > > > Key: HIVE-24067 > URL: https://issues.apache.org/jira/browse/HIVE-24067 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24067.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In TestReplicationScenariosExclusiveReplica during drop database operation > for primary db, it leads to wrong FS error as the ReplChangeManager is > associated with replica FS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
[ https://issues.apache.org/jira/browse/HIVE-24067?focusedWorklogId=474043=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474043 ] ASF GitHub Bot logged work on HIVE-24067: - Author: ASF GitHub Bot Created on: 24/Aug/20 20:41 Start Date: 24/Aug/20 20:41 Worklog Time Spent: 10m Work Description: pkumarsinha opened a new pull request #1425: URL: https://github.com/apache/hive/pull/1425 … during DB drop ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 474043) Remaining Estimate: 0h Time Spent: 10m > TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop > > > Key: HIVE-24067 > URL: https://issues.apache.org/jira/browse/HIVE-24067 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In TestReplicationScenariosExclusiveReplica during drop database operation > for primary db, it leads to wrong FS error as the ReplChangeManager is > associated with replica FS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
[ https://issues.apache.org/jira/browse/HIVE-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24067: -- Labels: pull-request-available (was: ) > TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop > > > Key: HIVE-24067 > URL: https://issues.apache.org/jira/browse/HIVE-24067 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In TestReplicationScenariosExclusiveReplica during drop database operation > for primary db, it leads to wrong FS error as the ReplChangeManager is > associated with replica FS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
[ https://issues.apache.org/jira/browse/HIVE-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha reassigned HIVE-24067: --- > TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop > > > Key: HIVE-24067 > URL: https://issues.apache.org/jira/browse/HIVE-24067 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > > In TestReplicationScenariosExclusiveReplica during drop database operation > for primary db, it leads to wrong FS error as the ReplChangeManager is > associated with replica FS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
[ https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jainik Vora updated HIVE-24066: --- Priority: Minor (was: Trivial) > Hive query on parquet data should identify if column is not present in file > schema and show NULL value instead of Exception > --- > > Key: HIVE-24066 > URL: https://issues.apache.org/jira/browse/HIVE-24066 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.5 >Reporter: Jainik Vora >Priority: Minor > > I created a hive table containing columns with struct data type > > {code:java} > CREATE EXTERNAL TABLE abc_dwh.table_on_parquet ( > `context` struct<`app`:struct<`build`:string, `name`:string, > `namespace`:string, `version`:string>, `screen`:struct<`height`:bigint, > `width`:bigint>, `timezone`:string>, > `messageid` string, > `timestamp` string, > `userid` string) > PARTITIONED BY (year string, month string, day string, hour string) > STORED as PARQUET > LOCATION 's3://abc/xyz'; > {code} > > All columns are nullable hence the parquet files read by the table don't > always contain all columns. If any file in a partition doesn't have > "context.app" struct and if "context.app.version" is queried, Hive throws an > exception as below. Same for "context.screen" as well. > > {code:java} > Caused by: java.io.IOException: java.lang.RuntimeException: Primitive type > appshould not doesn't match typeapp[version] > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:379) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) > ... 
25 more > Caused by: java.lang.RuntimeException: Primitive type appshould not doesn't > match typeapp[version] > at > org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330) > at > org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322) > at > org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedSchema(DataWritableReadSupport.java:249) > at > org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:379) > at > org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:84) > at > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:75) > at > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:60) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:376) > ... 26 more > {code} > > Querying context.app shows as null > {code:java} > hive> select context.app from abc_dwh.table_on_parquet where year=2020 and > month='07' and day=26 and hour='03' limit 5; > OK > NULL > NULL > NULL > NULL > NULL > {code} > > As a workaround, I tried querying "context.app.version" only if "context.app" > is not null but that also gave the same error. *To verify the case statement > for the null check, I ran the query below, which should have produced "0" but > instead produced "1" for all rows.* Distinct value of context.app for the partition is > NULL so ruled out differences in select with limit. Running the same query in > SparkSQL provides the correct result. 
> {code:java} > hive> select case when context.app is null then 0 else 1 end status from > abc_dwh.table_on_parquet where year=2020 and month='07' and day=26 and > hour='03' limit 5; > OK > 1 > 1 > 1 > 1 > 1 {code} > Hive Version used: 2.3.5-amzn-0 (on AWS EMR) -- This message was sent by Atlassian Jira (v8.3.4#803005)
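The behaviour the report asks for (NULL instead of an exception when a file lacks a requested leaf column) can be sketched as follows. This is an illustrative toy, not the actual DataWritableReadSupport projection logic; the map-based "schema" stands in for the real Parquet file schema:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: when a requested leaf path is absent from a particular
// file's schema, surface NULL for that column rather than throwing.
public class MissingColumnProjection {

    // fileSchema maps leaf paths like "context.app.version" to a type name.
    static Map<String, Object> projectRow(Map<String, String> fileSchema,
                                          Map<String, Object> fileRow,
                                          List<String> requestedPaths) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String path : requestedPaths) {
            // Present in this file: read the value; absent: NULL, no exception.
            out.put(path, fileSchema.containsKey(path) ? fileRow.get(path) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("messageid", "string");          // this file lacks context.app.*
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("messageid", "m-1");
        Map<String, Object> projected =
            projectRow(schema, row, Arrays.asList("messageid", "context.app.version"));
        System.out.println(projected);              // missing leaf comes back as null
    }
}
```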
[jira] [Commented] (HIVE-3619) Hive JDBC driver should return a proper update-count of rows affected by query
[ https://issues.apache.org/jira/browse/HIVE-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183511#comment-17183511 ] Miklos Szurap commented on HIVE-3619: - Linked HIVE-20218 - it seems that change fixed this jira. > Hive JDBC driver should return a proper update-count of rows affected by query > -- > > Key: HIVE-3619 > URL: https://issues.apache.org/jira/browse/HIVE-3619 > Project: Hive > Issue Type: Bug > Components: JDBC >Affects Versions: 0.9.0 >Reporter: Harsh J >Priority: Minor > Attachments: HIVE-3619.patch > > > HiveStatement.java currently has an explicit 0 return: > public int getUpdateCount() throws SQLException { return 0; } > Ideally we ought to emit the exact number of rows affected by the query > statement itself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
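The direction of the fix (instead of the hard-coded `return 0`) can be sketched like this. The field and callback names are hypothetical, not the actual HiveStatement members:

```java
// Sketch: record the engine-reported row count and return it from
// getUpdateCount(), rather than hard-coding 0.
public class UpdateCountSketch {
    // -1 means "no update count available", per the JDBC convention.
    private long numModifiedRows = -1L;

    // Imagined callback for when the engine reports a finished DML statement.
    void onDmlFinished(long rowsAffected) {
        this.numModifiedRows = rowsAffected;
    }

    public int getUpdateCount() {
        // Clamp to the int-valued JDBC 4 signature; a long-valued
        // getLargeUpdateCount() could return the exact figure.
        return (int) Math.min(Integer.MAX_VALUE, numModifiedRows);
    }
}
```

Clamping matters because a large DML can affect more rows than an `int` can hold, which is why JDBC 4.2 added the long-valued variant.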
[jira] [Updated] (HIVE-24031) Infinite planning time on syntactically big queries
[ https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24031: -- Labels: pull-request-available (was: ) > Infinite planning time on syntactically big queries > --- > > Key: HIVE-24031 > URL: https://issues.apache.org/jira/browse/HIVE-24031 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: ASTNode_getChildren_cost.png, > query_big_array_constructor.nps > > Time Spent: 10m > Remaining Estimate: 0h > > Syntactically big queries (~1 million tokens), such as the query shown below, > lead to very big (seemingly infinite) planning times. > {code:sql} > select posexplode(array('item1', 'item2', ..., 'item1M')); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24031) Infinite planning time on syntactically big queries
[ https://issues.apache.org/jira/browse/HIVE-24031?focusedWorklogId=473922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473922 ] ASF GitHub Bot logged work on HIVE-24031: - Author: ASF GitHub Bot Created on: 24/Aug/20 15:05 Start Date: 24/Aug/20 15:05 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #1424: URL: https://github.com/apache/hive/pull/1424 ### What changes were proposed in this pull request? 1. Drop the defensive copy of children inside ASTNode#getChildren. 2. Protect clients by accidentally modifying the list via an unmodifiable collection. ### Why are the changes needed? Profiling shows the vast majority of time spend on creating defensive copies of the node expression list inside ASTNode#getChildren. The method is called extensively from various places in the code especially those walking over the expression tree so it needs to be efficient. Most of the time creating defensive copies is not necessary. For those cases (if any) that the list needs to be modified clients should perform a copy themselves. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The test was added in a separate branch since it is not meant to be committed upstream for the following reasons: - the query for reproducing the problem takes up a few MBs - requires some changes in the default configurations. If you want to run the test run the following commands: ``` git checkout -b HIVE-24031-TEST master git pull g...@github.com:zabetak/hive.git HIVE-24031-PLUS-TEST mvn clean install -DskipTests cd itests mvn clean install -DskipTests cd qtest mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=big_query_with_array_constructor.q -Dtest.output.overwrite ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473922) Remaining Estimate: 0h Time Spent: 10m > Infinite planning time on syntactically big queries > --- > > Key: HIVE-24031 > URL: https://issues.apache.org/jira/browse/HIVE-24031 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > Attachments: ASTNode_getChildren_cost.png, > query_big_array_constructor.nps > > Time Spent: 10m > Remaining Estimate: 0h > > Syntactically big queries (~1 million tokens), such as the query shown below, > lead to very big (seemingly infinite) planning times. > {code:sql} > select posexplode(array('item1', 'item2', ..., 'item1M')); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
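The two changes the PR describes (drop the defensive copy, protect callers with an unmodifiable view) can be sketched as below. The class and field names are illustrative, not the real ASTNode internals:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: getChildren() returns a cheap unmodifiable view instead of
// allocating a defensive copy of the child list on every call.
public class Node {
    private final List<Node> children = new ArrayList<>();

    void addChild(Node c) {
        children.add(c);
    }

    // O(1): no per-call allocation proportional to the child count. Callers
    // that genuinely need to mutate the list must copy it themselves; any
    // accidental mutation through the view throws immediately.
    List<Node> getChildren() {
        return Collections.unmodifiableList(children);
    }
}
```

This is why the change helps on syntactically huge queries: tree walkers call the getter constantly, and the copy cost was proportional to the (here, ~1M-token) child list each time.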
[jira] [Updated] (HIVE-24065) Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue
[ https://issues.apache.org/jira/browse/HIVE-24065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24065: -- Labels: pull-request-available (was: ) > Bloom filters can be cached after deserialization in > VectorInBloomFilterColDynamicValue > --- > > Key: HIVE-24065 > URL: https://issues.apache.org/jira/browse/HIVE-24065 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: image-2020-08-05-10-05-25-080.png > > Time Spent: 10m > Remaining Estimate: 0h > > Same bloom filter is loaded multiple times across tasks. It would be good to > check if we can optimise this, to avoid deserializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24065) Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue
[ https://issues.apache.org/jira/browse/HIVE-24065?focusedWorklogId=473920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473920 ] ASF GitHub Bot logged work on HIVE-24065: - Author: ASF GitHub Bot Created on: 24/Aug/20 14:49 Start Date: 24/Aug/20 14:49 Worklog Time Spent: 10m Work Description: abstractdog opened a new pull request #1423: URL: https://github.com/apache/hive/pull/1423 Change-Id: I311f131c03392618cc2dac186e7e53a48ede1eb4 ### What changes were proposed in this pull request? As the title suggests, expensive bloom filter deserialization can be eliminated by caching the bloom filters. This way, only 1 filter instance per daemon (or container in container mode) will be present. ### Why are the changes needed? Performance improvement. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Tested on cluster. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473920) Remaining Estimate: 0h Time Spent: 10m > Bloom filters can be cached after deserialization in > VectorInBloomFilterColDynamicValue > --- > > Key: HIVE-24065 > URL: https://issues.apache.org/jira/browse/HIVE-24065 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: image-2020-08-05-10-05-25-080.png > > Time Spent: 10m > Remaining Estimate: 0h > > Same bloom filter is loaded multiple times across tasks. It would be good to > check if we can optimise this, to avoid deserializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
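The caching idea described above can be sketched as follows. The String key and long[] payload stand in for the real dynamic-value id and BloomKFilter (this is not the actual VectorInBloomFilterColDynamicValue code), and a cached filter must be treated as read-only since one instance is shared across tasks:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: deserialize each bloom filter at most once per process and share
// the resulting instance across all tasks that need it.
public class BloomFilterCache {
    private static final Map<String, long[]> CACHE = new ConcurrentHashMap<>();
    static final AtomicInteger deserializations = new AtomicInteger();

    static long[] getOrDeserialize(String filterId, byte[] serialized) {
        // computeIfAbsent guarantees the expensive deserialization runs at
        // most once per key, even under concurrent lookups.
        return CACHE.computeIfAbsent(filterId, id -> {
            deserializations.incrementAndGet();
            return deserialize(serialized);
        });
    }

    // Stand-in for the real (expensive) bloom filter deserialization.
    private static long[] deserialize(byte[] bytes) {
        long[] words = new long[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            words[i] = bytes[i];
        }
        return words;
    }
}
```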
[jira] [Updated] (HIVE-24065) Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue
[ https://issues.apache.org/jira/browse/HIVE-24065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24065: Attachment: image-2020-08-05-10-05-25-080.png > Bloom filters can be cached after deserialization in > VectorInBloomFilterColDynamicValue > --- > > Key: HIVE-24065 > URL: https://issues.apache.org/jira/browse/HIVE-24065 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: image-2020-08-05-10-05-25-080.png > > > Same bloom filter is loaded multiple times across tasks. It would be good to > check if we can optimise this, to avoid deserializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24065) Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue
[ https://issues.apache.org/jira/browse/HIVE-24065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183353#comment-17183353 ] László Bodor commented on HIVE-24065: - the idea is to cache the bloom filters in order to eliminate deserialization > Bloom filters can be cached after deserialization in > VectorInBloomFilterColDynamicValue > --- > > Key: HIVE-24065 > URL: https://issues.apache.org/jira/browse/HIVE-24065 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: image-2020-08-05-10-05-25-080.png > > > Same bloom filter is loaded multiple times across tasks. It would be good to > check if we can optimise this, to avoid deserializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24065) Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue
[ https://issues.apache.org/jira/browse/HIVE-24065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24065: Description: Same bloom filter is loaded multiple times across tasks. It would be good to check if we can optimise this, to avoid deserializing. > Bloom filters can be cached after deserialization in > VectorInBloomFilterColDynamicValue > --- > > Key: HIVE-24065 > URL: https://issues.apache.org/jira/browse/HIVE-24065 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > Same bloom filter is loaded multiple times across tasks. It would be good to > check if we can optimise this, to avoid deserializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24065) Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue
[ https://issues.apache.org/jira/browse/HIVE-24065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-24065: --- Assignee: László Bodor > Bloom filters can be cached after deserialization in > VectorInBloomFilterColDynamicValue > --- > > Key: HIVE-24065 > URL: https://issues.apache.org/jira/browse/HIVE-24065 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24064) Disable Materialized View Replication
[ https://issues.apache.org/jira/browse/HIVE-24064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arko Sharma updated HIVE-24064: --- Attachment: HIVE-24064.01.patch Status: Patch Available (was: Open) > Disable Materialized View Replication > - > > Key: HIVE-24064 > URL: https://issues.apache.org/jira/browse/HIVE-24064 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24064.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24064) Disable Materialized View Replication
[ https://issues.apache.org/jira/browse/HIVE-24064?focusedWorklogId=473902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473902 ] ASF GitHub Bot logged work on HIVE-24064: - Author: ASF GitHub Bot Created on: 24/Aug/20 13:47 Start Date: 24/Aug/20 13:47 Worklog Time Spent: 10m Work Description: ArkoSharma opened a new pull request #1422: URL: https://github.com/apache/hive/pull/1422 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473902) Remaining Estimate: 0h Time Spent: 10m > Disable Materialized View Replication > - > > Key: HIVE-24064 > URL: https://issues.apache.org/jira/browse/HIVE-24064 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24064) Disable Materialized View Replication
[ https://issues.apache.org/jira/browse/HIVE-24064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24064: -- Labels: pull-request-available (was: ) > Disable Materialized View Replication > - > > Key: HIVE-24064 > URL: https://issues.apache.org/jira/browse/HIVE-24064 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24064) Disable Materialized View Replication
[ https://issues.apache.org/jira/browse/HIVE-24064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arko Sharma reassigned HIVE-24064: -- Assignee: Arko Sharma > Disable Materialized View Replication > - > > Key: HIVE-24064 > URL: https://issues.apache.org/jira/browse/HIVE-24064 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=473884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473884 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 24/Aug/20 12:49 Start Date: 24/Aug/20 12:49 Worklog Time Spent: 10m Work Description: abstractdog opened a new pull request #1280: URL: https://github.com/apache/hive/pull/1280 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473884) Time Spent: 8h 20m (was: 8h 10m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 8h 20m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. 
> For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4 llap RUNNING 6154 0 207 5947 > 0 0 > Reducer 5 llapINITED 43 00 43 > 0 0 > Reducer 6 llapINITED 1 001 > 0 0 > -- > VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s > -- > {code} > For example, 70M entries in bloom filter leads to a 436 465 696 bits, so > merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR > operation, which is very hot codepath, but can be parallelized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=473883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473883 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 24/Aug/20 12:49 Start Date: 24/Aug/20 12:49 Worklog Time Spent: 10m Work Description: abstractdog closed pull request #1280: URL: https://github.com/apache/hive/pull/1280 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473883) Time Spent: 8h 10m (was: 8h) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 8h 10m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. 
> For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4 llap RUNNING 6154 0 207 5947 > 0 0 > Reducer 5 llapINITED 43 00 43 > 0 0 > Reducer 6 llapINITED 1 001 > 0 0 > -- > VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s > -- > {code} > For example, 70M entries in bloom filter leads to a 436 465 696 bits, so > merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR > operation, which is very hot codepath, but can be parallelized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
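The parallelization opportunity the description points at can be sketched as below: OR-ing ~1263 bit arrays into one target is embarrassingly parallel across word positions, because each target word depends only on the same word of every input, so workers never touch the same slot. This is a simplified illustration, not the actual VectorUDAFBloomFilterMerge code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Sketch: merge N bloom filter bit arrays with a parallel bitwise OR.
public class ParallelBloomMerge {

    static long[] merge(List<long[]> filters, int words) {
        long[] target = new long[words];
        // Each worker owns a disjoint set of word positions, so no locking
        // is needed on the shared target array.
        IntStream.range(0, words).parallel().forEach(i -> {
            long acc = 0L;
            for (long[] filter : filters) {
                acc |= filter[i];   // the hot bitwise-OR loop from the description
            }
            target[i] = acc;
        });
        return target;
    }

    public static void main(String[] args) {
        long[] merged = merge(Arrays.asList(new long[]{3L, 0L}, new long[]{5L, 8L}), 2);
        System.out.println(Arrays.toString(merged));  // 3|5 = 7, 0|8 = 8
    }
}
```

In practice each worker would own a contiguous stripe of words rather than interleaved positions, to keep memory access cache-friendly, but the independence argument is the same.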
[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183160#comment-17183160 ] Syed Shameerur Rahman commented on HIVE-18284: -- [~jcamachorodriguez] Could you please review the PR? > NPE when inserting data with 'distribute by' clause with dynpart sort > optimization > -- > > Key: HIVE-18284 > URL: https://issues.apache.org/jira/browse/HIVE-18284 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.1, 2.3.2 >Reporter: Aki Tanaka >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > A Null Pointer Exception occurs when inserting data with 'distribute by' > clause. The following snippet query reproduces this issue: > *(non-vectorized , non-llap mode)* > {code:java} > create table table1 (col1 string, datekey int); > insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1); > create table table2 (col1 string) partitioned by (datekey int); > set hive.vectorized.execution.enabled=false; > set hive.optimize.sort.dynamic.partition=true; > set hive.exec.dynamic.partition.mode=nonstrict; > insert into table table2 > PARTITION(datekey) > select col1, > datekey > from table1 > distribute by datekey ; > {code} > I could run the insert query without the error if I remove Distribute By or > use Cluster By clause. > It seems that the issue happens because Distribute By does not guarantee > clustering or sorting properties on the distributed keys. > FileSinkOperator removes the previous fsp. FileSinkOperator will remove the > previous fsp which might be re-used when we use Distribute By. > https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972 > The following stack trace is logged. 
> {code:java} > Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, > diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}} > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row (tag=0) > {"key":{},"value":{"_col0":"ROW3","_col1":1}} > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365) > at > 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185) > ... 14 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356) > ... 17 more >
[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=473832=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473832 ] ASF GitHub Bot logged work on HIVE-23851: - Author: ASF GitHub Bot Created on: 24/Aug/20 11:01 Start Date: 24/Aug/20 11:01 Worklog Time Spent: 10m Work Description: shameersss1 commented on pull request #1271: URL: https://github.com/apache/hive/pull/1271#issuecomment-679059909 @kgyrtkirk Could you please review the PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473832) Time Spent: 2h 20m (was: 2h 10m) > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect expression proxy > class to be set as 
PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), while dropping a partition we serialize the drop-partition filter > expression as ( > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 > ) which is incompatible during deserialization happening in > PartitionExpressionForMetastore ( >
[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183159#comment-17183159 ] Syed Shameerur Rahman commented on HIVE-23851: -- [~kgyrtkirk] Could you please review the PR? > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect the expression proxy > class to be set as PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), while dropping a partition we serialize the drop-partition filter > expression as ( > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 > ) which is incompatible during deserialization happening in > PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52 >
) hence the query fails with "Failed to deserialize the expression". > *Solutions*: > I could think of two approaches to this problem: > # Since PartitionExpressionForMetastore is required only during the partition > pruning step, we can switch the expression proxy class back to > MsckPartitionExpressionProxy once the partition pruning step is done. > # The other solution is to make the serialization process for the msck drop-partition > filter expression compatible with the one in > PartitionExpressionForMetastore. We can do this via reflection, since the drop-partition > serialization happens in the Msck class (standalone-metastore); this > way we can completely remove the need for class
[jira] [Updated] (HIVE-23926) Flaky test TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion
[ https://issues.apache.org/jira/browse/HIVE-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anishek Agarwal updated HIVE-23926: --- Resolution: Fixed Status: Resolved (was: Patch Available) +1. Merged to master. Thanks for the patch [~^sharma] > Flaky test > TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion > > > Key: HIVE-23926 > URL: https://issues.apache.org/jira/browse/HIVE-23926 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23926.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-precommit/job/master/123/testReport/org.apache.hadoop.hive.ql.parse/TestTableLevelReplicationScenarios/Testing___split_18___Archive___testRenameTableScenariosWithReplacePolicyDMLOperattion/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anishek Agarwal updated HIVE-24032: --- Resolution: Fixed Status: Resolved (was: Patch Available) Merged to master. Thanks for the patch [~aasha] and the review [~pkumarsinha]! > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24032.01.patch, HIVE-24032.02.patch, > HIVE-24032.03.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Remove hadoop shims dependency from standalone metastore. > Rename hive.repl.data.copy.lazy hive conf to > hive.repl.run.data.copy.tasks.on.target -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-23723: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183131#comment-17183131 ] Krisztian Kasa commented on HIVE-23723: --- Pushed to master, thanks [~amagyar]! > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-24032: --- Description: Remove hadoop shims dependency from standalone metastore. > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24032.01.patch, HIVE-24032.02.patch, > HIVE-24032.03.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Remove hadoop shims dependency from standalone metastore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?focusedWorklogId=473820=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473820 ] ASF GitHub Bot logged work on HIVE-23723: - Author: ASF GitHub Bot Created on: 24/Aug/20 10:12 Start Date: 24/Aug/20 10:12 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #1323: URL: https://github.com/apache/hive/pull/1323 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473820) Time Spent: 1h 20m (was: 1h 10m) > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-24032: --- Description: Remove hadoop shims dependency from standalone metastore. Rename hive.repl.data.copy.lazy hive conf to hive.repl.run.data.copy.tasks.on.target was:Remove hadoop shims dependency from standalone metastore. > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24032.01.patch, HIVE-24032.02.patch, > HIVE-24032.03.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Remove hadoop shims dependency from standalone metastore. > Rename hive.repl.data.copy.lazy hive conf to > hive.repl.run.data.copy.tasks.on.target -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
[ https://issues.apache.org/jira/browse/HIVE-24063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24063: -- Labels: pull-request-available (was: ) > SqlFunctionConverter#getHiveUDF handles cast before geting FunctionInfo > --- > > Key: HIVE-24063 > URL: https://issues.apache.org/jira/browse/HIVE-24063 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Zhihua Deng >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When the current SqlOperator is SqlCastFunction, > FunctionRegistry.getFunctionInfo would return null, > but when hive.allow.udf.load.on.demand is enabled, HiveServer2 will refer to > metastore for the function definition, an exception stack trace can be seen > here in HiveServer2 log: > INFO exec.FunctionRegistry: Unable to look up default.cast in metastore > org.apache.hadoop.hive.ql.metadata.HiveException: > NoSuchObjectException(message:Function @hive#default.cast does not exist) > at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > > So it may be better to handle the explicit cast before getting the FunctionInfo > from Registry. Even if there is no cast in the query, the method > handleExplicitCast returns null quickly when op.kind is not SqlKind.CAST. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
[ https://issues.apache.org/jira/browse/HIVE-24063?focusedWorklogId=473816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473816 ] ASF GitHub Bot logged work on HIVE-24063: - Author: ASF GitHub Bot Created on: 24/Aug/20 10:06 Start Date: 24/Aug/20 10:06 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1421: URL: https://github.com/apache/hive/pull/1421 …g FunctionInfo ### What changes were proposed in this pull request? SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo ### Why are the changes needed? When hive.allow.udf.load.on.demand is enabled, another RPC call will be made to the metastore for the cast definition when getting FunctionInfo, but there is no need to do this. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Included tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473816) Remaining Estimate: 0h Time Spent: 10m > SqlFunctionConverter#getHiveUDF handles cast before geting FunctionInfo > --- > > Key: HIVE-24063 > URL: https://issues.apache.org/jira/browse/HIVE-24063 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Zhihua Deng >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > When the current SqlOperator is SqlCastFunction, > FunctionRegistry.getFunctionInfo would return null, > but when hive.allow.udf.load.on.demand is enabled, HiveServer2 will refer to > metastore for the function definition, an exception stack trace can be seen > here in HiveServer2 log: > INFO exec.FunctionRegistry: Unable to look up default.cast in metastore > org.apache.hadoop.hive.ql.metadata.HiveException: > NoSuchObjectException(message:Function @hive#default.cast does not exist) > at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) >
[jira] [Assigned] (HIVE-24062) Combine all table constraints RDBMS calls in one SQL call
[ https://issues.apache.org/jira/browse/HIVE-24062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sharma reassigned HIVE-24062: > Combine all table constraints RDBMS calls in one SQL call > > > Key: HIVE-24062 > URL: https://issues.apache.org/jira/browse/HIVE-24062 > Project: Hive > Issue Type: Improvement >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > > A table consists of 6 different types of constraints, namely > PrimaryKey, ForeignKey, UniqueConstraint, NotNullConstraint, DefaultConstraint, and CheckConstraint. > Each constraint has a different SQL query to fetch the information from the RDBMS, > which leads to 6 different RDBMS calls. > The idea here is to have one combined query which fetches all the constraint > information at once and then filters the result set on the basis of constraint > type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
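The fetch-once-then-filter idea above can be sketched as a single UNION ALL query whose branches tag each row with its constraint type, after which the client splits the combined result set locally. A hypothetical sketch (the real metastore schema and queries differ; the SQL skeleton and all names below are illustrative):

```python
from collections import defaultdict

# Hypothetical shape of the single combined query: every branch tags its
# rows with the constraint type so the client can split the result set.
COMBINED_CONSTRAINTS_SQL = """\
SELECT 'PRIMARY_KEY' AS constraint_type, ... FROM ...
UNION ALL
SELECT 'FOREIGN_KEY' AS constraint_type, ... FROM ...
UNION ALL
SELECT 'CHECK' AS constraint_type, ... FROM ...
"""

def split_by_constraint_type(rows):
    # One pass over the combined result set replaces six separate RDBMS
    # calls; rows are (constraint_type, payload) pairs from the query above.
    by_type = defaultdict(list)
    for constraint_type, payload in rows:
        by_type[constraint_type].append(payload)
    return dict(by_type)
```

The round-trip count drops from six to one, at the cost of a slightly wider result set and a cheap in-memory grouping step.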
[jira] [Commented] (HIVE-22352) Hive JDBC Storage Handler, simple select query failed with NPE if executed using Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-22352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183003#comment-17183003 ] chenruotao commented on HIVE-22352: --- maybe you should try to set hive.sql.query.fieldNames=id,date and set hive.sql.query.fieldTypes=int,timestamp or update the ext table column name col1 to id and col2 to date > Hive JDBC Storage Handler, simple select query failed with NPE if executed > using Fetch Task > --- > > Key: HIVE-22352 > URL: https://issues.apache.org/jira/browse/HIVE-22352 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.1 > Environment: Hive-3.1 >Reporter: Rajkumar Singh >Priority: Blocker > > Steps To Repro: > > {code:java} > // MySQL Table > CREATE TABLE `visitors` ( `id` bigint(20) unsigned NOT NULL, `date` timestamp > NOT NULL DEFAULT CURRENT_TIMESTAMP ) > // hive table > CREATE EXTERNAL TABLE `hive_visitors`( `col1` bigint COMMENT 'from > deserializer', `col2` timestamp COMMENT 'from deserializer') ROW FORMAT SERDE > 'org.apache.hive.storage.jdbc.JdbcSerDe' STORED BY > 'org.apache.hive.storage.jdbc.JdbcStorageHandler' WITH SERDEPROPERTIES ( > 'serialization.format'='1') TBLPROPERTIES ( 'bucketing_version'='2', > 'hive.sql.database.type'='MYSQL', 'hive.sql.dbcp.maxActive'='1', > 'hive.sql.dbcp.password'='hive', 'hive.sql.dbcp.username'='hive', > 'hive.sql.jdbc.driver'='com.mysql.jdbc.Driver', > 'hive.sql.jdbc.url'='jdbc:mysql://hostname/test', > 'hive.sql.table'='visitors', 'transient_lastDdlTime'='1554910389') > Query: > select * from hive_visitors ; > Exception: > 2019-10-16T04:04:39,483 WARN [HiveServer2-Handler-Pool: Thread-71]: > thrift.ThriftCLIService (:()) - Error fetching results: > org.apache.hive.service.cli.HiveSQLException: java.io.IOException: > java.lang.NullPointerException at > org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:478) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > 
org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:952) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source) ~[?:?] at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_112] at java.lang.reflect.Method.invoke(Method.java:498) > ~[?:1.8.0_112] at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112] at > javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112] at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?] at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > com.sun.proxy.$Proxy42.fetchResults(Unknown Source) ~[?:?] 
at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:565) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:792) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837) > ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.1000-SNAPSHOT] at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822) > ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.1000-SNAPSHOT] at > org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.1000-SNAPSHOT] at > org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.1000-SNAPSHOT] at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) > ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315] at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > ~[hive-exec-3.1.0.3.1.4.0-315.jar:3.1.1000-SNAPSHOT] at >