[jira] [Commented] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime
[ https://issues.apache.org/jira/browse/HIVE-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264647#comment-17264647 ] gabrywu commented on HIVE-16352: [~kgyrtkirk] Yes, it's small and very useful. > Ability to skip or repair out of sync blocks with HIVE at runtime > - > > Key: HIVE-16352 > URL: https://issues.apache.org/jira/browse/HIVE-16352 > Project: Hive > Issue Type: New Feature > Components: Avro, File Formats, Reader >Affects Versions: 3.1.2 >Reporter: Navdeep Poonia >Assignee: gabrywu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > When a file is corrupted it raises the error java.io.IOException: Invalid > sync! with Hive. > Can we have some functionality to skip or repair such blocks at runtime to > make Avro more error-resilient in case of data corruption? > Error: java.io.IOException: java.io.IOException: java.io.IOException: While > processing file > s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42. > java.io.IOException: Invalid sync! > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334) -- This message was sent by Atlassian Jira (v8.3.4#803005)
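The recovery idea behind the request can be sketched as follows. This is not Hive's or Avro's actual reader: a minimal stdlib-Python illustration of how a skip mode could resume after an "Invalid sync!", by scanning forward for the next 16-byte sync marker. The marker value and the flat byte layout are assumptions for illustration.

```python
# Illustrative sketch only: resume reading after a corrupt block by
# scanning forward to the next sync marker (Avro uses a 16-byte marker
# between blocks; the marker bytes below are made up for the demo).
def skip_to_next_sync(data: bytes, pos: int, sync_marker: bytes) -> int:
    """Return the offset just past the next sync marker at or after pos, or -1."""
    idx = data.find(sync_marker, pos)
    return -1 if idx == -1 else idx + len(sync_marker)

# Toy "file": one block, a sync marker, then the rest of the file.
SYNC = b"\x01" * 16
blob = b"good-block" + SYNC + b"rest-of-file"
resume_at = skip_to_next_sync(blob, 0, SYNC)
```

A skip-or-repair reader would call this whenever a block fails to decode, trading the corrupt block's records for the ability to keep reading.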
[jira] [Updated] (HIVE-24606) Multi-stage materialized CTEs can lose intermediate data
[ https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] okumin updated HIVE-24606: -- Summary: Multi-stage materialized CTEs can lose intermediate data (was: Multi-stage materialized CTEs can lost intermediate data) > Multi-stage materialized CTEs can lose intermediate data > > > Key: HIVE-24606 > URL: https://issues.apache.org/jira/browse/HIVE-24606 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > > With complex multi-stage CTEs, Hive can start a latter stage before its > previous stage finishes. > That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve > dependencies between multi-stage materialized CTEs when a non-materialized CTE > cuts in. > > [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414] > > For example, when submitting this query, > {code:sql} > SET hive.optimize.cte.materialize.threshold=2; > SET hive.optimize.cte.materialize.full.aggregate.only=false; > WITH x AS ( SELECT 'x' AS id ), -- not materialized > a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root > a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root > SELECT * FROM a1 > UNION ALL > SELECT * FROM x > UNION ALL > SELECT * FROM a2 > UNION ALL > SELECT * FROM a2; > {code} > `toRealRootTask` will traverse the CTEs in order of `a1`, `x`, and `a2`. This > means the dependency between `a1` and `a2` will be ignored and `a2` can start > without waiting for `a1`. As a result, the above query returns the following > result. > {code:java} > +-+ > | id | > +-+ > | a1 | > | x | > +-+ > {code} > For your information, I ran this test with revision = > 425e1ff7c054f87c4db87e77d004282d529599ae. -- This message was sent by Atlassian Jira (v8.3.4#803005)
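The fix the issue implies can be sketched in a few lines: schedule materialized CTE tasks in dependency (topological) order rather than discovery order, so `a2` (which reads `a1`) can never start before `a1` finishes. This is a language-neutral sketch using Python's stdlib, not the actual `SemanticAnalyzer#toRealRootTasks` code; the CTE names come from the example query above.

```python
# Sketch: order materialized CTEs topologically by their read-dependencies.
# graphlib (Python 3.9+) maps each node to its set of predecessors.
from graphlib import TopologicalSorter

def materialization_order(deps: dict) -> list:
    """deps: materialized CTE name -> set of materialized CTEs it reads from."""
    return list(TopologicalSorter(deps).static_order())

# a2 reads a1; x is not materialized, so it never enters the graph and
# cannot perturb the ordering the way it does in toRealRootTasks today.
order = materialization_order({"a1": set(), "a2": {"a1"}})
```

With discovery order (`a1`, `x`, `a2`) the `x` entry breaks the adjacency that the current code relies on; a topological order is insensitive to where non-materialized CTEs appear.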
[jira] [Work logged] (HIVE-24589) Drop catalog failing with deadlock error for Oracle backend dbms.
[ https://issues.apache.org/jira/browse/HIVE-24589?focusedWorklogId=535878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535878 ] ASF GitHub Bot logged work on HIVE-24589: - Author: ASF GitHub Bot Created on: 14/Jan/21 05:03 Start Date: 14/Jan/21 05:03 Worklog Time Spent: 10m Work Description: maheshk114 merged pull request #1850: URL: https://github.com/apache/hive/pull/1850 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535878) Time Spent: 20m (was: 10m) > Drop catalog failing with deadlock error for Oracle backend dbms. > - > > Key: HIVE-24589 > URL: https://issues.apache.org/jira/browse/HIVE-24589 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When we do a drop catalog we drop the catalog from the CTLGS table. The DBS > table has a foreign key reference on CTLGS for CTLG_NAME. This is causing the > DBS table to be locked exclusively and causing deadlocks. This can be avoided > by creating an index in the DBS table on CTLG_NAME. > {code:java} > CREATE INDEX CTLG_NAME_DBS ON DBS(CTLG_NAME); {code} > {code:java} > Oracle Database maximizes the concurrency control of parent keys in relation > to dependent foreign keys. Locking behaviour depends on whether foreign key > columns are indexed. If foreign keys are not indexed, then the child table > will probably be locked more frequently, deadlocks will occur, and > concurrency will be decreased. For this reason foreign keys should almost > always be indexed.
The only exception is when the matching unique or primary > key is never updated or deleted.{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=535861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535861 ] ASF GitHub Bot logged work on HIVE-24386: - Author: ASF GitHub Bot Created on: 14/Jan/21 03:58 Start Date: 14/Jan/21 03:58 Worklog Time Spent: 10m Work Description: vnhive opened a new pull request #1694: URL: https://github.com/apache/hive/pull/1694 HIVE-24386 : Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient This patch builds over the patch for HIVE-24397 and adds builder methods for the request and the projection specification classes of Tables and Partitions. The relevant unit tests have also been updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535861) Time Spent: 1h 40m (was: 1.5h) > Add builder methods for GetTablesRequest and GetPartitionsRequest to > HiveMetaStoreClient > > > Key: HIVE-24386 > URL: https://issues.apache.org/jira/browse/HIVE-24386 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Builder methods for GetTablesRequest and GetPartitionsRequest should be added > to the HiveMetaStoreClient class. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=535860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535860 ] ASF GitHub Bot logged work on HIVE-24386: - Author: ASF GitHub Bot Created on: 14/Jan/21 03:58 Start Date: 14/Jan/21 03:58 Worklog Time Spent: 10m Work Description: vnhive commented on pull request #1694: URL: https://github.com/apache/hive/pull/1694#issuecomment-759909797 > Added some comments. Requesting changes. I have addressed all your requests. Can you please check if you are happy or you want me to change anything more ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535860) Time Spent: 1.5h (was: 1h 20m) > Add builder methods for GetTablesRequest and GetPartitionsRequest to > HiveMetaStoreClient > > > Key: HIVE-24386 > URL: https://issues.apache.org/jira/browse/HIVE-24386 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Builder methods for GetTablesRequest and GetPartitionsRequest should be added > to the HiveMetaStoreClient class. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=535859=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535859 ] ASF GitHub Bot logged work on HIVE-24386: - Author: ASF GitHub Bot Created on: 14/Jan/21 03:58 Start Date: 14/Jan/21 03:58 Worklog Time Spent: 10m Work Description: vnhive closed pull request #1694: URL: https://github.com/apache/hive/pull/1694 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535859) Time Spent: 1h 20m (was: 1h 10m) > Add builder methods for GetTablesRequest and GetPartitionsRequest to > HiveMetaStoreClient > > > Key: HIVE-24386 > URL: https://issues.apache.org/jira/browse/HIVE-24386 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Builder methods for GetTablesRequest and GetPartitionsRequest should be added > to the HiveMetaStoreClient class. -- This message was sent by Atlassian Jira (v8.3.4#803005)
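The shape of the requested helpers can be illustrated generically. This is not the actual HiveMetaStoreClient API: a minimal builder-pattern sketch in Python showing why builders help for request objects with many optional fields (only the fields you care about are set, and the call chain reads declaratively). The field names are assumptions for illustration.

```python
# Hypothetical stand-in for a Thrift-style request with optional fields.
from dataclasses import dataclass, field

@dataclass
class GetTablesRequest:
    db_name: str
    tbl_names: list = field(default_factory=list)
    projection: list = field(default_factory=list)

class GetTablesRequestBuilder:
    """Fluent builder: each with_* method returns self for chaining."""
    def __init__(self, db_name: str):
        self._req = GetTablesRequest(db_name)

    def with_tables(self, *names):
        self._req.tbl_names.extend(names)
        return self

    def with_projection(self, *fields_):
        self._req.projection.extend(fields_)
        return self

    def build(self) -> GetTablesRequest:
        return self._req

req = (GetTablesRequestBuilder("default")
       .with_tables("t1", "t2")
       .with_projection("tableName")
       .build())
```

The same pattern applies to GetPartitionsRequest and its projection specification: defaults stay in one place, and callers never pass long positional argument lists.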
[jira] [Resolved] (HIVE-24627) Add Debug Logging to Hive JDBC Connection
[ https://issues.apache.org/jira/browse/HIVE-24627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor resolved HIVE-24627. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks [~mgergely] for the review! > Add Debug Logging to Hive JDBC Connection > - > > Key: HIVE-24627 > URL: https://issues.apache.org/jira/browse/HIVE-24627 > Project: Hive > Issue Type: Improvement > Components: JDBC >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Log the following: > # Session handle > # Version Number > # Any configurations/variables set by the user at the client-side > # Dump the Hive configurations at session-start -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24627) Add Debug Logging to Hive JDBC Connection
[ https://issues.apache.org/jira/browse/HIVE-24627?focusedWorklogId=535847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535847 ] ASF GitHub Bot logged work on HIVE-24627: - Author: ASF GitHub Bot Created on: 14/Jan/21 02:53 Start Date: 14/Jan/21 02:53 Worklog Time Spent: 10m Work Description: belugabehr merged pull request #1859: URL: https://github.com/apache/hive/pull/1859 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535847) Time Spent: 1h (was: 50m) > Add Debug Logging to Hive JDBC Connection > - > > Key: HIVE-24627 > URL: https://issues.apache.org/jira/browse/HIVE-24627 > Project: Hive > Issue Type: Improvement > Components: JDBC >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Log the following: > # Session handle > # Version Number > # Any configurations/variables set by the user at the client-side > # Dump the Hive configurations at session-start -- This message was sent by Atlassian Jira (v8.3.4#803005)
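The four items the issue asks to log can be sketched generically. This is illustrative only, not the real Hive JDBC internals: a stdlib-Python sketch of connection-time debug logging (session handle, server version, client-side variables). All names are assumptions.

```python
# Sketch: emit session details at DEBUG level when a connection opens.
import io
import logging

buf = io.StringIO()
log = logging.getLogger("hive.jdbc.sketch")
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler(buf))

def log_connection_debug(session_handle: str, server_version: str, client_vars: dict):
    """Log what HIVE-24627 asks for: handle, version, client-side overrides."""
    log.debug("Connected: sessionHandle=%s", session_handle)
    log.debug("Server version: %s", server_version)
    for name, value in sorted(client_vars.items()):
        log.debug("Client-side variable: %s=%s", name, value)

log_connection_debug("sess-1234", "4.0.0", {"hive.execution.engine": "tez"})
output = buf.getvalue()
```

Keeping this at DEBUG means production connections pay no logging cost unless the user opts in, which is the usual trade-off for connection diagnostics.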
[jira] [Work logged] (HIVE-24075) Optimise KeyValuesInputMerger
[ https://issues.apache.org/jira/browse/HIVE-24075?focusedWorklogId=535833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535833 ] ASF GitHub Bot logged work on HIVE-24075: - Author: ASF GitHub Bot Created on: 14/Jan/21 01:34 Start Date: 14/Jan/21 01:34 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1463: URL: https://github.com/apache/hive/pull/1463 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535833) Time Spent: 50m (was: 40m) > Optimise KeyValuesInputMerger > - > > Key: HIVE-24075 > URL: https://issues.apache.org/jira/browse/HIVE-24075 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Comparisons in KeyValueInputMerger can be reduced. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L165|https://github.infra.cloudera.com/CDH/hive/blob/cdpd-master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L165] > [https://github.infra.cloudera.com/CDH/hive/blob/cdpd-master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L150|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L150] > If the reader comparisons in the queue are same, we could reuse > "{{nextKVReaders}}" in next subsequent iteration instead of doing the > comparison all over again. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L178] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
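The optimisation idea can be made concrete with a toy merger. This is not the Hive `KeyValuesInputMerger` code: a stdlib-Python sketch of a k-way merge that pops every reader tied on the minimum key in one pass, which is the batching that would let `nextKVReaders` be reused instead of re-comparing all readers each iteration.

```python
# Sketch: k-way merge over sorted iterators that groups readers sharing
# the smallest key, so tied readers are handled as one batch.
import heapq

def merge_grouping_ties(readers):
    """readers: sorted iterators. Yields (key, number_of_tied_readers)."""
    heads = []
    for i, r in enumerate(readers):
        v = next(r, None)
        if v is not None:
            heapq.heappush(heads, (v, i, r))  # i breaks ties, r never compared
    while heads:
        key = heads[0][0]
        group = []
        # Pop every reader currently tied on the minimum key.
        while heads and heads[0][0] == key:
            group.append(heapq.heappop(heads))
        yield key, len(group)
        # Advance only the readers that were consumed.
        for _, i, r in group:
            v = next(r, None)
            if v is not None:
                heapq.heappush(heads, (v, i, r))

merged = list(merge_grouping_ties([iter([1, 3]), iter([1, 2])]))
```

The batch (`group` here, `nextKVReaders` in the issue) is exactly the set of readers whose next comparison result is already known, which is what the issue proposes to reuse.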
[jira] [Work logged] (HIVE-24595) Vectorization causing incorrect results for scalar subquery
[ https://issues.apache.org/jira/browse/HIVE-24595?focusedWorklogId=535805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535805 ] ASF GitHub Bot logged work on HIVE-24595: - Author: ASF GitHub Bot Created on: 13/Jan/21 23:41 Start Date: 13/Jan/21 23:41 Worklog Time Spent: 10m Work Description: mustafaiman opened a new pull request #1867: URL: https://github.com/apache/hive/pull/1867 Change-Id: Ia901a4b1ee6a4f34fdf13f02fcd9eaaf615cca58 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535805) Remaining Estimate: 0h Time Spent: 10m > Vectorization causing incorrect results for scalar subquery > --- > > Key: HIVE-24595 > URL: https://issues.apache.org/jira/browse/HIVE-24595 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Vineet Garg >Assignee: Mustafa İman >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > *Repro* > {code:sql} > CREATE EXTERNAL TABLE `alltypessmall`( >`id` int, >`bool_col` boolean, >`tinyint_col` tinyint, >`smallint_col` smallint, >`int_col` int, >`bigint_col` bigint, >`float_col` float, >`double_col` double, >`date_string_col` string, >`string_col` string, >`timestamp_col` timestamp) > PARTITIONED BY ( >`year` int, >`month` int) > ROW FORMAT SERDE >'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > WITH SERDEPROPERTIES ( >'escape.delim'='\\', >'field.delim'=',', >'serialization.format'=',') > STORED AS INPUTFORMAT >'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT >'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > TBLPROPERTIES ( 
>'DO_NOT_UPDATE_STATS'='true', >'OBJCAPABILITIES'='EXTREAD,EXTWRITE', >'STATS_GENERATED'='TASK', >'impala.lastComputeStatsTime'='1608312793', >'transient_lastDdlTime'='1608310442'); > insert into alltypessmall partition(year=2002,month=1) values(1, true, > 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001'); > insert into alltypessmall partition(year=2002,month=1) values(1, true, > 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001'); > insert into alltypessmall partition(year=2002,month=1) values(1, true, > 3,3,40,3434,5.4,44.3,'str1','str2', '01-01-2001'); > {code} > Following query should fail but it succeeds > {code:sql} > SELECT id FROM alltypessmall > WHERE int_col = > (SELECT int_col >FROM alltypessmall) > ORDER BY id; > {code} > *Explain plan* > {code:java} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17 > Edges: > Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE) > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE) > DagName: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: alltypessmall > filterExpr: int_col is not null (type: boolean) > Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE > Column stats: COMPLETE > Filter Operator > predicate: int_col is not null (type:
[jira] [Updated] (HIVE-24595) Vectorization causing incorrect results for scalar subquery
[ https://issues.apache.org/jira/browse/HIVE-24595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24595: -- Labels: pull-request-available (was: ) > Vectorization causing incorrect results for scalar subquery > --- > > Key: HIVE-24595 > URL: https://issues.apache.org/jira/browse/HIVE-24595 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Vineet Garg >Assignee: Mustafa İman >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > *Repro* > {code:sql} > CREATE EXTERNAL TABLE `alltypessmall`( >`id` int, >`bool_col` boolean, >`tinyint_col` tinyint, >`smallint_col` smallint, >`int_col` int, >`bigint_col` bigint, >`float_col` float, >`double_col` double, >`date_string_col` string, >`string_col` string, >`timestamp_col` timestamp) > PARTITIONED BY ( >`year` int, >`month` int) > ROW FORMAT SERDE >'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > WITH SERDEPROPERTIES ( >'escape.delim'='\\', >'field.delim'=',', >'serialization.format'=',') > STORED AS INPUTFORMAT >'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT >'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > TBLPROPERTIES ( >'DO_NOT_UPDATE_STATS'='true', >'OBJCAPABILITIES'='EXTREAD,EXTWRITE', >'STATS_GENERATED'='TASK', >'impala.lastComputeStatsTime'='1608312793', >'transient_lastDdlTime'='1608310442'); > insert into alltypessmall partition(year=2002,month=1) values(1, true, > 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001'); > insert into alltypessmall partition(year=2002,month=1) values(1, true, > 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001'); > insert into alltypessmall partition(year=2002,month=1) values(1, true, > 3,3,40,3434,5.4,44.3,'str1','str2', '01-01-2001'); > {code} > Following query should fail but it succeeds > {code:sql} > SELECT id FROM alltypessmall > WHERE int_col = > (SELECT int_col >FROM alltypessmall) > ORDER BY id; > {code} > *Explain 
plan* > {code:java} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17 > Edges: > Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE) > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE) > DagName: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: alltypessmall > filterExpr: int_col is not null (type: boolean) > Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE > Column stats: COMPLETE > Filter Operator > predicate: int_col is not null (type: boolean) > Statistics: Num rows: 3 Data size: 24 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: id (type: int), int_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 3 Data size: 24 Basic stats: > COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 > 1 > outputColumnNames: _col0, _col1 > input vertices: > 1 Reducer 4 > Statistics: Num rows: 3 Data size: 24 Basic stats: > COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: >
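The expected semantics are worth stating concretely: a scalar subquery used in `WHERE int_col = (SELECT ...)` must error when the subquery yields more than one row, and the repro's three inserted rows (int_col values 4, 4, 40) guarantee that. A minimal sketch of the cardinality check the non-vectorized path enforces (illustrative, not Hive's implementation):

```python
# Sketch: enforce scalar-subquery cardinality before using its value.
def scalar_subquery_value(rows):
    """Return the single value of a scalar subquery, or fail loudly."""
    if len(rows) > 1:
        raise ValueError("Scalar subquery returned more than one row")
    return rows[0] if rows else None  # empty subquery -> NULL

# The repro's subquery produces three rows, so evaluation must fail.
try:
    scalar_subquery_value([4, 4, 40])
    repro_failed = False
except ValueError:
    repro_failed = True
```

The bug is that the vectorized plan compiles the subquery into a broadcast join with no such guard, so it silently succeeds.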
[jira] [Updated] (HIVE-24634) Create table if not exists should validate whether table exists before doAuth()
[ https://issues.apache.org/jira/browse/HIVE-24634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-24634: -- Description: In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges over all files in the table location even though the table already exists. The table-existence check should run before doAuthorization during compilation. {code:java} at org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(FileUtils.java:452) at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.isURIAccessAllowed(RangerHiveAuthorizer.java:1428) at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291) at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337) at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code} was: In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges over all files in the table location even though the table already exists. The table-existence check should run before doAuthorization during compilation. {code:java} at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291) at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337) at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code} > Create table if not exists should validate whether table exists before > doAuth() > --- > > Key: HIVE-24634 > URL: https://issues.apache.org/jira/browse/HIVE-24634 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Priority: Major > > In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges > over all files in the table location even though the table already exists.
> The table-existence check should run before doAuthorization during compilation. > {code:java} > at > org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(FileUtils.java:452) > at > org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.isURIAccessAllowed(RangerHiveAuthorizer.java:1428) > at > org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291) > at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337) > at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24634) Create table if not exists should validate whether table exists before doAuth()
[ https://issues.apache.org/jira/browse/HIVE-24634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-24634: -- Description: In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges over all files in the table location even though the table already exists. The table-existence check should run before doAuthorization during compilation. {code:java} at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291) at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337) at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code} was: In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges over all files in the table location even though the table already exists. The table-existence check should run before doAuthorization during compilation. at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291) at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337) at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710) > Create table if not exists should validate whether table exists before > doAuth() > --- > > Key: HIVE-24634 > URL: https://issues.apache.org/jira/browse/HIVE-24634 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Priority: Major > > In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges > over all files in the table location even though the table already exists. > The table-existence check should run before doAuthorization during compilation.
> {code:java} > at > org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291) > at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337) > at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
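The proposed reordering can be sketched directly. This is not the actual Driver/Ranger code: a stdlib-Python sketch where `metastore` and `authorizer` are hypothetical stand-ins, showing the short-circuit on table existence so the authorizer (which in Ranger's case walks the whole file hierarchy under the table location) is never invoked for a no-op statement.

```python
# Sketch of CREATE TABLE IF NOT EXISTS with the existence check first.
def create_table_if_not_exists(metastore, authorizer, db, table, location):
    if metastore.table_exists(db, table):
        return False  # no-op: skip doAuthorization entirely
    authorizer.check_privileges(db, table, location)  # may scan `location`
    metastore.create_table(db, table, location)
    return True

# Hypothetical in-memory stand-ins to exercise the ordering.
class _FakeMetastore:
    def __init__(self, existing):
        self.existing = set(existing)
    def table_exists(self, db, table):
        return (db, table) in self.existing
    def create_table(self, db, table, location):
        self.existing.add((db, table))

class _FakeAuthorizer:
    def __init__(self):
        self.calls = 0
    def check_privileges(self, db, table, location):
        self.calls += 1  # real Ranger would walk the location's files here

store = _FakeMetastore({("default", "t1")})
auth = _FakeAuthorizer()
noop = create_table_if_not_exists(store, auth, "default", "t1", "/warehouse/t1")
created = create_table_if_not_exists(store, auth, "default", "t2", "/warehouse/t2")
```

The fake authorizer's call counter makes the point of the issue: for the pre-existing table the expensive check never runs.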
[jira] [Commented] (HIVE-24394) Enable printing explain to console at query start
[ https://issues.apache.org/jira/browse/HIVE-24394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264330#comment-17264330 ] Johan Gustavsson commented on HIVE-24394: - Thank you for reviewing and merging this [~kgyrtkirk] > Enable printing explain to console at query start > - > > Key: HIVE-24394 > URL: https://issues.apache.org/jira/browse/HIVE-24394 > Project: Hive > Issue Type: Improvement > Components: Hive, Query Processor >Affects Versions: 2.3.7, 3.1.2 >Reporter: Johan Gustavsson >Assignee: Johan Gustavsson >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently there is a hive.log.explain.output option that prints extended > explain output to the log. While this is helpful for internal investigations, it limits > the information that is available to users. So we should add options to print > non-extended explain output to the console, for general user consumption, making > it easier for users to debug queries and workflows without having to > resubmit queries with explain. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24523) Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES for timestamp
[ https://issues.apache.org/jira/browse/HIVE-24523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-24523: -- Fix Version/s: 4.0.0 > Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES > for timestamp > - > > Key: HIVE-24523 > URL: https://issues.apache.org/jira/browse/HIVE-24523 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.2.0, 4.0.0 >Reporter: Rajkumar Singh >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Steps to repro: > {code:java} > create external table tstable(date_created timestamp) ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( > 'timestamp.formats'='MMddHHmmss') stored as textfile; > cat sampledata > 2020120517 > hdfs dfs -put sampledata /warehouse/tablespace/external/hive/tstable > {code} > Disable fetch task conversion and run select * from tstable, which produces no > results; setting hive.vectorized.use.vector.serde.deserialize=false returns > the expected output. > While parsing the string to a timestamp, > https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/fast/LazySimpleDeserializeRead.java#L812 > does not set the DateTimeFormatter, which results in an IllegalArgumentException > while parsing the timestamp through TimestampUtils.stringToTimestamp(strValue) -- This message was sent by Atlassian Jira (v8.3.4#803005)
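The behavior the serde property is supposed to give can be sketched generically: try each user-declared pattern before failing. This is an illustrative stdlib-Python analog, not Hive's parser; the pattern syntax here is Python's strptime, not the Java/Hive pattern letters (`MMddHHmmss` etc.) the issue uses.

```python
# Sketch: honor a list of user-supplied timestamp formats, in order,
# the way 'timestamp.formats' is honored on the non-vectorized path.
from datetime import datetime

def parse_timestamp(value: str, formats: list) -> datetime:
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # fall through to the next declared format
    raise ValueError(f"Cannot parse timestamp: {value!r}")

# The repro's raw value, parsed with an assumed year-first pattern.
ts = parse_timestamp("2020120517", ["%Y%m%d%H"])
```

The bug is precisely that the vectorized path skips this loop (never installing the configured formatter), so the default parser rejects the value and the row is silently dropped.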
[jira] [Updated] (HIVE-24628) Decimal values are displayed as scientific notation in beeline
[ https://issues.apache.org/jira/browse/HIVE-24628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-24628: -- Component/s: Beeline > Decimal values are displayed as scientific notation in beeline > -- > > Key: HIVE-24628 > URL: https://issues.apache.org/jira/browse/HIVE-24628 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > > BigDecimal.toString() returns scientific notation instead of the original > text, which confuses users. It should be changed to > toPlainString() here > [https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L165] > Repro steps: > > {code:java} > beeline> select cast(0 as decimal(20,10)); > //output > 0E-10 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
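The same toString-versus-plain-string distinction exists in Python's decimal module, which makes the bug easy to demonstrate without a JVM: `str()` preserves the exponent form exactly as `BigDecimal.toString()` does, while explicit fixed-point formatting behaves like `toPlainString()`.

```python
# Python analog of the beeline display bug: zero at scale 10.
from decimal import Decimal

value = Decimal("0E-10")       # what cast(0 as decimal(20,10)) carries
scientific = str(value)        # exponent form, like BigDecimal.toString()
plain = format(value, "f")     # fixed-point, like BigDecimal.toPlainString()
```

Switching the display path to the plain form is purely cosmetic: both strings denote the same value, but only the fixed-point one matches what users typed and expect to read back.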
[jira] [Commented] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big
[ https://issues.apache.org/jira/browse/HIVE-23684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264247#comment-17264247 ] Vineet Garg commented on HIVE-23684: Merged the pull request into master. > Large underestimation in NDV stats when input and join cardinality ratio is > big > --- > > Key: HIVE-23684 > URL: https://issues.apache.org/jira/browse/HIVE-23684 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Large underestimations of NDV values may occur after a join operation since > the current logic will decrease the original NDV values proportionally. > The > [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558] > compares the number of rows of each relation before the join with the number > of rows after the join and extracts a ratio for each side. Based on this > ratio it adapts (reduces) the NDV accordingly. 
> Consider for instance the following query: > {code:sql} > select inv_warehouse_sk > , inv_item_sk > , stddev_samp(inv_quantity_on_hand) stdev > , avg(inv_quantity_on_hand) mean > from inventory >, date_dim > where inv_date_sk = d_date_sk > and d_year = 1999 > and d_moy = 2 > group by inv_warehouse_sk, inv_item_sk; > {code} > For the sake of the discussion, I outline below some relevant stats (from > TPCDS30tb): > T(inventory) = 1627857000 > T(date_dim) = 73049 > T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000 > V(inventory, inv_date_sk) = 261 > V(inventory, inv_item_sk) = 42 > V(inventory, inv_warehouse_sk) = 27 > V(date_dim, inv, d_date_sk) = 73049 > For instance, in this query the join between inventory and date_dim has ~24M > rows while inventory has ~1.5B so the NDV of the columns coming from > inventory are reduced by a factor of ~100 so we end up with V(JOIN, > inv_item_sk) = ~6K while the real one is 231000. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big
[ https://issues.apache.org/jira/browse/HIVE-23684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg resolved HIVE-23684. Fix Version/s: 4.0.0 Resolution: Fixed > Large underestimation in NDV stats when input and join cardinality ratio is > big > --- > > Key: HIVE-23684 > URL: https://issues.apache.org/jira/browse/HIVE-23684 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Large underestimations of NDV values may occur after a join operation since > the current logic will decrease the original NDV values proportionally. > The > [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558] > compares the number of rows of each relation before the join with the number > of rows after the join and extracts a ratio for each side. Based on this > ratio it adapts (reduces) the NDV accordingly. 
> Consider for instance the following query: > {code:sql} > select inv_warehouse_sk > , inv_item_sk > , stddev_samp(inv_quantity_on_hand) stdev > , avg(inv_quantity_on_hand) mean > from inventory >, date_dim > where inv_date_sk = d_date_sk > and d_year = 1999 > and d_moy = 2 > group by inv_warehouse_sk, inv_item_sk; > {code} > For the sake of the discussion, I outline below some relevant stats (from > TPCDS30tb): > T(inventory) = 1627857000 > T(date_dim) = 73049 > T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000 > V(inventory, inv_date_sk) = 261 > V(inventory, inv_item_sk) = 42 > V(inventory, inv_warehouse_sk) = 27 > V(date_dim, inv, d_date_sk) = 73049 > For instance, in this query the join between inventory and date_dim has ~24M > rows while inventory has ~1.5B so the NDV of the columns coming from > inventory are reduced by a factor of ~100 so we end up with V(JOIN, > inv_item_sk) = ~6K while the real one is 231000. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big
[ https://issues.apache.org/jira/browse/HIVE-23684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg reassigned HIVE-23684: -- Assignee: Vineet Garg (was: Stamatis Zampetakis) > Large underestimation in NDV stats when input and join cardinality ratio is > big > --- > > Key: HIVE-23684 > URL: https://issues.apache.org/jira/browse/HIVE-23684 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Large underestimations of NDV values may occur after a join operation since > the current logic will decrease the original NDV values proportionally. > The > [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558] > compares the number of rows of each relation before the join with the number > of rows after the join and extracts a ratio for each side. Based on this > ratio it adapts (reduces) the NDV accordingly. 
> Consider for instance the following query: > {code:sql} > select inv_warehouse_sk > , inv_item_sk > , stddev_samp(inv_quantity_on_hand) stdev > , avg(inv_quantity_on_hand) mean > from inventory >, date_dim > where inv_date_sk = d_date_sk > and d_year = 1999 > and d_moy = 2 > group by inv_warehouse_sk, inv_item_sk; > {code} > For the sake of the discussion, I outline below some relevant stats (from > TPCDS30tb): > T(inventory) = 1627857000 > T(date_dim) = 73049 > T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000 > V(inventory, inv_date_sk) = 261 > V(inventory, inv_item_sk) = 42 > V(inventory, inv_warehouse_sk) = 27 > V(date_dim, inv, d_date_sk) = 73049 > For instance, in this query the join between inventory and date_dim has ~24M > rows while inventory has ~1.5B so the NDV of the columns coming from > inventory are reduced by a factor of ~100 so we end up with V(JOIN, > inv_item_sk) = ~6K while the real one is 231000. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big
[ https://issues.apache.org/jira/browse/HIVE-23684?focusedWorklogId=535539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535539 ] ASF GitHub Bot logged work on HIVE-23684: - Author: ASF GitHub Bot Created on: 13/Jan/21 16:46 Start Date: 13/Jan/21 16:46 Worklog Time Spent: 10m Work Description: vineetgarg02 merged pull request #1786: URL: https://github.com/apache/hive/pull/1786 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535539) Time Spent: 50m (was: 40m) > Large underestimation in NDV stats when input and join cardinality ratio is > big > --- > > Key: HIVE-23684 > URL: https://issues.apache.org/jira/browse/HIVE-23684 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Large underestimations of NDV values may occur after a join operation since > the current logic will decrease the original NDV values proportionally. > The > [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558] > compares the number of rows of each relation before the join with the number > of rows after the join and extracts a ratio for each side. Based on this > ratio it adapts (reduces) the NDV accordingly. 
> Consider for instance the following query: > {code:sql} > select inv_warehouse_sk > , inv_item_sk > , stddev_samp(inv_quantity_on_hand) stdev > , avg(inv_quantity_on_hand) mean > from inventory >, date_dim > where inv_date_sk = d_date_sk > and d_year = 1999 > and d_moy = 2 > group by inv_warehouse_sk, inv_item_sk; > {code} > For the sake of the discussion, I outline below some relevant stats (from > TPCDS30tb): > T(inventory) = 1627857000 > T(date_dim) = 73049 > T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000 > V(inventory, inv_date_sk) = 261 > V(inventory, inv_item_sk) = 42 > V(inventory, inv_warehouse_sk) = 27 > V(date_dim, inv, d_date_sk) = 73049 > For instance, in this query the join between inventory and date_dim has ~24M > rows while inventory has ~1.5B so the NDV of the columns coming from > inventory are reduced by a factor of ~100 so we end up with V(JOIN, > inv_item_sk) = ~6K while the real one is 231000. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-14165) Remove Hive file listing during split computation
[ https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-14165: -- Labels: pull-request-available (was: ) > Remove Hive file listing during split computation > - > > Key: HIVE-14165 > URL: https://issues.apache.org/jira/browse/HIVE-14165 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Abdullah Yousufi >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, > HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, > HIVE-14165.07.patch, HIVE-14165.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's > FileInputFormat.java will list the files during split computation anyway to > determine their size. One way to remove this is to catch the > InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the > Hive side instead of doing the file listing beforehand. > For S3 select queries on partitioned tables, this results in a 2x speedup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-14165) Remove Hive file listing during split computation
[ https://issues.apache.org/jira/browse/HIVE-14165?focusedWorklogId=535524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535524 ] ASF GitHub Bot logged work on HIVE-14165: - Author: ASF GitHub Bot Created on: 13/Jan/21 16:31 Start Date: 13/Jan/21 16:31 Worklog Time Spent: 10m Work Description: pvargacl opened a new pull request #1866: URL: https://github.com/apache/hive/pull/1866 ### What changes were proposed in this pull request? Remove the unnecessary file listing from FetchOperator and instead handle FileNotFoundException, to make it more performant on S3. Rebased the original patch from Sahil Takiar. ### Why are the changes needed? Performance improvement. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing unit tests. Manual test: deleted some directories during execution to cause a FileNotFoundException. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535524) Remaining Estimate: 0h Time Spent: 10m > Remove Hive file listing during split computation > - > > Key: HIVE-14165 > URL: https://issues.apache.org/jira/browse/HIVE-14165 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Abdullah Yousufi >Assignee: Peter Varga >Priority: Major > Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, > HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, > HIVE-14165.07.patch, HIVE-14165.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's > FileInputFormat.java will list the files during split computation anyway to > determine their size. 
One way to remove this is to catch the > InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the > Hive side instead of doing the file listing beforehand. > For S3 select queries on partitioned tables, this results in a 2x speedup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
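The approach described above — skip the pre-listing and instead tolerate a missing path during split computation — can be sketched like this. `compute_splits` and `fake_compute` are hypothetical stand-ins for FileInputFormat#getSplits, not Hive's or Hadoop's actual API.

```python
def resilient_splits(paths, compute_splits):
    """Compute splits without a separate existence-check listing.

    Rather than listing each path up front (an extra round trip to the
    NameNode or S3), attempt split computation directly and treat a
    vanished path as contributing no splits.
    """
    splits = []
    for path in paths:
        try:
            splits.extend(compute_splits(path))
        except FileNotFoundError:
            # Path was deleted between planning and execution: skip it
            # instead of failing the whole query.
            continue
    return splits

# Toy stand-in: one path "exists", the other raises as a deleted dir would.
def fake_compute(path):
    if path == "/warehouse/t/part=1":
        return [(path, 0, 128)]
    raise FileNotFoundError(path)

print(resilient_splits(["/warehouse/t/part=1", "/warehouse/t/part=2"], fake_compute))
```

The saving comes from doing one filesystem operation per path instead of two, which matters most on high-latency object stores like S3.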
[jira] [Work logged] (HIVE-24633) Support CTE with column labels
[ https://issues.apache.org/jira/browse/HIVE-24633?focusedWorklogId=535468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535468 ] ASF GitHub Bot logged work on HIVE-24633: - Author: ASF GitHub Bot Created on: 13/Jan/21 15:23 Start Date: 13/Jan/21 15:23 Worklog Time Spent: 10m Work Description: kasakrisz opened a new pull request #1865: URL: https://github.com/apache/hive/pull/1865 ### What changes were proposed in this pull request? 1. Improve the parser to accept CTE clause with `with column list` specified: ``` WITH cte(a, b) AS ... ``` 2. When transforming subquery AST tree to Calcite RelNode tree a new RowResolver is created for the subquery's top node to point its alias. Extend this logic with assign the `with column list` elements to each entry if explicitly specified in the `WITH` clause. ### Why are the changes needed? SQL standard enables this feature. ### Does this PR introduce _any_ user-facing change? Yes. When users specify `with column list` the list elements must be used to reference expressions in the CTE' select clause from the main query. ### How was this patch tested? ``` mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_8.q -pl itests/qtest -Pitests mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_mat_1.q -pl itests/qtest -Pitests ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535468) Remaining Estimate: 0h Time Spent: 10m > Support CTE with column labels > -- > > Key: HIVE-24633 > URL: https://issues.apache.org/jira/browse/HIVE-24633 > Project: Hive > Issue Type: Improvement > Components: Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > {code} > with cte1(a, b) as (select int_col x, bigint_col y from t1) > select a, b from cte1{code} > {code} > a b > 1 2 > 3 4 > {code} > {code} > ::= > [ ] > [ ] [ ] [ ] > ::= > WITH [ RECURSIVE ] > ::= >[ { }... ] > ::= >[] > AS [ ] > ::= > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
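The positional mapping of the `with column list` onto the CTE body's output — the rule the parser change above enables — can be sketched as follows. This is illustrative only, not Hive's RowResolver code; the function name is made up for the sketch.

```python
def resolve_cte_columns(select_aliases, with_column_list=None):
    # For WITH cte1(a, b) AS (SELECT int_col x, bigint_col y FROM t1):
    # the labels (a, b) replace the body's aliases (x, y) positionally,
    # and the main query must then reference a/b rather than x/y.
    if not with_column_list:
        return list(select_aliases)
    if len(with_column_list) != len(select_aliases):
        raise ValueError("WITH column list must match the CTE select list length")
    return list(with_column_list)

print(resolve_cte_columns(["x", "y"], ["a", "b"]))  # labels win: ['a', 'b']
print(resolve_cte_columns(["x", "y"]))              # no list given: ['x', 'y']
```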
[jira] [Updated] (HIVE-24633) Support CTE with column labels
[ https://issues.apache.org/jira/browse/HIVE-24633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24633: -- Labels: pull-request-available (was: ) > Support CTE with column labels > -- > > Key: HIVE-24633 > URL: https://issues.apache.org/jira/browse/HIVE-24633 > Project: Hive > Issue Type: Improvement > Components: Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code} > with cte1(a, b) as (select int_col x, bigint_col y from t1) > select a, b from cte1{code} > {code} > a b > 1 2 > 3 4 > {code} > {code} > ::= > [ ] > [ ] [ ] [ ] > ::= > WITH [ RECURSIVE ] > ::= >[ { }... ] > ::= >[] > AS [ ] > ::= > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24633) Support CTE with column labels
[ https://issues.apache.org/jira/browse/HIVE-24633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-24633: -- Description: {code} with cte1(a, b) as (select int_col x, bigint_col y from t1) select a, b from cte1{code} {code} a b 1 2 3 4 {code} {code} ::= [ ] [ ] [ ] [ ] ::= WITH [ RECURSIVE ] ::= [ { }... ] ::= [] AS [ ] ::= {code} was: {code} with cte1(a, b) as (select int_col x, bigint_col y from t1) select a, b from cte1{code} {code} a b 1 2 3 4 {code} > Support CTE with column labels > -- > > Key: HIVE-24633 > URL: https://issues.apache.org/jira/browse/HIVE-24633 > Project: Hive > Issue Type: Improvement > Components: Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > {code} > with cte1(a, b) as (select int_col x, bigint_col y from t1) > select a, b from cte1{code} > {code} > a b > 1 2 > 3 4 > {code} > {code} > ::= > [ ] > [ ] [ ] [ ] > ::= > WITH [ RECURSIVE ] > ::= >[ { }... ] > ::= >[] > AS [ ] > ::= > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24394) Enable printing explain to console at query start
[ https://issues.apache.org/jira/browse/HIVE-24394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-24394: Fix Version/s: 4.0.0 Assignee: Johan Gustavsson (was: Zoltan Haindrich) Resolution: Fixed Status: Resolved (was: Patch Available) merged into master. Thank you [~johang] for the patch! > Enable printing explain to console at query start > - > > Key: HIVE-24394 > URL: https://issues.apache.org/jira/browse/HIVE-24394 > Project: Hive > Issue Type: Improvement > Components: Hive, Query Processor >Affects Versions: 2.3.7, 3.1.2 >Reporter: Johan Gustavsson >Assignee: Johan Gustavsson >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently there is a hive.log.explain.output option that prints extended > explain to the log. While this is helpful for internal investigations, it limits > the information that is available to users. So we should add an option to print > a non-extended explain to the console, for general user consumption, to > make it easier for users to debug queries and workflows without having to > resubmit queries with EXPLAIN. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24394) Enable printing explain to console at query start
[ https://issues.apache.org/jira/browse/HIVE-24394?focusedWorklogId=535447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535447 ] ASF GitHub Bot logged work on HIVE-24394: - Author: ASF GitHub Bot Created on: 13/Jan/21 14:57 Start Date: 13/Jan/21 14:57 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1679: URL: https://github.com/apache/hive/pull/1679 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535447) Time Spent: 20m (was: 10m) > Enable printing explain to console at query start > - > > Key: HIVE-24394 > URL: https://issues.apache.org/jira/browse/HIVE-24394 > Project: Hive > Issue Type: Improvement > Components: Hive, Query Processor >Affects Versions: 2.3.7, 3.1.2 >Reporter: Johan Gustavsson >Assignee: Zoltan Haindrich >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently there is a hive.log.explain.output option that prints extended > explain to the log. While this is helpful for internal investigations, it limits > the information that is available to users. So we should add an option to print > a non-extended explain to the console, for general user consumption, to > make it easier for users to debug queries and workflows without having to > resubmit queries with EXPLAIN. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24627) Add Debug Logging to Hive JDBC Connection
[ https://issues.apache.org/jira/browse/HIVE-24627?focusedWorklogId=535441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535441 ] ASF GitHub Bot logged work on HIVE-24627: - Author: ASF GitHub Bot Created on: 13/Jan/21 14:50 Start Date: 13/Jan/21 14:50 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1859: URL: https://github.com/apache/hive/pull/1859 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535441) Time Spent: 50m (was: 40m) > Add Debug Logging to Hive JDBC Connection > - > > Key: HIVE-24627 > URL: https://issues.apache.org/jira/browse/HIVE-24627 > Project: Hive > Issue Type: Improvement > Components: JDBC >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Log the following: > # Session handle > # Version Number > # Any configurations/variables set by the user at the client-side > # Dump the Hive configurations at session-start -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24339) REPL LOAD command ignores config properties set by WITH clause
[ https://issues.apache.org/jira/browse/HIVE-24339?focusedWorklogId=535439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535439 ] ASF GitHub Bot logged work on HIVE-24339: - Author: ASF GitHub Bot Created on: 13/Jan/21 14:49 Start Date: 13/Jan/21 14:49 Worklog Time Spent: 10m Work Description: ayushtkn commented on pull request #1864: URL: https://github.com/apache/hive/pull/1864#issuecomment-759497451 cc. @kgyrtkirk @abstractdog This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535439) Time Spent: 20m (was: 10m) > REPL LOAD command ignores config properties set by WITH clause > -- > > Key: HIVE-24339 > URL: https://issues.apache.org/jira/browse/HIVE-24339 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > By debug messages we confirmed that REPL LOAD command ignored some config > properties when they were provided in WITH clause, e.g.: > {code} > REPL LOAD bdpp01pub FROM > 'hdfs://prdpdp01//apps/hive/repl/8237c7bd-ba26-4425-8659-3a0d32ab312c' WITH > ('mapreduce.job.queuename'='default','hive.exec.parallel'='true','hive.exec.parallel.thread.number'='128', > ... > {code} > We found that it was working on 16 threads, ignoring > 'hive.exec.parallel.thread.number'='128'. Setting this property on session > level worked. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24627) Add Debug Logging to Hive JDBC Connection
[ https://issues.apache.org/jira/browse/HIVE-24627?focusedWorklogId=535429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535429 ] ASF GitHub Bot logged work on HIVE-24627: - Author: ASF GitHub Bot Created on: 13/Jan/21 14:46 Start Date: 13/Jan/21 14:46 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1859: URL: https://github.com/apache/hive/pull/1859 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535429) Time Spent: 40m (was: 0.5h) > Add Debug Logging to Hive JDBC Connection > - > > Key: HIVE-24627 > URL: https://issues.apache.org/jira/browse/HIVE-24627 > Project: Hive > Issue Type: Improvement > Components: JDBC >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Log the following: > # Session handle > # Version Number > # Any configurations/variables set by the user at the client-side > # Dump the Hive configurations at session-start -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24613) Support Values clause without Insert
[ https://issues.apache.org/jira/browse/HIVE-24613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-24613: -- Component/s: Parser > Support Values clause without Insert > > > Key: HIVE-24613 > URL: https://issues.apache.org/jira/browse/HIVE-24613 > Project: Hive > Issue Type: Improvement > Components: Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Standalone: > {code} > VALUES(1,2,3),(4,5,6); > {code} > {code} > 1 2 3 > 4 5 6 > {code} > In subquery: > {code} > SELECT * FROM (VALUES(1,2,3),(4,5,6)) as FOO; > {code} > {code} > 1 2 3 > 4 5 6 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
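The two forms from the ticket can be tried against SQLite, whose dialect already allows a standalone VALUES clause; this only illustrates the intended semantics, not Hive's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Standalone VALUES acts as a query in its own right...
standalone = conn.execute("VALUES (1, 2, 3), (4, 5, 6)").fetchall()

# ...and the same clause works as a derived table in FROM.
in_subquery = conn.execute(
    "SELECT * FROM (VALUES (1, 2, 3), (4, 5, 6)) AS foo"
).fetchall()

print(standalone)   # [(1, 2, 3), (4, 5, 6)]
assert standalone == in_subquery
```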
[jira] [Assigned] (HIVE-24633) Support CTE with column labels
[ https://issues.apache.org/jira/browse/HIVE-24633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-24633: - > Support CTE with column labels > -- > > Key: HIVE-24633 > URL: https://issues.apache.org/jira/browse/HIVE-24633 > Project: Hive > Issue Type: Improvement > Components: Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > {code} > with cte1(a, b) as (select int_col x, bigint_col y from t1) > select a, b from cte1{code} > {code} > a b > 1 2 > 3 4 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24613) Support Values clause without Insert
[ https://issues.apache.org/jira/browse/HIVE-24613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-24613. --- Resolution: Fixed Pushed to master. Thanks [~jcamachorodriguez], [~kgyrtkirk] for review. > Support Values clause without Insert > > > Key: HIVE-24613 > URL: https://issues.apache.org/jira/browse/HIVE-24613 > Project: Hive > Issue Type: Improvement >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Standalone: > {code} > VALUES(1,2,3),(4,5,6); > {code} > {code} > 1 2 3 > 4 5 6 > {code} > In subquery: > {code} > SELECT * FROM (VALUES(1,2,3),(4,5,6)) as FOO; > {code} > {code} > 1 2 3 > 4 5 6 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24613) Support Values clause without Insert
[ https://issues.apache.org/jira/browse/HIVE-24613?focusedWorklogId=535416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535416 ] ASF GitHub Bot logged work on HIVE-24613: - Author: ASF GitHub Bot Created on: 13/Jan/21 14:10 Start Date: 13/Jan/21 14:10 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #1847: URL: https://github.com/apache/hive/pull/1847 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535416) Time Spent: 50m (was: 40m) > Support Values clause without Insert > > > Key: HIVE-24613 > URL: https://issues.apache.org/jira/browse/HIVE-24613 > Project: Hive > Issue Type: Improvement >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Standalone: > {code} > VALUES(1,2,3),(4,5,6); > {code} > {code} > 1 2 3 > 4 5 6 > {code} > In subquery: > {code} > SELECT * FROM (VALUES(1,2,3),(4,5,6)) as FOO; > {code} > {code} > 1 2 3 > 4 5 6 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23459) Reduce number of listPath calls in AcidUtils::getAcidState
[ https://issues.apache.org/jira/browse/HIVE-23459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Varga resolved HIVE-23459. Resolution: Duplicate > Reduce number of listPath calls in AcidUtils::getAcidState > -- > > Key: HIVE-23459 > URL: https://issues.apache.org/jira/browse/HIVE-23459 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Peter Varga >Priority: Minor > Attachments: image-2020-05-13-13-57-27-270.png > > > There are at least 3 places where listPaths is invoked for the FS (highlighted in > the following profile). > !image-2020-05-13-13-57-27-270.png|width=869,height=626! > > Dir caching works mainly for the BI strategy and when there are no delta files. > It would be good to consider reducing the number of NN calls to reduce getSplits > time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24339) REPL LOAD command ignores config properties set by WITH clause
[ https://issues.apache.org/jira/browse/HIVE-24339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24339: -- Labels: pull-request-available (was: ) > REPL LOAD command ignores config properties set by WITH clause > -- > > Key: HIVE-24339 > URL: https://issues.apache.org/jira/browse/HIVE-24339 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > By debug messages we confirmed that REPL LOAD command ignored some config > properties when they were provided in WITH clause, e.g.: > {code} > REPL LOAD bdpp01pub FROM > 'hdfs://prdpdp01//apps/hive/repl/8237c7bd-ba26-4425-8659-3a0d32ab312c' WITH > ('mapreduce.job.queuename'='default','hive.exec.parallel'='true','hive.exec.parallel.thread.number'='128', > ... > {code} > We found that it was working on 16 threads, ignoring > 'hive.exec.parallel.thread.number'='128'. Setting this property on session > level worked. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24339) REPL LOAD command ignores config properties set by WITH clause
[ https://issues.apache.org/jira/browse/HIVE-24339?focusedWorklogId=535353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535353 ] ASF GitHub Bot logged work on HIVE-24339: - Author: ASF GitHub Bot Created on: 13/Jan/21 12:31 Start Date: 13/Jan/21 12:31 Worklog Time Spent: 10m Work Description: ayushtkn opened a new pull request #1864: URL: https://github.com/apache/hive/pull/1864 ### What changes were proposed in this pull request? Take numThreads from the root task if explicitly specified. ### Why are the changes needed? To let REPL LOAD/DUMP specify numThreads as part of the WITH clause. ### How was this patch tested? Added a unit test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535353) Remaining Estimate: 0h Time Spent: 10m > REPL LOAD command ignores config properties set by WITH clause > -- > > Key: HIVE-24339 > URL: https://issues.apache.org/jira/browse/HIVE-24339 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: Ayush Saxena >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Via debug messages we confirmed that the REPL LOAD command ignored some config > properties when they were provided in the WITH clause, e.g.: > {code} > REPL LOAD bdpp01pub FROM > 'hdfs://prdpdp01//apps/hive/repl/8237c7bd-ba26-4425-8659-3a0d32ab312c' WITH > ('mapreduce.job.queuename'='default','hive.exec.parallel'='true','hive.exec.parallel.thread.number'='128', > ... > {code} > We found that it was working on 16 threads, ignoring > 'hive.exec.parallel.thread.number'='128'. Setting this property at the session > level worked. -- This message was sent by Atlassian Jira (v8.3.4#803005)
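The intended precedence — WITH-clause properties overriding the session configuration for the command's tasks — amounts to a simple overlay. This sketch is illustrative only; `effective_conf` is not Hive's config-merging code.

```python
def effective_conf(session_conf, with_overrides):
    # Start from the session config, then let WITH ('k'='v', ...) win.
    conf = dict(session_conf)
    conf.update(with_overrides)
    return conf

conf = effective_conf(
    {"hive.exec.parallel.thread.number": "16"},   # session default observed in the bug
    {"hive.exec.parallel.thread.number": "128"},  # value passed via WITH
)
print(conf["hive.exec.parallel.thread.number"])   # "128" once the override is honored
```

The bug amounted to reading properties from the session config before this overlay was applied, so the WITH-clause values never took effect.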
[jira] [Updated] (HIVE-24632) Replace with null when GenericUDFBaseCompare has a non-interpretable val
[ https://issues.apache.org/jira/browse/HIVE-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24632: -- Labels: pull-request-available (was: ) > Replace with null when GenericUDFBaseCompare has a non-interpretable val > > > Key: HIVE-24632 > URL: https://issues.apache.org/jira/browse/HIVE-24632 > Project: Hive > Issue Type: Improvement > Components: Parser >Affects Versions: 4.0.0 >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The query > {code:java} > create table ccn_table(key int, value string); > set hive.cbo.enable=false; > select * from ccn_table where key > '123a' ; > {code} > will scan all records (partitions), unlike older versions, as the plan > shows: > {noformat} > STAGE PLANS: > Stage: Stage-0 >Fetch Operator > limit: -1 > Processor Tree: >TableScan > alias: ccn_table > filterExpr: (key > '123a') (type: boolean) > Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column > stats: COMPLETE > GatherStats: false > Filter Operator >isSamplingPred: false >predicate: (key > '123a') (type: boolean) >Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column > stats: COMPLETE >Select Operator > expressions: key (type: int), value (type: string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE > Column stats: COMPLETE > ListSink{noformat} > When TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr: > +key > '123a',+ the operator (>) is not an equal operator (=), so the factory > returns +key > '123a'+ as it is. However, all subclasses of > GenericUDFBaseCompare (except GenericUDFOPEqualNS and GenericUDFOPNotEqualNS) > would return null if either side of the function children is null, so it's > safe to return constant null when processing the expr +`key > '123a'`+. This > will benefit some queries when the cbo is disabled. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
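For readers following the reasoning in HIVE-24632: under SQL's three-valued logic, every ordering comparison against NULL yields NULL, which a WHERE clause treats as false, so a predicate whose constant literal cannot be interpreted in the column's type can be folded to constant NULL up front. A minimal sketch of that folding idea in Python (illustrative only, not Hive's TypeCheckProcFactory code; all names here are invented):

```python
# Sketch of the constant-folding idea from HIVE-24632: if the literal in a
# comparison cannot be interpreted in the column's type, the whole predicate
# can be replaced by NULL (modeled as None), because every comparison UDF
# except the null-safe variants returns NULL when either operand is NULL.
def interpret_literal(value, col_type):
    """Try to cast a literal to the column type; None means 'not interpretable'."""
    try:
        return {"int": int, "double": float}[col_type](value)
    except (ValueError, KeyError):
        return None

def fold_comparison(col_type, op, literal, null_safe=False):
    """Return the predicate unchanged, or constant None if it can never be true."""
    if interpret_literal(literal, col_type) is None and not null_safe:
        return None                # predicate folds to NULL -> filters every row
    return (op, literal)           # keep the original predicate

print(fold_comparison("int", ">", "123a"))   # None: '123a' is not an int
print(fold_comparison("int", ">", "123"))    # ('>', '123'): kept as-is
```

With the fold applied, the planner can see a constant-false filter and prune the scan instead of evaluating `key > '123a'` row by row.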
[jira] [Work logged] (HIVE-24632) Replace with null when GenericUDFBaseCompare has a non-interpretable val
[ https://issues.apache.org/jira/browse/HIVE-24632?focusedWorklogId=535340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535340 ] ASF GitHub Bot logged work on HIVE-24632: - Author: ASF GitHub Bot Created on: 13/Jan/21 12:19 Start Date: 13/Jan/21 12:19 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1863: URL: https://github.com/apache/hive/pull/1863 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Added tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535340) Remaining Estimate: 0h Time Spent: 10m > Replace with null when GenericUDFBaseCompare has a non-interpretable val > > > Key: HIVE-24632 > URL: https://issues.apache.org/jira/browse/HIVE-24632 > Project: Hive > Issue Type: Improvement > Components: Parser >Affects Versions: 4.0.0 >Reporter: Zhihua Deng >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The query > {code:java} > create table ccn_table(key int, value string); > set hive.cbo.enable=false; > select * from ccn_table where key > '123a' ; > {code} > will scan all records(partitions) compared to older version, as the plan > tells: > {noformat} > STAGE PLANS: > Stage: Stage-0 >Fetch Operator > limit: -1 > Processor Tree: >TableScan > alias: ccn_table > filterExpr: (key > '123a') (type: boolean) > Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column > stats: COMPLETE > GatherStats: false > Filter Operator >isSamplingPred: false >predicate: (key > '123a') (type: boolean) >Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column > stats: COMPLETE >Select Operator > expressions: key (type: int), 
value (type: string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE > Column stats: COMPLETE > ListSink{noformat} > When TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr > +key > '123a',+ the operator (>) is not an equality operator (=), so the factory > returns +key > '123a'+ as it is. However, all the subclasses of > GenericUDFBaseCompare (except GenericUDFOPEqualNS and GenericUDFOPNotEqualNS) > return null if either side of the function's children is null, so it's > safe to return constant null when processing the expr +`key > '123a'`+. This > will benefit some queries when CBO is disabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24611) Remove unnecessary parameter from AbstractAlterTableOperation
[ https://issues.apache.org/jira/browse/HIVE-24611?focusedWorklogId=535334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535334 ] ASF GitHub Bot logged work on HIVE-24611: - Author: ASF GitHub Bot Created on: 13/Jan/21 12:15 Start Date: 13/Jan/21 12:15 Worklog Time Spent: 10m Work Description: miklosgergely merged pull request #1846: URL: https://github.com/apache/hive/pull/1846 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535334) Time Spent: 20m (was: 10m) > Remove unnecessary parameter from AbstractAlterTableOperation > - > > Key: HIVE-24611 > URL: https://issues.apache.org/jira/browse/HIVE-24611 > Project: Hive > Issue Type: Sub-task >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24611) Remove unnecessary parameter from AbstractAlterTableOperation
[ https://issues.apache.org/jira/browse/HIVE-24611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely resolved HIVE-24611. --- Resolution: Fixed Merged to master, thank you [~kkasa] > Remove unnecessary parameter from AbstractAlterTableOperation > - > > Key: HIVE-24611 > URL: https://issues.apache.org/jira/browse/HIVE-24611 > Project: Hive > Issue Type: Sub-task >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24632) Replace with null when GenericUDFBaseCompare has a non-interpretable val
[ https://issues.apache.org/jira/browse/HIVE-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng updated HIVE-24632: --- Description: The query {code:java} create table ccn_table(key int, value string); set hive.cbo.enable=false; select * from ccn_table where key > '123a' ; {code} will scan all records (partitions) compared to older versions, as the plan tells: {noformat} STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: ccn_table filterExpr: (key > '123a') (type: boolean) Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column stats: COMPLETE GatherStats: false Filter Operator isSamplingPred: false predicate: (key > '123a') (type: boolean) Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: key (type: int), value (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: COMPLETE ListSink{noformat} When TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr +key > '123a',+ the operator (>) is not an equality operator (=), so the factory returns +key > '123a'+ as it is. However, all the subclasses of GenericUDFBaseCompare (except GenericUDFOPEqualNS and GenericUDFOPNotEqualNS) return null if either side of the function's children is null, so it's safe to return constant null when processing the expr +`key > '123a'`+. This will benefit some queries when CBO is disabled. 
was: The query {code:java} create table ccn_table(key int, value string); set hive.cbo.enable=false; select * from ccn_table where key > '123a' ; {code} will scan all records(partitions) compared to older version, as the plan tells: {noformat} STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: ccn_table filterExpr: (key > '123a') (type: boolean) Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column stats: COMPLETE GatherStats: false Filter Operator isSamplingPred: false predicate: (key > '123a') (type: boolean) Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: key (type: int), value (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: COMPLETE ListSink{noformat} When the TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr: +key > '123a',+ the operator(>) is not an equal operator(=), so the factory returns +key > '123a'+ as it is. However all the subclass of GenericUDFBaseCompare(except GenericUDFOPEqualNS and GenericUDFOPNotEqualNS) would return null if either side of the function children is null, so it's safe to return constant null when processing the expr +`key > '123a'`+. This will benifit some queries when the cbo is disabled. 
> Replace with null when GenericUDFBaseCompare has a non-interpretable val > > > Key: HIVE-24632 > URL: https://issues.apache.org/jira/browse/HIVE-24632 > Project: Hive > Issue Type: Improvement > Components: Parser >Affects Versions: 4.0.0 >Reporter: Zhihua Deng >Priority: Major > > The query > {code:java} > create table ccn_table(key int, value string); > set hive.cbo.enable=false; > select * from ccn_table where key > '123a' ; > {code} > will scan all records(partitions) compared to older version, as the plan > tells: > {noformat} > STAGE PLANS: > Stage: Stage-0 >Fetch Operator > limit: -1 > Processor Tree: >TableScan > alias: ccn_table > filterExpr: (key > '123a') (type: boolean) > Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column > stats: COMPLETE > GatherStats: false > Filter Operator >isSamplingPred: false >predicate: (key > '123a') (type: boolean) >Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column > stats: COMPLETE >Select Operator > expressions: key (type: int), value (type: string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE > Column stats: COMPLETE > ListSink{noformat} > When the TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr: > +key > '123a',+ the operator(>) is not an equal operator(=), so the factory > returns +key > '123a'+
[jira] [Commented] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264100#comment-17264100 ] Eugene Chung commented on HIVE-24590: - [~zabetak] Okay. Let me try. > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch > > Time Spent: 40m > Remaining Estimate: 0h > > I'm using Hive 3.1.2 with options below. > * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
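The leak reported in HIVE-24590 is the classic pattern of per-operation log appenders being registered but never deregistered, so the underlying file handles stay open after the operation ends. As a language-neutral illustration of the required lifecycle, using Python's logging module rather than log4j (so none of these names are Hive's or log4j's), the fix amounts to making deregistration and close unconditional:

```python
import logging
import os
import tempfile

def run_operation(op_id: str) -> str:
    """Attach a per-operation file handler, run the work, and always detach it.

    If removeHandler/close are skipped (the analogue of the appender leak in
    HIVE-24590), every operation leaves an open file handle behind.
    """
    logger = logging.getLogger("operation")
    path = os.path.join(tempfile.gettempdir(), f"op-{op_id}.log")
    handler = logging.FileHandler(path)
    logger.addHandler(handler)
    try:
        logger.warning("operation %s running", op_id)
    finally:
        logger.removeHandler(handler)  # deregister before closing
        handler.close()                # release the file handle

    return path

log_path = run_operation("42")
```

The `try/finally` is the point: whether the operation succeeds or throws, the handler must come off the logger and its file handle must be closed.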
[jira] [Assigned] (HIVE-14165) Remove Hive file listing during split computation
[ https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Varga reassigned HIVE-14165: -- Assignee: Peter Varga > Remove Hive file listing during split computation > - > > Key: HIVE-14165 > URL: https://issues.apache.org/jira/browse/HIVE-14165 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Abdullah Yousufi >Assignee: Peter Varga >Priority: Major > Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, > HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, > HIVE-14165.07.patch, HIVE-14165.patch > > > The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's > FileInputFormat.java will list the files during split computation anyway to > determine their size. One way to remove this is to catch the > InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the > Hive side instead of doing the file listing beforehand. > For S3 select queries on partitioned tables, this results in a 2x speedup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
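The optimization described in HIVE-14165 is an instance of a general pattern: when a downstream step (split computation) must enumerate the files anyway, an upfront listing only duplicates I/O, and the cheaper approach is to attempt the operation and handle its failure. A hedged, non-Hive sketch of that "catch the exception instead of pre-listing" pattern:

```python
import os

def compute_splits_with_prelisting(path):
    # Old approach: list the directory once just to validate the input,
    # then list it again for split computation -- two round trips.
    if not os.listdir(path):                 # extra listing
        raise ValueError(f"no input files in {path}")
    return sorted(os.path.join(path, f) for f in os.listdir(path))

def compute_splits(path):
    # Preferred approach (mirrors catching the invalid-input exception from
    # FileInputFormat#getSplits): do the one listing that is needed anyway
    # and translate its failure, instead of checking beforehand.
    try:
        files = os.listdir(path)
    except FileNotFoundError as e:
        raise ValueError(f"invalid input path {path}") from e
    return sorted(os.path.join(path, f) for f in files)
```

On object stores like S3, where each listing is a network call, removing the redundant listing is exactly where the reported 2x speedup comes from.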
[jira] [Work logged] (HIVE-24630) clean up multiple parseDelta implementation in AcidUtils
[ https://issues.apache.org/jira/browse/HIVE-24630?focusedWorklogId=535298=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535298 ] ASF GitHub Bot logged work on HIVE-24630: - Author: ASF GitHub Bot Created on: 13/Jan/21 10:48 Start Date: 13/Jan/21 10:48 Worklog Time Spent: 10m Work Description: pvargacl opened a new pull request #1862: URL: https://github.com/apache/hive/pull/1862 ### What changes were proposed in this pull request? Remove multiple parsedDelta implementation in AcidUtils: - Remove code duplication - Use ParsedDeltaLight everywhere where rawformat is not used, because parsing that is cheaper ### Why are the changes needed? code quality ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Previous unit tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535298) Remaining Estimate: 0h Time Spent: 10m > clean up multiple parseDelta implementation in AcidUtils > > > Key: HIVE-24630 > URL: https://issues.apache.org/jira/browse/HIVE-24630 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > * Remove code duplication > * Use ParsedDeltaLight everywhere where rawformat is not used, because > parsing that is cheaper -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24630) clean up multiple parseDelta implementation in AcidUtils
[ https://issues.apache.org/jira/browse/HIVE-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24630: -- Labels: pull-request-available (was: ) > clean up multiple parseDelta implementation in AcidUtils > > > Key: HIVE-24630 > URL: https://issues.apache.org/jira/browse/HIVE-24630 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > * Remove code duplication > * Use ParsedDeltaLight everywhere where rawformat is not used, because > parsing that is cheaper -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24624) Repl Load should detect the compatible staging dir
[ https://issues.apache.org/jira/browse/HIVE-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratyushotpal Madhukar updated HIVE-24624: -- Attachment: HIVE-24624.patch > Repl Load should detect the compatible staging dir > -- > > Key: HIVE-24624 > URL: https://issues.apache.org/jira/browse/HIVE-24624 > Project: Hive > Issue Type: Improvement >Reporter: Pratyushotpal Madhukar >Assignee: Pratyushotpal Madhukar >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24624.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Repl load in CDP when pointed to a staging dir should be able to detect > whether the staging dir has the dump structure in compatible format or not -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24630) clean up multiple parseDelta implementation in AcidUtils
[ https://issues.apache.org/jira/browse/HIVE-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Varga reassigned HIVE-24630: -- > clean up multiple parseDelta implementation in AcidUtils > > > Key: HIVE-24630 > URL: https://issues.apache.org/jira/browse/HIVE-24630 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Minor > > * Remove code duplication > * Use ParsedDeltaLight everywhere where rawformat is not used, because > parsing that is cheaper -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24615) Remove unnecessary FileSystem listing from Initiator
[ https://issues.apache.org/jira/browse/HIVE-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Varga resolved HIVE-24615. Fix Version/s: 4.0.0 Resolution: Fixed > Remove unnecessary FileSystem listing from Initiator > - > > Key: HIVE-24615 > URL: https://issues.apache.org/jira/browse/HIVE-24615 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > AcidUtils already returns the file list in base and delta directories if it > does recursive listing on S3, listing those directories can be removed from > the Initiator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24615) Remove unnecessary FileSystem listing from Initiator
[ https://issues.apache.org/jira/browse/HIVE-24615?focusedWorklogId=535263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535263 ] ASF GitHub Bot logged work on HIVE-24615: - Author: ASF GitHub Bot Created on: 13/Jan/21 09:25 Start Date: 13/Jan/21 09:25 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #1848: URL: https://github.com/apache/hive/pull/1848 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535263) Time Spent: 0.5h (was: 20m) > Remove unnecessary FileSystem listing from Initiator > - > > Key: HIVE-24615 > URL: https://issues.apache.org/jira/browse/HIVE-24615 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > AcidUtils already returns the file list in base and delta directories if it > does recursive listing on S3, listing those directories can be removed from > the Initiator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24615) Remove unnecessary FileSystem listing from Initiator
[ https://issues.apache.org/jira/browse/HIVE-24615?focusedWorklogId=535264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535264 ] ASF GitHub Bot logged work on HIVE-24615: - Author: ASF GitHub Bot Created on: 13/Jan/21 09:25 Start Date: 13/Jan/21 09:25 Worklog Time Spent: 10m Work Description: lcspinter commented on pull request #1848: URL: https://github.com/apache/hive/pull/1848#issuecomment-759321154 Thanks for the patch @pvargacl! Merged it into master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535264) Time Spent: 40m (was: 0.5h) > Remove unnecessary FileSystem listing from Initiator > - > > Key: HIVE-24615 > URL: https://issues.apache.org/jira/browse/HIVE-24615 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > AcidUtils already returns the file list in base and delta directories if it > does recursive listing on S3, listing those directories can be removed from > the Initiator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24339) REPL LOAD command ignores config properties set by WITH clause
[ https://issues.apache.org/jira/browse/HIVE-24339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reassigned HIVE-24339: --- Assignee: Ayush Saxena > REPL LOAD command ignores config properties set by WITH clause > -- > > Key: HIVE-24339 > URL: https://issues.apache.org/jira/browse/HIVE-24339 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: Ayush Saxena >Priority: Major > > By debug messages we confirmed that REPL LOAD command ignored some config > properties when they were provided in WITH clause, e.g.: > {code} > REPL LOAD bdpp01pub FROM > 'hdfs://prdpdp01//apps/hive/repl/8237c7bd-ba26-4425-8659-3a0d32ab312c' WITH > ('mapreduce.job.queuename'='default','hive.exec.parallel'='true','hive.exec.parallel.thread.number'='128', > ... > {code} > We found that it was working on 16 threads, ignoring > 'hive.exec.parallel.thread.number'='128'. Setting this property on session > level worked. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263996#comment-17263996 ] Attila Magyar commented on HIVE-24584: -- Hi [~srahman], Thanks for the input. My understanding is that PartitionExpressionForMetastore is the default value of "metastore.expression.proxy" (In HiveConf.java/MetaStoreConf.java). Msck attempts to override this by creating a HiveMetaStoreClient with a modified config object. However unless HS2 and HMS are running inside the same process (or Msck is called within HMS via the periodically running PartitionManagementTask) this doesn't work. In case of a remote HMS, Msck should have called msc.setMetaConf() or something that modifies the config via thrift. > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". 
> {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24515) Analyze table job can be skipped when stats populated are already accurate
[ https://issues.apache.org/jira/browse/HIVE-24515?focusedWorklogId=535257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535257 ] ASF GitHub Bot logged work on HIVE-24515: - Author: ASF GitHub Bot Created on: 13/Jan/21 09:03 Start Date: 13/Jan/21 09:03 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1834: URL: https://github.com/apache/hive/pull/1834#discussion_r556362110 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java ## @@ -204,6 +206,54 @@ public int persistColumnStats(Hive db, Table tbl) throws HiveException, MetaExce public void setDpPartSpecs(Collection dpPartSpecs) { } + public static boolean canSkipStatsGeneration(String dbName, String tblName, String partName, + long statsWriteId, String queryValidWriteIdList) { +if (queryValidWriteIdList != null) { // Can be null if its not an ACID table. + ValidWriteIdList validWriteIdList = new ValidReaderWriteIdList(queryValidWriteIdList); + // Just check if the write ID is valid. If it's valid (i.e. we are allowed to see it), + // that means it cannot possibly be a concurrent write. As stats optimization is enabled + // only in case auto gather is enabled. Thus the stats must be updated by a valid committed + // transaction and stats generation can be skipped. + if (validWriteIdList.isWriteIdValid(statsWriteId)) { +try { + IMetaStoreClient msc = Hive.get().getMSC(); + TxnState state = msc.findStatStatusByWriteId(dbName, tblName, partName, statsWriteId); Review comment: can't we just check here if there a newer commited writeId for table/partition and if yes - stats recompute is needed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535257) Time Spent: 1h (was: 50m) > Analyze table job can be skipped when stats populated are already accurate > -- > > Key: HIVE-24515 > URL: https://issues.apache.org/jira/browse/HIVE-24515 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > For non-partitioned tables, stats detail should be present in table level, > e.g > {noformat} > COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"d_current_day":"true"... > }} > {noformat} > For partitioned tables, stats detail should be present in partition level, > {noformat} > store_sales(ss_sold_date_sk=2451819) > {totalSize=0, numRows=0, rawDataSize=0, > COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"ss_addr_sk":"true"}} > > {noformat} > When stats populated are already accurate, {{analyze table tn compute > statistics for columns}} should skip launching the job. > > For ACID tables, stats are auto computed and it can skip computing stats > again when stats are accurate. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
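The review thread above turns on when a stats recompute can safely be skipped. Stripped to its essence, the check is: the write ID that last updated the stats must be visible (valid) in the query's snapshot, and no newer committed write may exist for the table/partition. A simplified model of that decision (the field names below are illustrative, not Hive's actual ValidReaderWriteIdList API):

```python
def can_skip_stats(stats_write_id, valid_write_ids, high_watermark):
    """Return True when stored stats are known to still be accurate.

    valid_write_ids: set of committed write IDs visible to this query.
    high_watermark: highest committed write ID for the table/partition.
    Both parameters are simplifications for illustration.
    """
    # Stats written by an uncommitted or invisible transaction can't be trusted.
    if stats_write_id not in valid_write_ids:
        return False
    # A newer committed write may have changed the data since the stats run.
    return stats_write_id >= high_watermark

print(can_skip_stats(10, {8, 9, 10}, 10))  # True: stats are current
print(can_skip_stats(9, {8, 9, 10}, 10))   # False: write 10 landed after the stats
```

This also shows why the optimization is tied to auto-gather: only then is every committed write guaranteed to have refreshed the stats, making the write-ID comparison sufficient.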
[jira] [Updated] (HIVE-24623) Wrong FS error during dump for table-level replication when staging is remote.
[ https://issues.apache.org/jira/browse/HIVE-24623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arko Sharma updated HIVE-24623: --- Attachment: HIVE-24623.01.patch > Wrong FS error during dump for table-level replication when staging is remote. > -- > > Key: HIVE-24623 > URL: https://issues.apache.org/jira/browse/HIVE-24623 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24623.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24597) Replication with timestamp type partition failing in HA case with same NS
[ https://issues.apache.org/jira/browse/HIVE-24597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arko Sharma updated HIVE-24597: --- Attachment: HIVE-24597.01.patch > Replication with timestamp type partition failing in HA case with same NS > - > > Key: HIVE-24597 > URL: https://issues.apache.org/jira/browse/HIVE-24597 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24597.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime
[ https://issues.apache.org/jira/browse/HIVE-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263987#comment-17263987 ] Zoltan Haindrich commented on HIVE-16352: - Is it really not avoidable to write things correctly? Are these incorrect files created when the writer is being shut down incorrectly, or is Hive possibly reading an incomplete file? Although I think the best would be to have a writer which could give better consistency guarantees, I'm not against this change: it's small and is off by default. Any strong opinion against merging it? > Ability to skip or repair out of sync blocks with HIVE at runtime > - > > Key: HIVE-16352 > URL: https://issues.apache.org/jira/browse/HIVE-16352 > Project: Hive > Issue Type: New Feature > Components: Avro, File Formats, Reader >Affects Versions: 3.1.2 >Reporter: Navdeep Poonia >Assignee: gabrywu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > When a file is corrupted it raises the error java.io.IOException: Invalid > sync! with hive. > Can we have some functionality to skip or repair such blocks at runtime to > make avro more error resilient in case of data corruption. > Error: java.io.IOException: java.io.IOException: java.io.IOException: While > processing file > s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42. > java.io.IOException: Invalid sync! > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334) -- This message was sent by Atlassian Jira (v8.3.4#803005)
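For context on what "skipping out of sync blocks" means mechanically: an Avro container file separates blocks with a 16-byte sync marker recorded in the file header, so a resilient reader that hits a corrupt block can scan forward for the next occurrence of that marker and resume there, losing only the damaged block. A rough sketch of that forward scan (illustrative only, not the Hive/Avro reader code):

```python
def skip_to_next_sync(data: bytes, pos: int, sync_marker: bytes) -> int:
    """Return the offset just past the next sync marker at/after pos, or -1.

    data is the raw file contents and sync_marker is the 16-byte marker from
    the container file header; a corruption-tolerant reader would resume
    decoding blocks from the returned offset.
    """
    idx = data.find(sync_marker, pos)
    if idx == -1:
        return -1                      # no further block boundary: give up
    return idx + len(sync_marker)      # first byte of the next block

marker = b"\x01" * 16
blob = b"garbage-bytes" + marker + b"next-block"
print(skip_to_next_sync(blob, 0, marker))  # 29: len("garbage-bytes") + 16
```

This is also why a runtime-skip option can be safe behind a flag: records before the next sync are dropped rather than misread, which matches the "off by default" framing in the comment above.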
[jira] [Comment Edited] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263983#comment-17263983 ] Syed Shameerur Rahman edited comment on HIVE-24584 at 1/13/21, 8:43 AM: [~amagyar] - As per my understanding all msck command flow are defaulted to use MsckPartitionExpressionProxy unless EXPRESSION_PROXY_CLASS is explicitly given. So is there any reason for explicitly setting EXPRESSION_PROXY_CLASS or am i missing anything? Because the above mentioned problem doesn't arise if you use the default path. {code:java} public static Configuration getMsckConf(Configuration conf) { // the only reason we are using new conf here is to override EXPRESSION_PROXY_CLASS Configuration metastoreConf = MetastoreConf.newMetastoreConf(new Configuration(conf)); metastoreConf.set(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(), metastoreConf.get(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(), MsckPartitionExpressionProxy.class.getCanonicalName())); return metastoreConf; } {code} was (Author: srahman): [~amagyar] - As per my understanding all msck command flow are defaulted to use MsckPartitionExpressionProxy unless EXPRESSION_PROXY_CLASS is explicitly given. So is there any reason for explicitly setting EXPRESSION_PROXY_CLASS or am i missing anything? Because the above mentioned problem doesn't arise if you use the default path. > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". 
> {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
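The getMsckConf snippet in the comment above hinges on the two-argument get(key, default): an explicitly configured EXPRESSION_PROXY_CLASS survives, and only an absent one falls back to MsckPartitionExpressionProxy. A minimal sketch of that "default only if unset" pattern, using java.util.Properties as a stand-in for Hadoop's Configuration (the key and class names here are illustrative, not Hive's real constants):

```java
import java.util.Properties;

/**
 * Demonstrates the pattern used by getMsckConf: read a key with a fallback
 * default, then write the resolved value back so later readers see one answer.
 */
public class DefaultOverrideDemo {
    static final String KEY = "metastore.expression.proxy";          // illustrative key
    static final String DEFAULT = "MsckPartitionExpressionProxy";    // fallback class name

    /** Re-set the key to its current value, or to the default if it was unset. */
    public static String resolveProxy(Properties conf) {
        String value = conf.getProperty(KEY, DEFAULT);  // explicit value wins over default
        conf.setProperty(KEY, value);                   // pin the choice for later readers
        return value;
    }

    public static void main(String[] args) {
        Properties unset = new Properties();
        System.out.println(resolveProxy(unset));        // falls back to the default

        Properties explicit = new Properties();
        explicit.setProperty(KEY, "PartitionExpressionForMetastore");
        System.out.println(resolveProxy(explicit));     // explicit setting survives
    }
}
```

This is also why the reported Kryo failure only appears off the default path: an explicitly set proxy class routes partition-filter expressions through Kryo deserialization, which msck's own proxy avoids.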
[jira] [Commented] (HIVE-24523) Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES for timestamp
[ https://issues.apache.org/jira/browse/HIVE-24523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263984#comment-17263984 ] Denys Kuzmenko commented on HIVE-24523: --- Merged to master. Thank you for the patch, [~nareshpr]! > Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES > for timestamp > - > > Key: HIVE-24523 > URL: https://issues.apache.org/jira/browse/HIVE-24523 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.2.0, 4.0.0 >Reporter: Rajkumar Singh >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Steps to repro: > {code:java} > create external table tstable(date_created timestamp) ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( > 'timestamp.formats'='MMddHHmmss') stored as textfile; > cat sampledata > 2020120517 > hdfs dfs -put sampledata /warehouse/tablespace/external/hive/tstable > {code} > Disable fetch task conversion and run select * from tstable, which produces no > results; setting hive.vectorized.use.vector.serde.deserialize=false returns the expected > output. > While parsing the string to a timestamp, > https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/fast/LazySimpleDeserializeRead.java#L812 > does not set the DateTimeFormatter, which results in an IllegalArgumentException > while parsing the timestamp through TimestampUtils.stringToTimestamp(strValue) -- This message was sent by Atlassian Jira (v8.3.4#803005)
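The essence of the fix — honoring a list of candidate timestamp formats from the SERDEPROPERTIES rather than one hard-coded default — can be sketched with java.time. This is not Hive's actual code path (Hive goes through TimestampUtils.stringToTimestamp), and since the pattern quoted in the ticket ('MMddHHmmss') looks truncated by Jira markup, a full yyyyMMddHHmmss pattern is assumed below:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

/** Try each configured timestamp format in order; first successful parse wins. */
public class TimestampFormats {

    public static LocalDateTime parse(String value, List<DateTimeFormatter> formats) {
        for (DateTimeFormatter format : formats) {
            try {
                return LocalDateTime.parse(value, format);
            } catch (DateTimeParseException ignored) {
                // fall through to the next candidate format
            }
        }
        throw new IllegalArgumentException("Unparseable timestamp: " + value);
    }

    public static void main(String[] args) {
        List<DateTimeFormatter> formats = List.of(
                DateTimeFormatter.ofPattern("yyyyMMddHHmmss"),  // assumed serde custom format
                DateTimeFormatter.ISO_LOCAL_DATE_TIME);         // fallback default format
        System.out.println(parse("20201205173000", formats));   // → 2020-12-05T17:30
        System.out.println(parse("2020-12-05T17:30:00", formats));
    }
}
```

The bug was effectively that the vectorized deserializer behaved as if only the fallback format existed, so compact values like the sample data parsed to nothing.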
[jira] [Resolved] (HIVE-24523) Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES for timestamp
[ https://issues.apache.org/jira/browse/HIVE-24523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko resolved HIVE-24523. --- Resolution: Fixed > Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES > for timestamp > - > > Key: HIVE-24523 > URL: https://issues.apache.org/jira/browse/HIVE-24523 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.2.0, 4.0.0 >Reporter: Rajkumar Singh >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Steps to repro: > {code:java} > create external table tstable(date_created timestamp) ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( > 'timestamp.formats'='MMddHHmmss') stored as textfile; > cat sampledata > 2020120517 > hdfs dfs -put sampledata /warehouse/tablespace/external/hive/tstable > {code} > Disable fetch task conversion and run select * from tstable, which produces no > results; setting hive.vectorized.use.vector.serde.deserialize=false returns the expected > output. > While parsing the string to a timestamp, > https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/fast/LazySimpleDeserializeRead.java#L812 > does not set the DateTimeFormatter, which results in an IllegalArgumentException > while parsing the timestamp through TimestampUtils.stringToTimestamp(strValue) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263983#comment-17263983 ] Syed Shameerur Rahman commented on HIVE-24584: -- [~amagyar] - As per my understanding all msck command flow are defaulted to use MsckPartitionExpressionProxy unless EXPRESSION_PROXY_CLASS is explicitly given. So is there any reason for explicitly setting EXPRESSION_PROXY_CLASS or am i missing anything? Because the above mentioned problem doesn't arise if you use the default path. > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". > {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > 
~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24629) Invoke optional output committer in TezProcessor
[ https://issues.apache.org/jira/browse/HIVE-24629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod reassigned HIVE-24629: - > Invoke optional output committer in TezProcessor > > > Key: HIVE-24629 > URL: https://issues.apache.org/jira/browse/HIVE-24629 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > > In order to enable Hive to write to Iceberg tables, we need to use an output > committer which will fire at the end of each Tez task execution (commitTask) > and the after the execution of each vertex (commitOutput/commitJob). This > output committer will issue a commit containing the written-out data files to > the Iceberg table, replacing its previous snapshot pointer with a new one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
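The two-phase flow described for HIVE-24629 — per-task commitTask calls stage written data files, then a single job-level commit swaps the table's snapshot pointer so all files become visible atomically — can be illustrated with a toy committer. This is neither the Iceberg nor the Tez API; the interface and class names below are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Toy two-phase output committer: stage per task, publish once per job. */
public class SnapshotCommitterDemo {

    /** Minimal committer contract, loosely mirroring commitTask/commitJob. */
    interface OutputCommitter {
        void commitTask(List<String> dataFiles); // called as each task finishes
        void commitJob();                        // called once after the vertex completes
    }

    /** Toy table whose "current snapshot" is replaced atomically on commitJob. */
    static class IcebergLikeTable implements OutputCommitter {
        private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
        private final AtomicReference<List<String>> snapshot = new AtomicReference<>(List.of());

        @Override public void commitTask(List<String> dataFiles) {
            pending.addAll(dataFiles);           // files are staged, not yet visible
        }

        @Override public void commitJob() {
            List<String> next = new ArrayList<>(snapshot.get());
            next.addAll(pending);
            snapshot.set(List.copyOf(next));     // readers switch to the new snapshot
            pending.clear();
        }

        List<String> currentSnapshot() { return snapshot.get(); }
    }

    public static void main(String[] args) {
        IcebergLikeTable table = new IcebergLikeTable();
        table.commitTask(List.of("part-0001.parquet"));
        table.commitTask(List.of("part-0002.parquet"));
        System.out.println("before job commit: " + table.currentSnapshot()); // []
        table.commitJob();
        System.out.println("after job commit:  " + table.currentSnapshot());
    }
}
```

The key property sketched here is that a partially completed job leaves the old snapshot untouched: readers either see the previous snapshot or the fully committed new one, never a mix.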
[jira] [Work logged] (HIVE-24523) Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES for timestamp
[ https://issues.apache.org/jira/browse/HIVE-24523?focusedWorklogId=535252=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535252 ] ASF GitHub Bot logged work on HIVE-24523: - Author: ASF GitHub Bot Created on: 13/Jan/21 08:33 Start Date: 13/Jan/21 08:33 Worklog Time Spent: 10m Work Description: deniskuzZ merged pull request #1825: URL: https://github.com/apache/hive/pull/1825 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535252) Time Spent: 1h (was: 50m) > Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES > for timestamp > - > > Key: HIVE-24523 > URL: https://issues.apache.org/jira/browse/HIVE-24523 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.2.0, 4.0.0 >Reporter: Rajkumar Singh >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Steps to repro: > {code:java} > create external table tstable(date_created timestamp) ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( > 'timestamp.formats'='MMddHHmmss') stored as textfile; > cat sampledata > 2020120517 > hdfs dfs -put sampledata /warehouse/tablespace/external/hive/tstable > {code} > disable fetch task conversion and run select * from tstable which produce no > results, disabling the set > hive.vectorized.use.vector.serde.deserialize=false; return the expected > output. 
> While parsing the string to a timestamp, > https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/fast/LazySimpleDeserializeRead.java#L812 > does not set the DateTimeFormatter, which results in an IllegalArgumentException > while parsing the timestamp through TimestampUtils.stringToTimestamp(strValue) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator
[ https://issues.apache.org/jira/browse/HIVE-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263969#comment-17263969 ] Zoltan Haindrich commented on HIVE-24203: - merged into master. Thank you [~okumin]! > Implement stats annotation rule for the LateralViewJoinOperator > --- > > Key: HIVE-24203 > URL: https://issues.apache.org/jira/browse/HIVE-24203 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW. > This can cause an underestimation in case that UDTF in LATERAL VIEW generates > multiple rows. > HIVE-20262 has already added the rule for UDTF. > This issue would add the rule for LateralViewJoinOperator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
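The underestimation that HIVE-24203 fixes is simple arithmetic: without a stats rule for LateralViewJoinOperator, the planner effectively passes the parent's row count through, while each input row actually fans out by the UDTF's average output size. A hedged sketch of the gap (the formula and numbers are illustrative, not Hive's actual estimator):

```java
/** Illustrates why a missing stats rule underestimates LATERAL VIEW output. */
public class LateralViewStatsDemo {

    /** Naive estimate: pass the parent's row count through unchanged. */
    static long withoutRule(long parentRows) {
        return parentRows;
    }

    /** With a rule: each input row fans out by the UDTF's average output size. */
    static long withRule(long parentRows, double avgUdtfRowsPerInput) {
        return Math.round(parentRows * avgUdtfRowsPerInput);
    }

    public static void main(String[] args) {
        long parent = 1_000_000L;
        double avgExplodeSize = 8.0;  // assumed average array length fed to explode()
        System.out.println("without rule: " + withoutRule(parent));          // 1000000
        System.out.println("with rule:    " + withRule(parent, avgExplodeSize)); // 8000000
    }
}
```

An 8x gap like this propagates upward through joins and aggregations, which is how a missing rule on one operator can skew an entire plan's cost estimates.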
[jira] [Resolved] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator
[ https://issues.apache.org/jira/browse/HIVE-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24203. - Fix Version/s: 4.0.0 Resolution: Fixed > Implement stats annotation rule for the LateralViewJoinOperator > --- > > Key: HIVE-24203 > URL: https://issues.apache.org/jira/browse/HIVE-24203 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW. > This can cause an underestimation in case that UDTF in LATERAL VIEW generates > multiple rows. > HIVE-20262 has already added the rule for UDTF. > This issue would add the rule for LateralViewJoinOperator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator
[ https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=535241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535241 ] ASF GitHub Bot logged work on HIVE-24203: - Author: ASF GitHub Bot Created on: 13/Jan/21 08:06 Start Date: 13/Jan/21 08:06 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1531: URL: https://github.com/apache/hive/pull/1531 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535241) Time Spent: 3h 40m (was: 3.5h) > Implement stats annotation rule for the LateralViewJoinOperator > --- > > Key: HIVE-24203 > URL: https://issues.apache.org/jira/browse/HIVE-24203 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW. > This can cause an underestimation in case that UDTF in LATERAL VIEW generates > multiple rows. > HIVE-20262 has already added the rule for UDTF. > This issue would add the rule for LateralViewJoinOperator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24278) Implement an UDF for throwing exception in arbitrary vertex
[ https://issues.apache.org/jira/browse/HIVE-24278?focusedWorklogId=535239=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535239 ] ASF GitHub Bot logged work on HIVE-24278: - Author: ASF GitHub Bot Created on: 13/Jan/21 08:02 Start Date: 13/Jan/21 08:02 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1817: URL: https://github.com/apache/hive/pull/1817#discussion_r556326695 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFExceptionInVertex.java ## @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.udf.generic; + +import java.util.Arrays; +import java.util.List; +import java.util.stream.Collectors; + +import org.apache.hadoop.hive.ql.exec.Description; +import org.apache.hadoop.hive.ql.exec.MapredContext; +import org.apache.hadoop.hive.ql.exec.UDFArgumentException; +import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException; +import org.apache.hadoop.hive.ql.exec.tez.TezProcessor; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantStringObjectInspector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * This class implements the UDF which can throw an exception in arbitrary vertex (typically mapper) + * / task / task attempt. For throwing exception in reducer side, where most probably + * GroupByOperator codepath applies, GenericUDAFExceptionInVertex is used. 
+ */ +@Description(name = "exception_in_vertex_udf", value = "_FUNC_(vertexName, taskNumberExpression, taskAttemptNumberExpression)") +public class GenericUDFExceptionInVertex extends GenericUDF { + private static final Logger LOG = LoggerFactory.getLogger(GenericUDFExceptionInVertex.class); + + private String vertexName; + private String taskNumberExpr; + private String taskAttemptNumberExpr; + private String currentVertexName; + private int currentTaskNumber; + private int currentTaskAttemptNumber; + private boolean alreadyCheckedAndPassed; + + @Override + public ObjectInspector initialize(ObjectInspector[] parameters) throws UDFArgumentException { +if (parameters.length < 2) { + throw new UDFArgumentTypeException(-1, + "At least two argument is expected (fake column ref, vertex name)"); +} + +this.vertexName = getVertexName(parameters, 1); +this.taskNumberExpr = getTaskNumber(parameters, 2); +this.taskAttemptNumberExpr = getTaskAttemptNumber(parameters, 3); Review comment: I know it will be mostly just us using this - but it would be helpfull to document the accepted format (and probably throw an exception if something else is passed) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 535239) Time Spent: 20m (was: 10m) > Implement an UDF for throwing exception in arbitrary vertex > --- > > Key: HIVE-24278 > URL: https://issues.apache.org/jira/browse/HIVE-24278 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For testing purposes sometimes we need to make the query fail in a vertex, so > assuming that we already know the plan, it could be something like: >