[jira] [Commented] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2021-01-13 Thread gabrywu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264647#comment-17264647
 ] 

gabrywu commented on HIVE-16352:


[~kgyrtkirk] Yes, it's small and very useful.

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>  Components: Avro, File Formats, Reader
>Affects Versions: 3.1.2
>Reporter: Navdeep Poonia
>Assignee: gabrywu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When an Avro file is corrupted, it raises the error java.io.IOException: 
> Invalid sync! in Hive.
>  Can we have some functionality to skip or repair such blocks at runtime, to 
> make Avro more resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)
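
One possible direction, sketched below with plain JDK code (the marker bytes, offsets, and method names are illustrative, not Avro's actual reader API): when the reader hits an invalid sync, scan forward for the next occurrence of the file's 16-byte sync marker and resume from there, dropping the corrupted block instead of failing the whole read.

```java
import java.util.Arrays;

public class SyncSkip {
    // Scan forward from 'from' for the next occurrence of the file's sync
    // marker; a reader hitting "Invalid sync!" could seek to the returned
    // offset and resume, skipping the corrupted block. Returns -1 if no
    // further marker exists.
    public static int nextSync(byte[] data, byte[] marker, int from) {
        for (int i = from; i + marker.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + marker.length), marker)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] marker = "0123456789abcdef".getBytes(); // stand-in 16-byte marker
        byte[] data = new byte[64];
        System.arraycopy(marker, 0, data, 30, marker.length);
        System.out.println(nextSync(data, marker, 17)); // finds the marker at offset 30
    }
}
```

A real implementation would additionally bound the scan and count skipped bytes, so silently dropped data is at least surfaced in counters or logs.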



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24606) Multi-stage materialized CTEs can lose intermediate data

2021-01-13 Thread okumin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

okumin updated HIVE-24606:
--
Summary: Multi-stage materialized CTEs can lose intermediate data  (was: 
Multi-stage materialized CTEs can lost intermediate data)

> Multi-stage materialized CTEs can lose intermediate data
> 
>
> Key: HIVE-24606
> URL: https://issues.apache.org/jira/browse/HIVE-24606
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>
> With complex multi-stage CTEs, Hive can start a latter stage before its 
> previous stage finishes.
>  That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve 
> dependencies between multi-stage materialized CTEs when a non-materialized CTE 
> cuts in.
>  
> [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>  
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTask` will traverse the CTEs in order of `a1`, `x`, and `a2`. It 
> means the dependency between `a1` and `a2` will be ignored and `a2` can start 
> without waiting for `a1`. As a result, the above query returns the following 
> result.
> {code:java}
> +-+
> | id  |
> +-+
> | a1  |
> | x   |
> +-+
> {code}
> For your information, I ran this test with revision = 
> 425e1ff7c054f87c4db87e77d004282d529599ae.





[jira] [Work logged] (HIVE-24589) Drop catalog failing with deadlock error for Oracle backend dbms.

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24589?focusedWorklogId=535878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535878
 ]

ASF GitHub Bot logged work on HIVE-24589:
-

Author: ASF GitHub Bot
Created on: 14/Jan/21 05:03
Start Date: 14/Jan/21 05:03
Worklog Time Spent: 10m 
  Work Description: maheshk114 merged pull request #1850:
URL: https://github.com/apache/hive/pull/1850


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535878)
Time Spent: 20m  (was: 10m)

> Drop catalog failing with deadlock error for Oracle backend dbms.
> -
>
> Key: HIVE-24589
> URL: https://issues.apache.org/jira/browse/HIVE-24589
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When we do a drop catalog we drop the catalog from the CTLGS table. The DBS 
> table has a foreign key reference on CTLGS for CTLG_NAME. This is causing the 
> DBS table to be locked exclusively and causing deadlocks. This can be avoided 
> by creating an index in the DBS table on CTLG_NAME.
> {code:java}
> CREATE INDEX CTLG_NAME_DBS ON DBS(CTLG_NAME); {code}
> {code:java}
>  Oracle Database maximizes the concurrency control of parent keys in relation 
> to dependent foreign keys. Locking behaviour depends on whether foreign key 
> columns are indexed. If foreign keys are not indexed, then the child table 
> will probably be locked more frequently, deadlocks will occur, and 
> concurrency will be decreased. For this reason foreign keys should almost 
> always be indexed. The only exception is when the matching unique or primary 
> key is never updated or deleted.{code}
>  





[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=535861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535861
 ]

ASF GitHub Bot logged work on HIVE-24386:
-

Author: ASF GitHub Bot
Created on: 14/Jan/21 03:58
Start Date: 14/Jan/21 03:58
Worklog Time Spent: 10m 
  Work Description: vnhive opened a new pull request #1694:
URL: https://github.com/apache/hive/pull/1694


   HIVE-24386 : Add builder methods for GetTablesRequest and 
GetPartitionsRequest to HiveMetaStoreClient
   
   This patch builds on the patch for HIVE-24397 and adds builder methods for 
the request and projection specification classes of Tables and Partitions. The 
relevant unit tests have also been updated.





Issue Time Tracking
---

Worklog Id: (was: 535861)
Time Spent: 1h 40m  (was: 1.5h)

> Add builder methods for GetTablesRequest and GetPartitionsRequest to 
> HiveMetaStoreClient
> 
>
> Key: HIVE-24386
> URL: https://issues.apache.org/jira/browse/HIVE-24386
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Builder methods for GetTablesRequest and GetPartitionsRequest should be added 
> to the HiveMetaStoreClient class.
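
As a hedged illustration of what such builder methods might look like (class, field, and method names below are guesses for the sketch; the real GetTablesRequest is a Thrift-generated class with more fields and different setters):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative fluent builder; not the actual HiveMetaStoreClient API.
public class GetTablesRequestBuilder {
    private String catalogName = "hive";       // assumed default catalog
    private String dbName;
    private final List<String> tableNames = new ArrayList<>();

    public GetTablesRequestBuilder catalog(String cat) { this.catalogName = cat; return this; }
    public GetTablesRequestBuilder database(String db) { this.dbName = db; return this; }
    public GetTablesRequestBuilder table(String tbl) { this.tableNames.add(tbl); return this; }

    // Stand-in for building the Thrift request object.
    public String build() {
        return catalogName + "." + dbName + ":" + tableNames;
    }

    public static void main(String[] args) {
        String req = new GetTablesRequestBuilder()
                .database("default").table("t1").table("t2").build();
        System.out.println(req); // hive.default:[t1, t2]
    }
}
```

The point of the builder is to spare callers from constructing and mutating the raw Thrift object field by field.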





[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=535860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535860
 ]

ASF GitHub Bot logged work on HIVE-24386:
-

Author: ASF GitHub Bot
Created on: 14/Jan/21 03:58
Start Date: 14/Jan/21 03:58
Worklog Time Spent: 10m 
  Work Description: vnhive commented on pull request #1694:
URL: https://github.com/apache/hive/pull/1694#issuecomment-759909797


   > Added some comments. Requesting changes.
   
   I have addressed all your requests. Can you please check whether you are 
happy, or whether you would like me to change anything more?





Issue Time Tracking
---

Worklog Id: (was: 535860)
Time Spent: 1.5h  (was: 1h 20m)

> Add builder methods for GetTablesRequest and GetPartitionsRequest to 
> HiveMetaStoreClient
> 
>
> Key: HIVE-24386
> URL: https://issues.apache.org/jira/browse/HIVE-24386
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Builder methods for GetTablesRequest and GetPartitionsRequest should be added 
> to the HiveMetaStoreClient class.





[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=535859=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535859
 ]

ASF GitHub Bot logged work on HIVE-24386:
-

Author: ASF GitHub Bot
Created on: 14/Jan/21 03:58
Start Date: 14/Jan/21 03:58
Worklog Time Spent: 10m 
  Work Description: vnhive closed pull request #1694:
URL: https://github.com/apache/hive/pull/1694


   





Issue Time Tracking
---

Worklog Id: (was: 535859)
Time Spent: 1h 20m  (was: 1h 10m)

> Add builder methods for GetTablesRequest and GetPartitionsRequest to 
> HiveMetaStoreClient
> 
>
> Key: HIVE-24386
> URL: https://issues.apache.org/jira/browse/HIVE-24386
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Builder methods for GetTablesRequest and GetPartitionsRequest should be added 
> to the HiveMetaStoreClient class.





[jira] [Resolved] (HIVE-24627) Add Debug Logging to Hive JDBC Connection

2021-01-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24627.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thanks [~mgergely] for the review!

> Add Debug Logging to Hive JDBC Connection
> -
>
> Key: HIVE-24627
> URL: https://issues.apache.org/jira/browse/HIVE-24627
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Log the following:
> # Session handle
> # Version Number
> # Any configurations/variables set by the user at the client-side
> # Dump the Hive configurations at session-start
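
The four items above could be emitted at connection-open roughly as follows (a stdlib sketch using java.util.logging; the actual patch uses the driver's own logger, and the parameter names here are illustrative, not HiveConnection's real fields):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.logging.Logger;

public class JdbcDebugLog {
    private static final Logger LOG = Logger.getLogger("hive.jdbc");

    // Build the debug lines for a freshly opened connection: session handle,
    // version, then each client-side configuration key in sorted order.
    public static List<String> connectionDebugLines(String sessionHandle,
                                                    String version,
                                                    Properties clientConf) {
        List<String> lines = new ArrayList<>();
        lines.add("Session handle: " + sessionHandle);
        lines.add("Driver version: " + version);
        clientConf.stringPropertyNames().stream().sorted()
                  .forEach(k -> lines.add("Client conf: " + k + "=" + clientConf.getProperty(k)));
        return lines;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hive.execution.engine", "tez");
        connectionDebugLines("sess-1234", "4.0.0", conf).forEach(LOG::fine);
    }
}
```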





[jira] [Work logged] (HIVE-24627) Add Debug Logging to Hive JDBC Connection

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24627?focusedWorklogId=535847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535847
 ]

ASF GitHub Bot logged work on HIVE-24627:
-

Author: ASF GitHub Bot
Created on: 14/Jan/21 02:53
Start Date: 14/Jan/21 02:53
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1859:
URL: https://github.com/apache/hive/pull/1859


   





Issue Time Tracking
---

Worklog Id: (was: 535847)
Time Spent: 1h  (was: 50m)

> Add Debug Logging to Hive JDBC Connection
> -
>
> Key: HIVE-24627
> URL: https://issues.apache.org/jira/browse/HIVE-24627
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Log the following:
> # Session handle
> # Version Number
> # Any configurations/variables set by the user at the client-side
> # Dump the Hive configurations at session-start





[jira] [Work logged] (HIVE-24075) Optimise KeyValuesInputMerger

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24075?focusedWorklogId=535833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535833
 ]

ASF GitHub Bot logged work on HIVE-24075:
-

Author: ASF GitHub Bot
Created on: 14/Jan/21 01:34
Start Date: 14/Jan/21 01:34
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1463:
URL: https://github.com/apache/hive/pull/1463


   





Issue Time Tracking
---

Worklog Id: (was: 535833)
Time Spent: 50m  (was: 40m)

> Optimise KeyValuesInputMerger
> -
>
> Key: HIVE-24075
> URL: https://issues.apache.org/jira/browse/HIVE-24075
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Comparisons in KeyValuesInputMerger can be reduced.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L165|https://github.infra.cloudera.com/CDH/hive/blob/cdpd-master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L165]
> [https://github.infra.cloudera.com/CDH/hive/blob/cdpd-master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L150|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L150]
> If the reader comparisons in the queue are the same, we could reuse 
> "{{nextKVReaders}}" in the next iteration instead of doing the comparisons 
> all over again.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L178]
>  
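
A self-contained sketch of the proposed saving (the name nextReaders mirrors the issue's nextKVReaders, but the real merger operates on KeyValueReader heads, not integers): when several readers share the smallest key, gather them in one pass and drain them as a group, rather than re-comparing every head per emitted record.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

public class MergerSketch {
    // Merge sorted readers; readers whose heads compare equal are gathered
    // once into 'nextReaders' and drained together, instead of one
    // compare-and-poll cycle per reader per round.
    public static List<Integer> merge(List<Deque<Integer>> readers) {
        List<Integer> out = new ArrayList<>();
        PriorityQueue<Deque<Integer>> pq =
                new PriorityQueue<>(Comparator.comparing((Deque<Integer> d) -> d.peekFirst()));
        for (Deque<Integer> r : readers) {
            if (!r.isEmpty()) pq.add(r);
        }
        while (!pq.isEmpty()) {
            List<Deque<Integer>> nextReaders = new ArrayList<>();
            nextReaders.add(pq.poll());
            int min = nextReaders.get(0).peekFirst();
            while (!pq.isEmpty() && pq.peek().peekFirst() == min) {
                nextReaders.add(pq.poll());   // equal head: no further compares needed
            }
            for (Deque<Integer> r : nextReaders) {
                out.add(r.pollFirst());
                if (!r.isEmpty()) pq.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Deque<Integer>> readers = new ArrayList<>();
        readers.add(new ArrayDeque<>(Arrays.asList(1, 3, 5)));
        readers.add(new ArrayDeque<>(Arrays.asList(1, 2, 5)));
        System.out.println(merge(readers)); // [1, 1, 2, 3, 5, 5]
    }
}
```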





[jira] [Work logged] (HIVE-24595) Vectorization causing incorrect results for scalar subquery

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24595?focusedWorklogId=535805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535805
 ]

ASF GitHub Bot logged work on HIVE-24595:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 23:41
Start Date: 13/Jan/21 23:41
Worklog Time Spent: 10m 
  Work Description: mustafaiman opened a new pull request #1867:
URL: https://github.com/apache/hive/pull/1867


   Change-Id: Ia901a4b1ee6a4f34fdf13f02fcd9eaaf615cca58
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 535805)
Remaining Estimate: 0h
Time Spent: 10m

> Vectorization causing incorrect results for scalar subquery
> ---
>
> Key: HIVE-24595
> URL: https://issues.apache.org/jira/browse/HIVE-24595
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Vineet Garg
>Assignee: Mustafa İman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
>  CREATE EXTERNAL TABLE `alltypessmall`( 
>`id` int,
>`bool_col` boolean,  
>`tinyint_col` tinyint,   
>`smallint_col` smallint, 
>`int_col` int,   
>`bigint_col` bigint, 
>`float_col` float,   
>`double_col` double, 
>`date_string_col` string,
>`string_col` string, 
>`timestamp_col` timestamp)   
>  PARTITIONED BY (   
>`year` int,  
>`month` int) 
>  ROW FORMAT SERDE   
>'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
>  WITH SERDEPROPERTIES ( 
>'escape.delim'='\\', 
>'field.delim'=',',   
>'serialization.format'=',')  
>  STORED AS INPUTFORMAT  
>'org.apache.hadoop.mapred.TextInputFormat'   
>  OUTPUTFORMAT   
>'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
>  TBLPROPERTIES (
>'DO_NOT_UPDATE_STATS'='true',
>'OBJCAPABILITIES'='EXTREAD,EXTWRITE',
>'STATS_GENERATED'='TASK',
>'impala.lastComputeStatsTime'='1608312793',  
>'transient_lastDdlTime'='1608310442');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,40,3434,5.4,44.3,'str1','str2', '01-01-2001');
> {code}
> Following query should fail but it succeeds
> {code:sql}
> SELECT id FROM alltypessmall
> WHERE int_col =
>   (SELECT int_col
>FROM alltypessmall)
> ORDER BY id;
> {code}
> *Explain plan*
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE)
>   DagName: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: alltypessmall
>   filterExpr: int_col is not null (type: boolean)
>   Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Filter Operator
> predicate: int_col is not null (type: 

[jira] [Updated] (HIVE-24595) Vectorization causing incorrect results for scalar subquery

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24595:
--
Labels: pull-request-available  (was: )

> Vectorization causing incorrect results for scalar subquery
> ---
>
> Key: HIVE-24595
> URL: https://issues.apache.org/jira/browse/HIVE-24595
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Vineet Garg
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
>  CREATE EXTERNAL TABLE `alltypessmall`( 
>`id` int,
>`bool_col` boolean,  
>`tinyint_col` tinyint,   
>`smallint_col` smallint, 
>`int_col` int,   
>`bigint_col` bigint, 
>`float_col` float,   
>`double_col` double, 
>`date_string_col` string,
>`string_col` string, 
>`timestamp_col` timestamp)   
>  PARTITIONED BY (   
>`year` int,  
>`month` int) 
>  ROW FORMAT SERDE   
>'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
>  WITH SERDEPROPERTIES ( 
>'escape.delim'='\\', 
>'field.delim'=',',   
>'serialization.format'=',')  
>  STORED AS INPUTFORMAT  
>'org.apache.hadoop.mapred.TextInputFormat'   
>  OUTPUTFORMAT   
>'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
>  TBLPROPERTIES (
>'DO_NOT_UPDATE_STATS'='true',
>'OBJCAPABILITIES'='EXTREAD,EXTWRITE',
>'STATS_GENERATED'='TASK',
>'impala.lastComputeStatsTime'='1608312793',  
>'transient_lastDdlTime'='1608310442');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,40,3434,5.4,44.3,'str1','str2', '01-01-2001');
> {code}
> Following query should fail but it succeeds
> {code:sql}
> SELECT id FROM alltypessmall
> WHERE int_col =
>   (SELECT int_col
>FROM alltypessmall)
> ORDER BY id;
> {code}
> *Explain plan*
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE)
>   DagName: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: alltypessmall
>   filterExpr: int_col is not null (type: boolean)
>   Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Filter Operator
> predicate: int_col is not null (type: boolean)
> Statistics: Num rows: 3 Data size: 24 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: id (type: int), int_col (type: int)
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 3 Data size: 24 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> keys:
>   0
>   1
> outputColumnNames: _col0, _col1
> input vertices:
>   1 Reducer 4
> Statistics: Num rows: 3 Data size: 24 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map Join Operator
>   condition map:
>  

[jira] [Updated] (HIVE-24634) Create table if not exists should validate whether table exists before doAuth()

2021-01-13 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24634:
--
Description: 
In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges 
over all files in the table location even though the table already exists.

The table existence check should run before doAuthorization in compile.
{code:java}
 at 
org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(FileUtils.java:452)
 
 at 
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.isURIAccessAllowed(RangerHiveAuthorizer.java:1428)
 at 
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291)
 at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337)
 at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code}

  was:
In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges 
over all files in the table location even though the table already exists.

The table existence check should run before doAuthorization in compile.
{code:java}
at 
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291)
 at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337)
 at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code}


> Create table if not exists should validate whether table exists before 
> doAuth()
> ---
>
> Key: HIVE-24634
> URL: https://issues.apache.org/jira/browse/HIVE-24634
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Priority: Major
>
> In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges 
> over all files in the table location even though the table already exists.
> The table existence check should run before doAuthorization in compile.
> {code:java}
>  at 
> org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(FileUtils.java:452)
>  
>  at 
> org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.isURIAccessAllowed(RangerHiveAuthorizer.java:1428)
>  at 
> org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291)
>  at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337)
>  at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code}
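
The proposed reordering, reduced to a toy model (class, method, and field names below are illustrative stand-ins, not the real Driver/SemanticAnalyzer code): the existence check short-circuits before any authorization work happens.

```java
import java.util.HashSet;
import java.util.Set;

public class CreateIfNotExists {
    final Set<String> catalog = new HashSet<>();
    int authCalls = 0; // counts stand-in doAuthorization() invocations

    // Existence check first: CREATE TABLE IF NOT EXISTS on an existing table
    // becomes a no-op and never triggers the per-file privilege walk.
    public void createTableIfNotExists(String name) {
        if (catalog.contains(name)) {
            return;                // table exists: skip authorization entirely
        }
        authCalls++;               // stands in for doAuthorization()
        catalog.add(name);         // stands in for the actual DDL
    }

    public static void main(String[] args) {
        CreateIfNotExists driver = new CreateIfNotExists();
        driver.createTableIfNotExists("t");
        driver.createTableIfNotExists("t"); // second call authorizes nothing
        System.out.println(driver.authCalls); // 1
    }
}
```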





[jira] [Updated] (HIVE-24634) Create table if not exists should validate whether table exists before doAuth()

2021-01-13 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24634:
--
Description: 
In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges 
over all files in the table location even though the table already exists.

The table existence check should run before doAuthorization in compile.
{code:java}
at 
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291)
 at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337)
 at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code}

  was:
In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges 
over all files in the table location even though the table already exists.

The table existence check should run before doAuthorization in compile.
at 
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291)
at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337)
at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710)


> Create table if not exists should validate whether table exists before 
> doAuth()
> ---
>
> Key: HIVE-24634
> URL: https://issues.apache.org/jira/browse/HIVE-24634
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Priority: Major
>
> In a Hive + Ranger cluster, CREATE TABLE IF NOT EXISTS validates privileges 
> over all files in the table location even though the table already exists.
> The table existence check should run before doAuthorization in compile.
> {code:java}
> at 
> org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:291)
>  at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1337)
>  at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1101)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:710){code}





[jira] [Commented] (HIVE-24394) Enable printing explain to console at query start

2021-01-13 Thread Johan Gustavsson (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264330#comment-17264330
 ] 

Johan Gustavsson commented on HIVE-24394:
-

Thank you for reviewing and merging this [~kgyrtkirk]

> Enable printing explain to console at query start
> -
>
> Key: HIVE-24394
> URL: https://issues.apache.org/jira/browse/HIVE-24394
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor
>Affects Versions: 2.3.7, 3.1.2
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently there is a hive.log.explain.output option that prints extended 
> explain output to the log. While this is helpful for internal investigations, 
> it limits the information available to users. We should add an option that 
> prints the non-extended explain to the console, for general user consumption, 
> so that users can debug queries and workflows without having to resubmit them 
> with EXPLAIN.





[jira] [Updated] (HIVE-24523) Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES for timestamp

2021-01-13 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24523:
--
Fix Version/s: 4.0.0

> Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES 
> for timestamp
> -
>
> Key: HIVE-24523
> URL: https://issues.apache.org/jira/browse/HIVE-24523
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.2.0, 4.0.0
>Reporter: Rajkumar Singh
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Steps to repro:
> {code:java}
>   create external  table tstable(date_created timestamp)   ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'   WITH SERDEPROPERTIES ( 
>  'timestamp.formats'='MMddHHmmss') stored as textfile;
> cat sampledata 
> 2020120517
> hdfs dfs -put sampledata /warehouse/tablespace/external/hive/tstable
> {code}
> Disable fetch task conversion and run select * from tstable, which produces no 
> results; setting hive.vectorized.use.vector.serde.deserialize=false returns 
> the expected output.
> While parsing the string to a timestamp, 
> https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/fast/LazySimpleDeserializeRead.java#L812
>  does not set the DateTimeFormatter, which results in an 
> IllegalArgumentException when parsing the timestamp through 
> TimestampUtils.stringToTimestamp(strValue).
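
What honoring the table's timestamp.formats SERDEPROPERTY amounts to, in stdlib terms: parse with the table's pattern rather than the default timestamp parser. The pattern and input below are illustrative (the property value shown in the report appears garbled relative to the sample row 2020120517, so a full-precision pattern is used here):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class SerdeTimestamp {
    // Parse a raw column value with the pattern taken from SERDEPROPERTIES,
    // which is what the vectorized read path fails to do.
    public static LocalDateTime parseWithTableFormat(String raw, String pattern) {
        return LocalDateTime.parse(raw, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        LocalDateTime ts = parseWithTableFormat("20201205170000", "yyyyMMddHHmmss");
        System.out.println(ts); // 2020-12-05T17:00
    }
}
```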





[jira] [Updated] (HIVE-24628) Decimal values are displayed as scientific notation in beeline

2021-01-13 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24628:
--
Component/s: Beeline

> Decimal values are displayed as scientific notation in beeline
> --
>
> Key: HIVE-24628
> URL: https://issues.apache.org/jira/browse/HIVE-24628
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>
> Because we use BigDecimal.toString(), which returns scientific notation 
> instead of the original text, customers get confused. It should be changed to 
> toPlainString() here:
> [https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L165]
> Repro steps:
>  
> {code:java}
> beeline> select cast(0 as decimal(20,10));
> //output
> 0E-10 
> {code}
>  
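The same toString-vs-toPlainString distinction can be reproduced outside Hive with Python's decimal module, which mirrors Java's BigDecimal semantics here (a sketch for illustration, not Hive code):

```python
from decimal import Decimal

# cast(0 as decimal(20,10)) carries value 0 with scale 10, i.e. Decimal("0E-10").
d = Decimal("0E-10")

# str() keeps the exponent form, like BigDecimal.toString().
scientific = str(d)

# Fixed-point formatting expands it, like BigDecimal.toPlainString().
plain = format(d, "f")
```

Here `scientific` is `0E-10` while `plain` is `0.0000000000`, which is exactly the confusing-vs-expected pair from the repro above.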



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big

2021-01-13 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264247#comment-17264247
 ] 

Vineet Garg commented on HIVE-23684:


Merged the pull request into master.

> Large underestimation in NDV stats when input and join cardinality ratio is 
> big
> ---
>
> Key: HIVE-23684
> URL: https://issues.apache.org/jira/browse/HIVE-23684
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Large underestimations of NDV values may occur after a join operation since 
> the current logic will decrease the original NDV values proportionally.
> The 
> [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558]
>  compares the number of rows of each relation before the join with the number 
> of rows after the join and extracts a ratio for each side. Based on this 
> ratio it adapts (reduces) the NDV accordingly.
> Consider for instance the following query:
> {code:sql}
> select inv_warehouse_sk
>  , inv_item_sk
>  , stddev_samp(inv_quantity_on_hand) stdev
>  , avg(inv_quantity_on_hand) mean
> from inventory
>, date_dim
> where inv_date_sk = d_date_sk
>   and d_year = 1999
>   and d_moy = 2
> group by inv_warehouse_sk, inv_item_sk;
> {code}
> For the sake of the discussion, I outline below some relevant stats (from 
> TPCDS30tb):
>  T(inventory) = 1627857000
>  T(date_dim) = 73049
>  T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000
>  V(inventory, inv_date_sk) = 261
>  V(inventory, inv_item_sk) = 42
>  V(inventory, inv_warehouse_sk) = 27
>  V(date_dim, inv, d_date_sk) = 73049
> For instance, in this query the join between inventory and date_dim has ~24M 
> rows while inventory has ~1.5B so the NDV of the columns coming from 
> inventory are reduced by a factor of ~100 so we end up with V(JOIN, 
> inv_item_sk) = ~6K while the real one is 231000.
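A minimal sketch of the proportional reduction described above (assumed semantics; the actual StatsRulesProcFactory logic has more cases):

```python
# When a join keeps only a fraction of the input rows, every column NDV from
# that side is shrunk by the same fraction -- a large underestimate for columns
# whose distinct values mostly survive a filter applied to a *different* column.
def scale_ndv(ndv: int, rows_before_join: int, rows_after_join: int) -> int:
    if rows_after_join >= rows_before_join:
        return ndv  # no reduction if the join did not shrink the input
    ratio = rows_after_join / rows_before_join
    return max(1, int(ndv * ratio))

# A 100x row reduction divides the NDV by 100 regardless of the real overlap:
print(scale_ndv(100_000, 1_000_000_000, 10_000_000))
```

With hypothetical numbers, a column with 100,000 distinct values is estimated at only 1,000 after a join that keeps 1% of the rows, even if nearly all distinct values actually survive.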



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big

2021-01-13 Thread Vineet Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg resolved HIVE-23684.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Large underestimation in NDV stats when input and join cardinality ratio is 
> big
> ---
>
> Key: HIVE-23684
> URL: https://issues.apache.org/jira/browse/HIVE-23684
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Large underestimations of NDV values may occur after a join operation since 
> the current logic will decrease the original NDV values proportionally.
> The 
> [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558]
>  compares the number of rows of each relation before the join with the number 
> of rows after the join and extracts a ratio for each side. Based on this 
> ratio it adapts (reduces) the NDV accordingly.
> Consider for instance the following query:
> {code:sql}
> select inv_warehouse_sk
>  , inv_item_sk
>  , stddev_samp(inv_quantity_on_hand) stdev
>  , avg(inv_quantity_on_hand) mean
> from inventory
>, date_dim
> where inv_date_sk = d_date_sk
>   and d_year = 1999
>   and d_moy = 2
> group by inv_warehouse_sk, inv_item_sk;
> {code}
> For the sake of the discussion, I outline below some relevant stats (from 
> TPCDS30tb):
>  T(inventory) = 1627857000
>  T(date_dim) = 73049
>  T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000
>  V(inventory, inv_date_sk) = 261
>  V(inventory, inv_item_sk) = 42
>  V(inventory, inv_warehouse_sk) = 27
>  V(date_dim, inv, d_date_sk) = 73049
> For instance, in this query the join between inventory and date_dim has ~24M 
> rows while inventory has ~1.5B so the NDV of the columns coming from 
> inventory are reduced by a factor of ~100 so we end up with V(JOIN, 
> inv_item_sk) = ~6K while the real one is 231000.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big

2021-01-13 Thread Vineet Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-23684:
--

Assignee: Vineet Garg  (was: Stamatis Zampetakis)

> Large underestimation in NDV stats when input and join cardinality ratio is 
> big
> ---
>
> Key: HIVE-23684
> URL: https://issues.apache.org/jira/browse/HIVE-23684
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Large underestimations of NDV values may occur after a join operation since 
> the current logic will decrease the original NDV values proportionally.
> The 
> [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558]
>  compares the number of rows of each relation before the join with the number 
> of rows after the join and extracts a ratio for each side. Based on this 
> ratio it adapts (reduces) the NDV accordingly.
> Consider for instance the following query:
> {code:sql}
> select inv_warehouse_sk
>  , inv_item_sk
>  , stddev_samp(inv_quantity_on_hand) stdev
>  , avg(inv_quantity_on_hand) mean
> from inventory
>, date_dim
> where inv_date_sk = d_date_sk
>   and d_year = 1999
>   and d_moy = 2
> group by inv_warehouse_sk, inv_item_sk;
> {code}
> For the sake of the discussion, I outline below some relevant stats (from 
> TPCDS30tb):
>  T(inventory) = 1627857000
>  T(date_dim) = 73049
>  T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000
>  V(inventory, inv_date_sk) = 261
>  V(inventory, inv_item_sk) = 42
>  V(inventory, inv_warehouse_sk) = 27
>  V(date_dim, inv, d_date_sk) = 73049
> For instance, in this query the join between inventory and date_dim has ~24M 
> rows while inventory has ~1.5B so the NDV of the columns coming from 
> inventory are reduced by a factor of ~100 so we end up with V(JOIN, 
> inv_item_sk) = ~6K while the real one is 231000.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23684?focusedWorklogId=535539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535539
 ]

ASF GitHub Bot logged work on HIVE-23684:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 16:46
Start Date: 13/Jan/21 16:46
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 merged pull request #1786:
URL: https://github.com/apache/hive/pull/1786


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535539)
Time Spent: 50m  (was: 40m)

> Large underestimation in NDV stats when input and join cardinality ratio is 
> big
> ---
>
> Key: HIVE-23684
> URL: https://issues.apache.org/jira/browse/HIVE-23684
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Large underestimations of NDV values may occur after a join operation since 
> the current logic will decrease the original NDV values proportionally.
> The 
> [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558]
>  compares the number of rows of each relation before the join with the number 
> of rows after the join and extracts a ratio for each side. Based on this 
> ratio it adapts (reduces) the NDV accordingly.
> Consider for instance the following query:
> {code:sql}
> select inv_warehouse_sk
>  , inv_item_sk
>  , stddev_samp(inv_quantity_on_hand) stdev
>  , avg(inv_quantity_on_hand) mean
> from inventory
>, date_dim
> where inv_date_sk = d_date_sk
>   and d_year = 1999
>   and d_moy = 2
> group by inv_warehouse_sk, inv_item_sk;
> {code}
> For the sake of the discussion, I outline below some relevant stats (from 
> TPCDS30tb):
>  T(inventory) = 1627857000
>  T(date_dim) = 73049
>  T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000
>  V(inventory, inv_date_sk) = 261
>  V(inventory, inv_item_sk) = 42
>  V(inventory, inv_warehouse_sk) = 27
>  V(date_dim, inv, d_date_sk) = 73049
> For instance, in this query the join between inventory and date_dim has ~24M 
> rows while inventory has ~1.5B so the NDV of the columns coming from 
> inventory are reduced by a factor of ~100 so we end up with V(JOIN, 
> inv_item_sk) = ~6K while the real one is 231000.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-14165) Remove Hive file listing during split computation

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-14165:
--
Labels: pull-request-available  (was: )

> Remove Hive file listing during split computation
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, 
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, 
> HIVE-14165.07.patch, HIVE-14165.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's 
> FileInputFormat.java will list the files during split computation anyway to 
> determine their size. One way to remove this is to catch the 
> InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the 
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.
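The proposed pattern, attempting the operation and handling the missing-input exception rather than listing up front, sketched as a Python analogue (the real change is in Hive's Java FetchOperator and uses Hadoop's exceptions):

```python
import os

# Instead of listing the input path up front just to verify it exists -- an
# extra round trip per partition on stores like S3 -- attempt the real
# operation and treat a missing path as empty input.
def files_or_empty(path: str) -> list:
    try:
        return os.listdir(path)
    except FileNotFoundError:
        return []  # missing input means no splits, not a query failure

print(files_or_empty("/path/that/does/not/exist"))
```

The call returns an empty list for a missing directory, so the caller proceeds with zero splits instead of failing or paying for a separate existence check.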



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-14165) Remove Hive file listing during split computation

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14165?focusedWorklogId=535524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535524
 ]

ASF GitHub Bot logged work on HIVE-14165:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 16:31
Start Date: 13/Jan/21 16:31
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1866:
URL: https://github.com/apache/hive/pull/1866


   
   ### What changes were proposed in this pull request?
   Remove the unnecessary file listing from FetchOperator and handle 
FileNotFoundException instead, to make it more performant on S3.
   Rebased the original patch from Sahil Takiar.
   
   ### Why are the changes needed?
   Performance improvement
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing unit tests. Manual test: deleted some directories during execution 
to trigger FileNotFoundException.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535524)
Remaining Estimate: 0h
Time Spent: 10m

> Remove Hive file listing during split computation
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Assignee: Peter Varga
>Priority: Major
> Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, 
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, 
> HIVE-14165.07.patch, HIVE-14165.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's 
> FileInputFormat.java will list the files during split computation anyway to 
> determine their size. One way to remove this is to catch the 
> InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the 
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24633) Support CTE with column labels

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24633?focusedWorklogId=535468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535468
 ]

ASF GitHub Bot logged work on HIVE-24633:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 15:23
Start Date: 13/Jan/21 15:23
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1865:
URL: https://github.com/apache/hive/pull/1865


   ### What changes were proposed in this pull request?
   1. Improve the parser to accept CTE clause with `with column list` specified:
   ```
   WITH cte(a, b) AS ...
   ```
   2. When transforming the subquery AST tree to a Calcite RelNode tree, a new 
RowResolver is created for the subquery's top node to point to its alias. Extend 
this logic to assign the `with column list` elements to each entry when they are 
explicitly specified in the `WITH` clause.
   
   ### Why are the changes needed?
   SQL standard enables this feature.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. When users specify a `with column list`, the list elements must be used to 
reference expressions from the CTE's select clause in the main query.
   
   ### How was this patch tested?
   ```
   mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_8.q 
-pl itests/qtest -Pitests
   mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=cte_mat_1.q -pl itests/qtest -Pitests
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535468)
Remaining Estimate: 0h
Time Spent: 10m

> Support CTE with column labels
> --
>
> Key: HIVE-24633
> URL: https://issues.apache.org/jira/browse/HIVE-24633
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> with cte1(a, b) as (select int_col x, bigint_col y from t1)
> select a, b from cte1{code}
> {code}
> a b
> 1 2
> 3 4
> {code}
> {code}
> <query expression> ::=
>   [ <with clause> ] <query expression body>
>   [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]
> <with clause> ::=
>   WITH [ RECURSIVE ] <with list>
> <with list> ::=
>   <with list element> [ { <comma> <with list element> }... ]
> <with list element> ::=
>   <query name> [ <left paren> <with column list> <right paren> ]
>   AS <table subquery> [ <search or cycle clause> ]
> <with column list> ::=
>   <column name list>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24633) Support CTE with column labels

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24633:
--
Labels: pull-request-available  (was: )

> Support CTE with column labels
> --
>
> Key: HIVE-24633
> URL: https://issues.apache.org/jira/browse/HIVE-24633
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> with cte1(a, b) as (select int_col x, bigint_col y from t1)
> select a, b from cte1{code}
> {code}
> a b
> 1 2
> 3 4
> {code}
> {code}
> <query expression> ::=
>   [ <with clause> ] <query expression body>
>   [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]
> <with clause> ::=
>   WITH [ RECURSIVE ] <with list>
> <with list> ::=
>   <with list element> [ { <comma> <with list element> }... ]
> <with list element> ::=
>   <query name> [ <left paren> <with column list> <right paren> ]
>   AS <table subquery> [ <search or cycle clause> ]
> <with column list> ::=
>   <column name list>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24633) Support CTE with column labels

2021-01-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24633:
--
Description: 
{code}
with cte1(a, b) as (select int_col x, bigint_col y from t1)
select a, b from cte1{code}
{code}
a   b
1   2
3   4
{code}

{code}
<query expression> ::=
  [ <with clause> ] <query expression body>
  [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]

<with clause> ::=
  WITH [ RECURSIVE ] <with list>

<with list> ::=
  <with list element> [ { <comma> <with list element> }... ]

<with list element> ::=
  <query name> [ <left paren> <with column list> <right paren> ]
  AS <table subquery> [ <search or cycle clause> ]

<with column list> ::=
  <column name list>
{code}

  was:
{code}
with cte1(a, b) as (select int_col x, bigint_col y from t1)
select a, b from cte1{code}
{code}
a   b
1   2
3   4
{code}


> Support CTE with column labels
> --
>
> Key: HIVE-24633
> URL: https://issues.apache.org/jira/browse/HIVE-24633
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> with cte1(a, b) as (select int_col x, bigint_col y from t1)
> select a, b from cte1{code}
> {code}
> a b
> 1 2
> 3 4
> {code}
> {code}
> <query expression> ::=
>   [ <with clause> ] <query expression body>
>   [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]
> <with clause> ::=
>   WITH [ RECURSIVE ] <with list>
> <with list> ::=
>   <with list element> [ { <comma> <with list element> }... ]
> <with list element> ::=
>   <query name> [ <left paren> <with column list> <right paren> ]
>   AS <table subquery> [ <search or cycle clause> ]
> <with column list> ::=
>   <column name list>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24394) Enable printing explain to console at query start

2021-01-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24394:

Fix Version/s: 4.0.0
 Assignee: Johan Gustavsson  (was: Zoltan Haindrich)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

merged into master. Thank you [~johang] for the patch!

> Enable printing explain to console at query start
> -
>
> Key: HIVE-24394
> URL: https://issues.apache.org/jira/browse/HIVE-24394
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor
>Affects Versions: 2.3.7, 3.1.2
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently there is a hive.log.explain.output option that prints extended 
> explain output to the log. While this is helpful for internal investigations, it 
> limits the information that is available to users. We should add an option to 
> print the non-extended explain to the console, for general user consumption, to 
> make it easier for users to debug queries and workflows without having to 
> resubmit queries with explain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24394) Enable printing explain to console at query start

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24394?focusedWorklogId=535447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535447
 ]

ASF GitHub Bot logged work on HIVE-24394:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 14:57
Start Date: 13/Jan/21 14:57
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1679:
URL: https://github.com/apache/hive/pull/1679


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535447)
Time Spent: 20m  (was: 10m)

> Enable printing explain to console at query start
> -
>
> Key: HIVE-24394
> URL: https://issues.apache.org/jira/browse/HIVE-24394
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor
>Affects Versions: 2.3.7, 3.1.2
>Reporter: Johan Gustavsson
>Assignee: Zoltan Haindrich
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently there is a hive.log.explain.output option that prints extended 
> explain output to the log. While this is helpful for internal investigations, it 
> limits the information that is available to users. We should add an option to 
> print the non-extended explain to the console, for general user consumption, to 
> make it easier for users to debug queries and workflows without having to 
> resubmit queries with explain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24627) Add Debug Logging to Hive JDBC Connection

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24627?focusedWorklogId=535441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535441
 ]

ASF GitHub Bot logged work on HIVE-24627:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 14:50
Start Date: 13/Jan/21 14:50
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1859:
URL: https://github.com/apache/hive/pull/1859


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535441)
Time Spent: 50m  (was: 40m)

> Add Debug Logging to Hive JDBC Connection
> -
>
> Key: HIVE-24627
> URL: https://issues.apache.org/jira/browse/HIVE-24627
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Log the following:
> # Session handle
> # Version Number
> # Any configurations/variables set by the user at the client-side
> # Dump the Hive configurations at session-start



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24339) REPL LOAD command ignores config properties set by WITH clause

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24339?focusedWorklogId=535439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535439
 ]

ASF GitHub Bot logged work on HIVE-24339:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 14:49
Start Date: 13/Jan/21 14:49
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #1864:
URL: https://github.com/apache/hive/pull/1864#issuecomment-759497451


   cc. @kgyrtkirk @abstractdog 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535439)
Time Spent: 20m  (was: 10m)

> REPL LOAD command ignores config properties set by WITH clause
> --
>
> Key: HIVE-24339
> URL: https://issues.apache.org/jira/browse/HIVE-24339
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Debug messages confirmed that the REPL LOAD command ignored some config 
> properties when they were provided in the WITH clause, e.g.:
> {code}
> REPL LOAD bdpp01pub FROM 
> 'hdfs://prdpdp01//apps/hive/repl/8237c7bd-ba26-4425-8659-3a0d32ab312c' WITH 
> ('mapreduce.job.queuename'='default','hive.exec.parallel'='true','hive.exec.parallel.thread.number'='128',
> ...
> {code}
> We found that it was running on 16 threads, ignoring 
> 'hive.exec.parallel.thread.number'='128'. Setting this property at the session 
> level worked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24627) Add Debug Logging to Hive JDBC Connection

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24627?focusedWorklogId=535429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535429
 ]

ASF GitHub Bot logged work on HIVE-24627:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 14:46
Start Date: 13/Jan/21 14:46
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1859:
URL: https://github.com/apache/hive/pull/1859


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535429)
Time Spent: 40m  (was: 0.5h)

> Add Debug Logging to Hive JDBC Connection
> -
>
> Key: HIVE-24627
> URL: https://issues.apache.org/jira/browse/HIVE-24627
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Log the following:
> # Session handle
> # Version Number
> # Any configurations/variables set by the user at the client-side
> # Dump the Hive configurations at session-start



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24613) Support Values clause without Insert

2021-01-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24613:
--
Component/s: Parser

> Support Values clause without Insert
> 
>
> Key: HIVE-24613
> URL: https://issues.apache.org/jira/browse/HIVE-24613
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standalone:
> {code}
> VALUES(1,2,3),(4,5,6);
> {code}
> {code}
> 1 2   3
> 4 5   6
> {code}
> In subquery:
> {code}
> SELECT * FROM (VALUES(1,2,3),(4,5,6)) as FOO;
> {code}
> {code}
> 1 2   3
> 4 5   6
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24633) Support CTE with column labels

2021-01-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-24633:
-


> Support CTE with column labels
> --
>
> Key: HIVE-24633
> URL: https://issues.apache.org/jira/browse/HIVE-24633
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> with cte1(a, b) as (select int_col x, bigint_col y from t1)
> select a, b from cte1{code}
> {code}
> a b
> 1 2
> 3 4
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24613) Support Values clause without Insert

2021-01-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-24613.
---
Resolution: Fixed

Pushed to master. Thanks [~jcamachorodriguez], [~kgyrtkirk] for review.

> Support Values clause without Insert
> 
>
> Key: HIVE-24613
> URL: https://issues.apache.org/jira/browse/HIVE-24613
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standalone:
> {code}
> VALUES(1,2,3),(4,5,6);
> {code}
> {code}
> 1 2   3
> 4 5   6
> {code}
> In subquery:
> {code}
> SELECT * FROM (VALUES(1,2,3),(4,5,6)) as FOO;
> {code}
> {code}
> 1 2   3
> 4 5   6
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24613) Support Values clause without Insert

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24613?focusedWorklogId=535416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535416
 ]

ASF GitHub Bot logged work on HIVE-24613:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 14:10
Start Date: 13/Jan/21 14:10
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #1847:
URL: https://github.com/apache/hive/pull/1847


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535416)
Time Spent: 50m  (was: 40m)

> Support Values clause without Insert
> 
>
> Key: HIVE-24613
> URL: https://issues.apache.org/jira/browse/HIVE-24613
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standalone:
> {code}
> VALUES(1,2,3),(4,5,6);
> {code}
> {code}
> 1 2   3
> 4 5   6
> {code}
> In subquery:
> {code}
> SELECT * FROM (VALUES(1,2,3),(4,5,6)) as FOO;
> {code}
> {code}
> 1 2   3
> 4 5   6
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23459) Reduce number of listPath calls in AcidUtils::getAcidState

2021-01-13 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga resolved HIVE-23459.

Resolution: Duplicate

> Reduce number of listPath calls in AcidUtils::getAcidState
> --
>
> Key: HIVE-23459
> URL: https://issues.apache.org/jira/browse/HIVE-23459
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Peter Varga
>Priority: Minor
> Attachments: image-2020-05-13-13-57-27-270.png
>
>
> There are at least 3 places where listPaths is invoked on the FS (highlighted 
> in the profile below).
> !image-2020-05-13-13-57-27-270.png|width=869,height=626!
>  
> Dir caching works mainly for the BI strategy and when there are no delta files. 
> It would be good to consider reducing the number of NN calls to cut getSplits 
> time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24339) REPL LOAD command ignores config properties set by WITH clause

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24339:
--
Labels: pull-request-available  (was: )

> REPL LOAD command ignores config properties set by WITH clause
> --
>
> Key: HIVE-24339
> URL: https://issues.apache.org/jira/browse/HIVE-24339
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Using debug messages, we confirmed that the REPL LOAD command ignored some 
> config properties when they were provided in the WITH clause, e.g.:
> {code}
> REPL LOAD bdpp01pub FROM 
> 'hdfs://prdpdp01//apps/hive/repl/8237c7bd-ba26-4425-8659-3a0d32ab312c' WITH 
> ('mapreduce.job.queuename'='default','hive.exec.parallel'='true','hive.exec.parallel.thread.number'='128',
> ...
> {code}
> We found that it was working on 16 threads, ignoring 
> 'hive.exec.parallel.thread.number'='128'. Setting this property on session 
> level worked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24339) REPL LOAD command ignores config properties set by WITH clause

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24339?focusedWorklogId=535353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535353
 ]

ASF GitHub Bot logged work on HIVE-24339:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 12:31
Start Date: 13/Jan/21 12:31
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #1864:
URL: https://github.com/apache/hive/pull/1864


   ### What changes were proposed in this pull request?
   Take numThreads from the root task if explicitly specified
   
   
   ### Why are the changes needed?
   Allow repl load/dump to specify numThreads as part of the WITH clause
   
   
   ### How was this patch tested?
   Added unit test
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535353)
Remaining Estimate: 0h
Time Spent: 10m

> REPL LOAD command ignores config properties set by WITH clause
> --
>
> Key: HIVE-24339
> URL: https://issues.apache.org/jira/browse/HIVE-24339
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Using debug messages, we confirmed that the REPL LOAD command ignored some 
> config properties when they were provided in the WITH clause, e.g.:
> {code}
> REPL LOAD bdpp01pub FROM 
> 'hdfs://prdpdp01//apps/hive/repl/8237c7bd-ba26-4425-8659-3a0d32ab312c' WITH 
> ('mapreduce.job.queuename'='default','hive.exec.parallel'='true','hive.exec.parallel.thread.number'='128',
> ...
> {code}
> We found that it was working on 16 threads, ignoring 
> 'hive.exec.parallel.thread.number'='128'. Setting this property on session 
> level worked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
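The intended precedence in the issue above — WITH-clause properties winning over session settings, which win over defaults — can be sketched in plain Python. This is a hypothetical merge order, not Hive's actual configuration handling; `effective_config` and its parameters are illustrative names only:

```python
def effective_config(defaults, session_overrides, with_clause):
    """Merge config layers so that later layers take precedence:
    defaults < session-level settings < WITH-clause properties."""
    merged = dict(defaults)
    merged.update(session_overrides)
    merged.update(with_clause)
    return merged

# The bug report's scenario: the default is 16 threads, the WITH clause
# asks for 128; the WITH clause should win.
defaults = {"hive.exec.parallel.thread.number": "16"}
with_clause = {"hive.exec.parallel.thread.number": "128"}
cfg = effective_config(defaults, {}, with_clause)
assert cfg["hive.exec.parallel.thread.number"] == "128"
```

The reported behavior corresponds to the WITH-clause layer being dropped from this merge, so the default of 16 threads was used.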


[jira] [Updated] (HIVE-24632) Replace with null when GenericUDFBaseCompare has a non-interpretable val

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24632:
--
Labels: pull-request-available  (was: )

> Replace with null when GenericUDFBaseCompare has a non-interpretable val
> 
>
> Key: HIVE-24632
> URL: https://issues.apache.org/jira/browse/HIVE-24632
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The query
> {code:java}
> create table ccn_table(key int, value string);
> set hive.cbo.enable=false;
> select * from ccn_table where key > '123a'  ;
> {code}
> will scan all records (partitions) compared to older versions, as the plan 
> shows: 
> {noformat}
> STAGE PLANS:
>  Stage: Stage-0
>Fetch Operator
>  limit: -1
>  Processor Tree:
>TableScan
>  alias: ccn_table
>  filterExpr: (key > '123a') (type: boolean)
>  Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column 
> stats: COMPLETE
>  GatherStats: false
>  Filter Operator
>isSamplingPred: false
>predicate: (key > '123a') (type: boolean)
>Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column 
> stats: COMPLETE
>Select Operator
>  expressions: key (type: int), value (type: string)
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE 
> Column stats: COMPLETE
>  ListSink{noformat}
> When TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr 
> +key > '123a',+ the operator (>) is not an equal operator (=), so the factory 
> returns +key > '123a'+ as it is. However, all subclasses of 
> GenericUDFBaseCompare (except GenericUDFOPEqualNS and GenericUDFOPNotEqualNS) 
> return null if either of the function's children is null, so it is 
> safe to return a constant null when processing the expr +`key > '123a'`+. This 
> will benefit some queries when the CBO is disabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
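The null-propagation argument in the description rests on SQL's three-valued comparison logic: a comparison with NULL yields NULL (unknown), and WHERE keeps only rows where the predicate is true. A minimal sketch in plain Python (`tri_greater` is a hypothetical helper, not Hive code):

```python
def tri_greater(a, b):
    """Three-valued '>': True/False, or None when either side is NULL."""
    if a is None or b is None:
        return None
    return a > b

rows = [(1, "x"), (2, "y")]

# '123a' cannot be interpreted as an int, so the proposal replaces it with
# NULL; every comparison then yields unknown and no row passes the filter.
uninterpretable = None
kept = [r for r in rows if tri_greater(r[0], uninterpretable) is True]
assert kept == []
```

This is why the replacement is safe for the non-null-safe comparison operators, while the null-safe ones (GenericUDFOPEqualNS, GenericUDFOPNotEqualNS) must be excluded.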


[jira] [Work logged] (HIVE-24632) Replace with null when GenericUDFBaseCompare has a non-interpretable val

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24632?focusedWorklogId=535340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535340
 ]

ASF GitHub Bot logged work on HIVE-24632:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 12:19
Start Date: 13/Jan/21 12:19
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1863:
URL: https://github.com/apache/hive/pull/1863


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   Added tests
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535340)
Remaining Estimate: 0h
Time Spent: 10m

> Replace with null when GenericUDFBaseCompare has a non-interpretable val
> 
>
> Key: HIVE-24632
> URL: https://issues.apache.org/jira/browse/HIVE-24632
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The query
> {code:java}
> create table ccn_table(key int, value string);
> set hive.cbo.enable=false;
> select * from ccn_table where key > '123a'  ;
> {code}
> will scan all records (partitions) compared to older versions, as the plan 
> shows: 
> {noformat}
> STAGE PLANS:
>  Stage: Stage-0
>Fetch Operator
>  limit: -1
>  Processor Tree:
>TableScan
>  alias: ccn_table
>  filterExpr: (key > '123a') (type: boolean)
>  Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column 
> stats: COMPLETE
>  GatherStats: false
>  Filter Operator
>isSamplingPred: false
>predicate: (key > '123a') (type: boolean)
>Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column 
> stats: COMPLETE
>Select Operator
>  expressions: key (type: int), value (type: string)
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE 
> Column stats: COMPLETE
>  ListSink{noformat}
> When TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr 
> +key > '123a',+ the operator (>) is not an equal operator (=), so the factory 
> returns +key > '123a'+ as it is. However, all subclasses of 
> GenericUDFBaseCompare (except GenericUDFOPEqualNS and GenericUDFOPNotEqualNS) 
> return null if either of the function's children is null, so it is 
> safe to return a constant null when processing the expr +`key > '123a'`+. This 
> will benefit some queries when the CBO is disabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24611) Remove unnecessary parameter from AbstractAlterTableOperation

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24611?focusedWorklogId=535334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535334
 ]

ASF GitHub Bot logged work on HIVE-24611:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 12:15
Start Date: 13/Jan/21 12:15
Worklog Time Spent: 10m 
  Work Description: miklosgergely merged pull request #1846:
URL: https://github.com/apache/hive/pull/1846


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535334)
Time Spent: 20m  (was: 10m)

> Remove unnecessary parameter from AbstractAlterTableOperation
> -
>
> Key: HIVE-24611
> URL: https://issues.apache.org/jira/browse/HIVE-24611
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24611) Remove unnecessary parameter from AbstractAlterTableOperation

2021-01-13 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely resolved HIVE-24611.
---
Resolution: Fixed

Merged to master, thank you [~kkasa]

> Remove unnecessary parameter from AbstractAlterTableOperation
> -
>
> Key: HIVE-24611
> URL: https://issues.apache.org/jira/browse/HIVE-24611
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24632) Replace with null when GenericUDFBaseCompare has a non-interpretable val

2021-01-13 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24632:
---
Description: 
The query
{code:java}
create table ccn_table(key int, value string);
set hive.cbo.enable=false;
select * from ccn_table where key > '123a'  ;
{code}
will scan all records (partitions) compared to older versions, as the plan 
shows: 
{noformat}
STAGE PLANS:
 Stage: Stage-0
   Fetch Operator
 limit: -1
 Processor Tree:
   TableScan
 alias: ccn_table
 filterExpr: (key > '123a') (type: boolean)
 Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column 
stats: COMPLETE
 GatherStats: false
 Filter Operator
   isSamplingPred: false
   predicate: (key > '123a') (type: boolean)
   Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column 
stats: COMPLETE
   Select Operator
 expressions: key (type: int), value (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column 
stats: COMPLETE
 ListSink{noformat}
When TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr 
+key > '123a',+ the operator (>) is not an equal operator (=), so the factory 
returns +key > '123a'+ as it is. However, all subclasses of 
GenericUDFBaseCompare (except GenericUDFOPEqualNS and GenericUDFOPNotEqualNS) 
return null if either of the function's children is null, so it is 
safe to return a constant null when processing the expr +`key > '123a'`+. This 
will benefit some queries when the CBO is disabled.

  was:
The query

 
{code:java}
create table ccn_table(key int, value string);
set hive.cbo.enable=false;
select * from ccn_table where key > '123a'  ;
{code}
 

will scan all records (partitions) compared to older versions, as the plan 
shows: 

 
{noformat}
STAGE PLANS:
 Stage: Stage-0
   Fetch Operator
 limit: -1
 Processor Tree:
   TableScan
 alias: ccn_table
 filterExpr: (key > '123a') (type: boolean)
 Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column 
stats: COMPLETE
 GatherStats: false
 Filter Operator
   isSamplingPred: false
   predicate: (key > '123a') (type: boolean)
   Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column 
stats: COMPLETE
   Select Operator
 expressions: key (type: int), value (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column 
stats: COMPLETE
 ListSink{noformat}

When TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr 
+key > '123a',+ the operator (>) is not an equal operator (=), so the factory 
returns +key > '123a'+ as it is. However, all subclasses of 
GenericUDFBaseCompare (except GenericUDFOPEqualNS and GenericUDFOPNotEqualNS) 
return null if either of the function's children is null, so it is 
safe to return a constant null when processing the expr +`key > '123a'`+. This 
will benefit some queries when the CBO is disabled.


> Replace with null when GenericUDFBaseCompare has a non-interpretable val
> 
>
> Key: HIVE-24632
> URL: https://issues.apache.org/jira/browse/HIVE-24632
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Priority: Major
>
> The query
> {code:java}
> create table ccn_table(key int, value string);
> set hive.cbo.enable=false;
> select * from ccn_table where key > '123a'  ;
> {code}
> will scan all records (partitions) compared to older versions, as the plan 
> shows: 
> {noformat}
> STAGE PLANS:
>  Stage: Stage-0
>Fetch Operator
>  limit: -1
>  Processor Tree:
>TableScan
>  alias: ccn_table
>  filterExpr: (key > '123a') (type: boolean)
>  Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column 
> stats: COMPLETE
>  GatherStats: false
>  Filter Operator
>isSamplingPred: false
>predicate: (key > '123a') (type: boolean)
>Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column 
> stats: COMPLETE
>Select Operator
>  expressions: key (type: int), value (type: string)
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE 
> Column stats: COMPLETE
>  ListSink{noformat}
> When the TypeCheckProcFactory#getXpathOrFuncExprNodeDesc validates the expr: 
> +key > '123a',+  the operator(>) is not an equal operator(=),  so the factory 
> returns +key > '123a'+ 

[jira] [Commented] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-01-13 Thread Eugene Chung (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264100#comment-17264100
 ] 

Eugene Chung commented on HIVE-24590:
-

[~zabetak] Okay. Let me try.

> Operation Logging still leaks the log4j Appenders
> -
>
> Key: HIVE-24590
> URL: https://issues.apache.org/jira/browse/HIVE-24590
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Eugene Chung
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
> 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen 
> Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, 
> Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I'm using Hive 3.1.2 with options below.
>  * hive.server2.logging.operation.enabled=true
>  * hive.server2.logging.operation.level=VERBOSE
>  * hive.async.log.enabled=false
> I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 
> but HS2 still leaks log4j RandomAccessFileManager.
> !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!
> I checked the operation log file which is not closed/deleted properly.
> !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!
> Then there's the log,
> {code:java}
> client.TezClient: Shutting down Tez Session, sessionName= {code}
> !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-14165) Remove Hive file listing during split computation

2021-01-13 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-14165:
--

Assignee: Peter Varga

> Remove Hive file listing during split computation
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Assignee: Peter Varga
>Priority: Major
> Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, 
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, 
> HIVE-14165.07.patch, HIVE-14165.patch
>
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's 
> FileInputFormat.java will list the files during split computation anyway to 
> determine their size. One way to remove this is to catch the 
> InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the 
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24630) clean up multiple parseDelta implementation in AcidUtils

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24630?focusedWorklogId=535298=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535298
 ]

ASF GitHub Bot logged work on HIVE-24630:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 10:48
Start Date: 13/Jan/21 10:48
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1862:
URL: https://github.com/apache/hive/pull/1862


   
   
   ### What changes were proposed in this pull request?
   Remove the multiple parseDelta implementations in AcidUtils:
   
   - Remove code duplication
   - Use ParsedDeltaLight wherever rawformat is not used, because parsing 
that is cheaper
   
   ### Why are the changes needed?
   code quality
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Previous unit tests
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535298)
Remaining Estimate: 0h
Time Spent: 10m

> clean up multiple parseDelta implementation in AcidUtils
> 
>
> Key: HIVE-24630
> URL: https://issues.apache.org/jira/browse/HIVE-24630
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Remove code duplication
> * Use ParsedDeltaLight wherever rawformat is not used, because parsing 
> that is cheaper



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24630) clean up multiple parseDelta implementation in AcidUtils

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24630:
--
Labels: pull-request-available  (was: )

> clean up multiple parseDelta implementation in AcidUtils
> 
>
> Key: HIVE-24630
> URL: https://issues.apache.org/jira/browse/HIVE-24630
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Remove code duplication
> * Use ParsedDeltaLight wherever rawformat is not used, because parsing 
> that is cheaper



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24624) Repl Load should detect the compatible staging dir

2021-01-13 Thread Pratyushotpal Madhukar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyushotpal Madhukar updated HIVE-24624:
--
Attachment: HIVE-24624.patch

> Repl Load should detect the compatible staging dir
> --
>
> Key: HIVE-24624
> URL: https://issues.apache.org/jira/browse/HIVE-24624
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pratyushotpal Madhukar
>Assignee: Pratyushotpal Madhukar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24624.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Repl load in CDP, when pointed to a staging dir, should be able to detect 
> whether the staging dir has the dump structure in a compatible format or not



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24630) clean up multiple parseDelta implementation in AcidUtils

2021-01-13 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-24630:
--


> clean up multiple parseDelta implementation in AcidUtils
> 
>
> Key: HIVE-24630
> URL: https://issues.apache.org/jira/browse/HIVE-24630
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>
> * Remove code duplication
> * Use ParsedDeltaLight wherever rawformat is not used, because parsing 
> that is cheaper



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24615) Remove unnecessary FileSystem listing from Initiator

2021-01-13 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga resolved HIVE-24615.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Remove unnecessary FileSystem listing from Initiator 
> -
>
> Key: HIVE-24615
> URL: https://issues.apache.org/jira/browse/HIVE-24615
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> AcidUtils already returns the file list in base and delta directories if it 
> does a recursive listing on S3, so listing those directories can be removed 
> from the Initiator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24615) Remove unnecessary FileSystem listing from Initiator

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24615?focusedWorklogId=535263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535263
 ]

ASF GitHub Bot logged work on HIVE-24615:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 09:25
Start Date: 13/Jan/21 09:25
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #1848:
URL: https://github.com/apache/hive/pull/1848


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535263)
Time Spent: 0.5h  (was: 20m)

> Remove unnecessary FileSystem listing from Initiator 
> -
>
> Key: HIVE-24615
> URL: https://issues.apache.org/jira/browse/HIVE-24615
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> AcidUtils already returns the file list in base and delta directories if it 
> does a recursive listing on S3, so listing those directories can be removed 
> from the Initiator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24615) Remove unnecessary FileSystem listing from Initiator

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24615?focusedWorklogId=535264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535264
 ]

ASF GitHub Bot logged work on HIVE-24615:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 09:25
Start Date: 13/Jan/21 09:25
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #1848:
URL: https://github.com/apache/hive/pull/1848#issuecomment-759321154


   Thanks for the patch @pvargacl! Merged it into master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535264)
Time Spent: 40m  (was: 0.5h)

> Remove unnecessary FileSystem listing from Initiator 
> -
>
> Key: HIVE-24615
> URL: https://issues.apache.org/jira/browse/HIVE-24615
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> AcidUtils already returns the file list in base and delta directories if it 
> does a recursive listing on S3, so listing those directories can be removed 
> from the Initiator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24339) REPL LOAD command ignores config properties set by WITH clause

2021-01-13 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-24339:
---

Assignee: Ayush Saxena

> REPL LOAD command ignores config properties set by WITH clause
> --
>
> Key: HIVE-24339
> URL: https://issues.apache.org/jira/browse/HIVE-24339
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Ayush Saxena
>Priority: Major
>
> Using debug messages, we confirmed that the REPL LOAD command ignored some 
> config properties when they were provided in the WITH clause, e.g.:
> {code}
> REPL LOAD bdpp01pub FROM 
> 'hdfs://prdpdp01//apps/hive/repl/8237c7bd-ba26-4425-8659-3a0d32ab312c' WITH 
> ('mapreduce.job.queuename'='default','hive.exec.parallel'='true','hive.exec.parallel.thread.number'='128',
> ...
> {code}
> We found that it was working on 16 threads, ignoring 
> 'hive.exec.parallel.thread.number'='128'. Setting this property on session 
> level worked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-13 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263996#comment-17263996
 ] 

Attila Magyar commented on HIVE-24584:
--

Hi [~srahman],

Thanks for the input. My understanding is that PartitionExpressionForMetastore 
is the default value of "metastore.expression.proxy" (in 
HiveConf.java/MetaStoreConf.java).

Msck attempts to override this by creating a HiveMetaStoreClient with a 
modified config object. However, unless HS2 and HMS are running inside the same 
process (or Msck is called within HMS via the periodically running 
PartitionManagementTask), this doesn't work.

In the case of a remote HMS, Msck should have called msc.setMetaConf() or 
something that modifies the config via Thrift.

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24515) Analyze table job can be skipped when stats populated are already accurate

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24515?focusedWorklogId=535257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535257
 ]

ASF GitHub Bot logged work on HIVE-24515:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 09:03
Start Date: 13/Jan/21 09:03
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1834:
URL: https://github.com/apache/hive/pull/1834#discussion_r556362110



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java
##
@@ -204,6 +206,54 @@ public int persistColumnStats(Hive db, Table tbl) throws 
HiveException, MetaExce
   public void setDpPartSpecs(Collection dpPartSpecs) {
   }
 
+  public static boolean canSkipStatsGeneration(String dbName, String tblName, 
String partName,
+   long statsWriteId, String 
queryValidWriteIdList) {
+if (queryValidWriteIdList != null) { // Can be null if its not an ACID 
table.
+  ValidWriteIdList validWriteIdList = new 
ValidReaderWriteIdList(queryValidWriteIdList);
+  // Just check if the write ID is valid. If it's valid (i.e. we are 
allowed to see it),
+  // that means it cannot possibly be a concurrent write. As stats 
optimization is enabled
+  // only in case auto gather is enabled. Thus the stats must be updated 
by a valid committed
+  // transaction and stats generation can be skipped.
+  if (validWriteIdList.isWriteIdValid(statsWriteId)) {
+try {
+  IMetaStoreClient msc = Hive.get().getMSC();
+  TxnState state = msc.findStatStatusByWriteId(dbName, tblName, 
partName, statsWriteId);

Review comment:
    can't we just check here if there is a newer committed writeId for the 
table/partition and, if yes, recompute the stats?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535257)
Time Spent: 1h  (was: 50m)

> Analyze table job can be skipped when stats populated are already accurate
> --
>
> Key: HIVE-24515
> URL: https://issues.apache.org/jira/browse/HIVE-24515
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For non-partitioned tables, stats detail should be present in table level,
> e.g
> {noformat}
> COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"d_current_day":"true"...
>  }}
>   {noformat}
> For partitioned tables, stats detail should be present in partition level,
> {noformat}
> store_sales(ss_sold_date_sk=2451819)
> {totalSize=0, numRows=0, rawDataSize=0, 
> COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"ss_addr_sk":"true"}}
>  
>  {noformat}
> When stats populated are already accurate, {{analyze table tn compute 
> statistics for columns}} should skip launching the job.
>  
> For ACID tables, stats are auto computed and it can skip computing stats 
> again when stats are accurate.
>  
>  
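
The skip condition described above can be pictured with a small self-contained sketch. This is plain Java, not the actual Hive API (ValidReaderWriteIdList, IMetaStoreClient are not used); a set of committed write IDs stands in for Hive's write-ID snapshot. Stats generation is skippable only when the write ID that produced the existing stats is already visible, i.e. committed, from the query's snapshot.

```java
import java.util.Set;

// Simplified model of the skip decision: if the write ID that last updated the
// stats is committed and visible to the current query, the stats cannot have
// been produced by a concurrent writer, so recomputation can be skipped.
public class StatsSkipSketch {
    // In Hive, ValidWriteIdList encodes a high-water mark plus open/aborted IDs;
    // a plain set of committed IDs stands in for it in this sketch.
    static boolean canSkipStatsGeneration(long statsWriteId, Set<Long> committedWriteIds) {
        return committedWriteIds.contains(statsWriteId);
    }

    public static void main(String[] args) {
        Set<Long> committed = Set.of(1L, 2L, 3L);
        System.out.println(canSkipStatsGeneration(2L, committed)); // true: stats from committed txn, skip
        System.out.println(canSkipStatsGeneration(5L, committed)); // false: unknown/open txn, recompute
    }
}
```

The real check additionally consults the metastore for the transaction state of the stats writer, as in the patch excerpt above.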



--


[jira] [Updated] (HIVE-24623) Wrong FS error during dump for table-level replication when staging is remote.

2021-01-13 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24623:
---
Attachment: HIVE-24623.01.patch

> Wrong FS error during dump for table-level replication when staging is remote.
> --
>
> Key: HIVE-24623
> URL: https://issues.apache.org/jira/browse/HIVE-24623
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24623.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--


[jira] [Updated] (HIVE-24597) Replication with timestamp type partition failing in HA case with same NS

2021-01-13 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24597:
---
Attachment: HIVE-24597.01.patch

> Replication with timestamp type partition failing in HA case with same NS
> -
>
> Key: HIVE-24597
> URL: https://issues.apache.org/jira/browse/HIVE-24597
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24597.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--


[jira] [Commented] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2021-01-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263987#comment-17263987
 ] 

Zoltan Haindrich commented on HIVE-16352:
-

Is it really not avoidable to write these files correctly?
Are these incorrect files created when the writer is shut down incorrectly, or 
is Hive possibly reading an incomplete file?
Although I think the best option would be a writer which could give better 
consistency guarantees, I'm not against this change: it's small and is 
off by default.
Any strong opinion against merging it?

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>  Components: Avro, File Formats, Reader
>Affects Versions: 3.1.2
>Reporter: Navdeep Poonia
>Assignee: gabrywu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When a file is corrupted, Hive raises the error java.io.IOException: Invalid 
> sync!
>  Can we have some functionality to skip or repair such blocks at runtime to 
> make Avro more error resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)
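
The "skip at runtime" idea amounts to resynchronizing on the file's sync marker after a corrupt block. The sketch below is a minimal in-memory model, not the actual Avro reader API, and uses a shortened 4-byte marker for brevity (Avro's real sync marker is 16 bytes): after an "Invalid sync!" error, scan forward until the marker reappears, then resume reading blocks there.

```java
import java.util.Arrays;

// Sketch of sync-marker recovery: find the next occurrence of the file's
// sync marker at or after a given offset, so reading can resume from a
// known-good block boundary instead of failing the whole split.
public class SyncResyncSketch {
    static int nextSync(byte[] data, int from, byte[] sync) {
        for (int i = from; i + sync.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + sync.length), sync)) {
                return i; // offset of the next valid sync marker
            }
        }
        return -1; // no further marker: the rest of the file is unreadable
    }

    public static void main(String[] args) {
        byte[] sync = {7, 7, 7, 7};
        byte[] file = {1, 2, 9, 9, 7, 7, 7, 7, 5, 6};
        System.out.println(nextSync(file, 2, sync)); // 4: skip the corrupt bytes at 2..3
    }
}
```

Whether to skip or fail at that point is exactly the behavior the proposed off-by-default flag would control.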



--


[jira] [Comment Edited] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-13 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263983#comment-17263983
 ] 

Syed Shameerur Rahman edited comment on HIVE-24584 at 1/13/21, 8:43 AM:


[~amagyar] - As per my understanding, all msck command flows default to using 
MsckPartitionExpressionProxy unless EXPRESSION_PROXY_CLASS is explicitly given.
So is there any reason for explicitly setting EXPRESSION_PROXY_CLASS, or am I 
missing anything? The above-mentioned problem doesn't arise if you use 
the default path.


{code:java}
public static Configuration getMsckConf(Configuration conf) {
    // the only reason we are using new conf here is to override EXPRESSION_PROXY_CLASS
    Configuration metastoreConf = MetastoreConf.newMetastoreConf(new Configuration(conf));
    metastoreConf.set(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),
        metastoreConf.get(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),
            MsckPartitionExpressionProxy.class.getCanonicalName()));
    return metastoreConf;
  }
{code}



was (Author: srahman):
[~amagyar] - As per my understanding, all msck command flows default to using 
MsckPartitionExpressionProxy unless EXPRESSION_PROXY_CLASS is explicitly given.
So is there any reason for explicitly setting EXPRESSION_PROXY_CLASS, or am I 
missing anything? The above-mentioned problem doesn't arise if you use 
the default path.

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--


[jira] [Commented] (HIVE-24523) Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES for timestamp

2021-01-13 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263984#comment-17263984
 ] 

Denys Kuzmenko commented on HIVE-24523:
---

Merged to master.
Thank you for the patch, [~nareshpr]! 

> Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES 
> for timestamp
> -
>
> Key: HIVE-24523
> URL: https://issues.apache.org/jira/browse/HIVE-24523
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.2.0, 4.0.0
>Reporter: Rajkumar Singh
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Steps to repro:
> {code:java}
>   create external  table tstable(date_created timestamp)   ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'   WITH SERDEPROPERTIES ( 
>  'timestamp.formats'='MMddHHmmss') stored as textfile;
> cat sampledata 
> 2020120517
> hdfs dfs -put sampledata /warehouse/tablespace/external/hive/tstable
> {code}
> Disable fetch task conversion and run select * from tstable, which produces 
> no results; setting hive.vectorized.use.vector.serde.deserialize=false 
> returns the expected output.
> While parsing the string to a timestamp, 
> https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/fast/LazySimpleDeserializeRead.java#L812
>  does not set the DateTimeFormatter, which causes an IllegalArgumentException 
> when parsing the timestamp through TimestampUtils.stringToTimestamp(strValue).
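
What honoring the timestamp.formats property amounts to can be shown with java.time directly. The pattern and sample value below are assumptions for illustration (the exact SERDEPROPERTIES value quoted above is partially garbled): a user-supplied pattern parses the raw text, while the default ISO formatter fails on it, which mirrors the vectorized path ignoring the configured format and returning no rows.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class TimestampFormatSketch {
    public static void main(String[] args) {
        // Assumed user-supplied pattern, standing in for the SERDEPROPERTIES value.
        DateTimeFormatter custom = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
        LocalDateTime ts = LocalDateTime.parse("20201205173000", custom);
        System.out.println(ts); // 2020-12-05T17:30

        // The default (ISO) formatter cannot parse the same text -- analogous
        // to the vectorized read path not applying the configured format.
        try {
            LocalDateTime.parse("20201205173000");
            System.out.println("parsed");
        } catch (DateTimeParseException e) {
            System.out.println("default formatter fails");
        }
    }
}
```

The fix referenced in this issue makes the vectorized deserializer use the configured formatter, as the non-vectorized path already does.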



--


[jira] [Resolved] (HIVE-24523) Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES for timestamp

2021-01-13 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-24523.
---
Resolution: Fixed

> Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES 
> for timestamp
> -
>
> Key: HIVE-24523
> URL: https://issues.apache.org/jira/browse/HIVE-24523
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.2.0, 4.0.0
>Reporter: Rajkumar Singh
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Steps to repro:
> {code:java}
>   create external  table tstable(date_created timestamp)   ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'   WITH SERDEPROPERTIES ( 
>  'timestamp.formats'='MMddHHmmss') stored as textfile;
> cat sampledata 
> 2020120517
> hdfs dfs -put sampledata /warehouse/tablespace/external/hive/tstable
> {code}
> Disable fetch task conversion and run select * from tstable, which produces 
> no results; setting hive.vectorized.use.vector.serde.deserialize=false 
> returns the expected output.
> While parsing the string to a timestamp, 
> https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/fast/LazySimpleDeserializeRead.java#L812
>  does not set the DateTimeFormatter, which causes an IllegalArgumentException 
> when parsing the timestamp through TimestampUtils.stringToTimestamp(strValue).



--


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-13 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263983#comment-17263983
 ] 

Syed Shameerur Rahman commented on HIVE-24584:
--

[~amagyar] - As per my understanding, all msck command flows default to using 
MsckPartitionExpressionProxy unless EXPRESSION_PROXY_CLASS is explicitly given.
So is there any reason for explicitly setting EXPRESSION_PROXY_CLASS, or am I 
missing anything? The above-mentioned problem doesn't arise if you use 
the default path.

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--


[jira] [Assigned] (HIVE-24629) Invoke optional output committer in TezProcessor

2021-01-13 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-24629:
-


> Invoke optional output committer in TezProcessor
> 
>
> Key: HIVE-24629
> URL: https://issues.apache.org/jira/browse/HIVE-24629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> In order to enable Hive to write to Iceberg tables, we need to use an output 
> committer which will fire at the end of each Tez task execution (commitTask) 
> and after the execution of each vertex (commitOutput/commitJob). This 
> output committer will issue a commit containing the written-out data files to 
> the Iceberg table, replacing its previous snapshot pointer with a new one.
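
A hypothetical skeleton of that commit flow (class and method names here are illustrative, not the actual Hive/Iceberg committer API): each task registers the data files it wrote, and a single job-level commit publishes all of them as one new snapshot, replacing the previous snapshot pointer.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a two-phase output committer: commitTask accumulates per-task
// data files; commitJob atomically swaps the table's snapshot pointer so the
// new files become visible all at once.
public class OutputCommitterSketch {
    private final List<String> pendingDataFiles = new ArrayList<>();
    private String snapshotPointer = "snapshot-0";

    // Called at the end of each Tez task attempt.
    public void commitTask(List<String> filesWrittenByTask) {
        pendingDataFiles.addAll(filesWrittenByTask);
    }

    // Called once after the vertex finishes.
    public String commitJob() {
        snapshotPointer = "snapshot-1(" + pendingDataFiles.size() + " files)";
        return snapshotPointer;
    }

    public static void main(String[] args) {
        OutputCommitterSketch committer = new OutputCommitterSketch();
        committer.commitTask(List.of("part-0.parquet"));
        committer.commitTask(List.of("part-1.parquet"));
        System.out.println(committer.commitJob()); // snapshot-1(2 files)
    }
}
```

The single snapshot swap is what gives readers an all-or-nothing view of the write, which is the property the issue is after.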



--


[jira] [Work logged] (HIVE-24523) Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES for timestamp

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24523?focusedWorklogId=535252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535252
 ]

ASF GitHub Bot logged work on HIVE-24523:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 08:33
Start Date: 13/Jan/21 08:33
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1825:
URL: https://github.com/apache/hive/pull/1825


   





Issue Time Tracking
---

Worklog Id: (was: 535252)
Time Spent: 1h  (was: 50m)

> Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES 
> for timestamp
> -
>
> Key: HIVE-24523
> URL: https://issues.apache.org/jira/browse/HIVE-24523
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.2.0, 4.0.0
>Reporter: Rajkumar Singh
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Steps to repro:
> {code:java}
>   create external  table tstable(date_created timestamp)   ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'   WITH SERDEPROPERTIES ( 
>  'timestamp.formats'='MMddHHmmss') stored as textfile;
> cat sampledata 
> 2020120517
> hdfs dfs -put sampledata /warehouse/tablespace/external/hive/tstable
> {code}
> Disable fetch task conversion and run select * from tstable, which produces 
> no results; setting hive.vectorized.use.vector.serde.deserialize=false 
> returns the expected output.
> While parsing the string to a timestamp, 
> https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/fast/LazySimpleDeserializeRead.java#L812
>  does not set the DateTimeFormatter, which causes an IllegalArgumentException 
> when parsing the timestamp through TimestampUtils.stringToTimestamp(strValue).



--


[jira] [Commented] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2021-01-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263969#comment-17263969
 ] 

Zoltan Haindrich commented on HIVE-24203:
-

merged into master. Thank you [~okumin]!

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation in case that UDTF in LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue would add the rule for LateralViewJoinOperator.
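
The estimation gap can be sketched numerically: without a rule for LateralViewJoinOperator the planner keeps the input cardinality, while the expected output is the input multiplied by the UDTF's average expansion factor. The factor below is illustrative, not taken from Hive's actual statistics.

```java
// Sketch of the row-count estimation for a LATERAL VIEW join: an exploding
// UDTF multiplies the input cardinality, so keeping the input row count
// (as happens without a dedicated rule) underestimates the output.
public class LateralViewStatsSketch {
    static long estimateOutputRows(long inputRows, double avgUdtfRowsPerInput) {
        return Math.round(inputRows * avgUdtfRowsPerInput);
    }

    public static void main(String[] args) {
        // 1,000 input rows, explode() emitting ~4 elements per row:
        System.out.println(estimateOutputRows(1000, 4.0)); // 4000, vs. 1000 when underestimated
    }
}
```

An underestimate here can cascade into bad join-order and parallelism decisions downstream, which is why the dedicated rule matters.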



--


[jira] [Resolved] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2021-01-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24203.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation in case that UDTF in LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue would add the rule for LateralViewJoinOperator.



--


[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=535241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535241
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 08:06
Start Date: 13/Jan/21 08:06
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1531:
URL: https://github.com/apache/hive/pull/1531


   





Issue Time Tracking
---

Worklog Id: (was: 535241)
Time Spent: 3h 40m  (was: 3.5h)

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation in case that UDTF in LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue would add the rule for LateralViewJoinOperator.



--


[jira] [Work logged] (HIVE-24278) Implement an UDF for throwing exception in arbitrary vertex

2021-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24278?focusedWorklogId=535239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535239
 ]

ASF GitHub Bot logged work on HIVE-24278:
-

Author: ASF GitHub Bot
Created on: 13/Jan/21 08:02
Start Date: 13/Jan/21 08:02
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1817:
URL: https://github.com/apache/hive/pull/1817#discussion_r556326695



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFExceptionInVertex.java
##
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.MapredContext;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.exec.tez.TezProcessor;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantStringObjectInspector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This class implements the UDF which can throw an exception in an arbitrary vertex (typically mapper)
+ * / task / task attempt. For throwing an exception on the reducer side, where most probably the
+ * GroupByOperator codepath applies, GenericUDAFExceptionInVertex is used.
+ */
+@Description(name = "exception_in_vertex_udf", value = "_FUNC_(vertexName, taskNumberExpression, taskAttemptNumberExpression)")
+public class GenericUDFExceptionInVertex extends GenericUDF {
+  private static final Logger LOG = LoggerFactory.getLogger(GenericUDFExceptionInVertex.class);
+
+  private String vertexName;
+  private String taskNumberExpr;
+  private String taskAttemptNumberExpr;
+  private String currentVertexName;
+  private int currentTaskNumber;
+  private int currentTaskAttemptNumber;
+  private boolean alreadyCheckedAndPassed;
+
+  @Override
+  public ObjectInspector initialize(ObjectInspector[] parameters) throws UDFArgumentException {
+    if (parameters.length < 2) {
+      throw new UDFArgumentTypeException(-1,
+          "At least two arguments are expected (fake column ref, vertex name)");
+    }
+
+    this.vertexName = getVertexName(parameters, 1);
+    this.taskNumberExpr = getTaskNumber(parameters, 2);
+    this.taskAttemptNumberExpr = getTaskAttemptNumber(parameters, 3);

Review comment:
   I know it will be mostly just us using this - but it would be helpful 
to document the accepted format (and probably throw an exception if something 
else is passed).
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535239)
Time Spent: 20m  (was: 10m)

> Implement an UDF for throwing exception in arbitrary vertex
> ---
>
> Key: HIVE-24278
> URL: https://issues.apache.org/jira/browse/HIVE-24278
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For testing purposes sometimes we need to make the query fail in a vertex, so 
> assuming that we already know the plan, it could be something like:
>