[jira] [Work logged] (HIVE-24350) NullScanTaskDispatcher should use stats

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24350?focusedWorklogId=511651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511651
 ]

ASF GitHub Bot logged work on HIVE-24350:
-

Author: ASF GitHub Bot
Created on: 14/Nov/20 02:06
Start Date: 14/Nov/20 02:06
Worklog Time Spent: 10m 
  Work Description: mustafaiman closed pull request #1645:
URL: https://github.com/apache/hive/pull/1645


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 511651)
Time Spent: 1h 20m  (was: 1h 10m)

> NullScanTaskDispatcher should use stats
> ---
>
> Key: HIVE-24350
> URL: https://issues.apache.org/jira/browse/HIVE-24350
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> NullScanTaskDispatcher manually checks each partition directory to see if 
> it is empty. While this is necessary for external tables, we can simply use 
> stats for managed tables.
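The optimization described above can be sketched as follows. This is a hypothetical illustration in plain Java, not Hive's actual NullScanTaskDispatcher code: the class, method names, and the `numRows` stats key are stand-ins for the real metastore statistics, but the decision logic (trust stats for managed tables, list the directory otherwise) mirrors the idea in the issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Stream;

// Illustrative sketch: for managed tables with accurate stats, decide
// emptiness from the row-count statistic instead of listing each
// partition directory on the filesystem.
public class EmptyPartitionCheck {

    /** Returns true if the partition holds no data. */
    static boolean isEmpty(boolean isManaged, boolean statsAccurate,
                           Map<String, String> partitionStats,
                           Path partitionDir) throws IOException {
        if (isManaged && statsAccurate) {
            // Cheap metadata-only path: no filesystem round trip.
            long numRows = Long.parseLong(partitionStats.getOrDefault("numRows", "-1"));
            if (numRows >= 0) {
                return numRows == 0;
            }
        }
        // Fallback (always required for external tables): list the directory.
        try (Stream<Path> files = Files.list(partitionDir)) {
            return !files.findAny().isPresent();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("part");
        System.out.println(isEmpty(true, true, Map.of("numRows", "0"), tmp));  // stats path
        System.out.println(isEmpty(false, false, Map.of(), tmp));              // listing path
    }
}
```

For a managed table the stats path avoids one filesystem call per partition, which is the cost the issue is targeting.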



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24350) NullScanTaskDispatcher should use stats

2020-11-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman resolved HIVE-24350.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> NullScanTaskDispatcher should use stats
> ---
>
> Key: HIVE-24350
> URL: https://issues.apache.org/jira/browse/HIVE-24350
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> NullScanTaskDispatcher manually checks each partition directory to see if 
> it is empty. While this is necessary for external tables, we can simply use 
> stats for managed tables.





[jira] [Commented] (HIVE-24350) NullScanTaskDispatcher should use stats

2020-11-13 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231911#comment-17231911
 ] 

Mustafa İman commented on HIVE-24350:
-

Merged to master. Thanks [~rajesh.balamohan] for the review.

> NullScanTaskDispatcher should use stats
> ---
>
> Key: HIVE-24350
> URL: https://issues.apache.org/jira/browse/HIVE-24350
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> NullScanTaskDispatcher manually checks each partition directory to see if 
> it is empty. While this is necessary for external tables, we can simply use 
> stats for managed tables.





[jira] [Work logged] (HIVE-24075) Optimise KeyValuesInputMerger

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24075?focusedWorklogId=511634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511634
 ]

ASF GitHub Bot logged work on HIVE-24075:
-

Author: ASF GitHub Bot
Created on: 14/Nov/20 00:38
Start Date: 14/Nov/20 00:38
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1463:
URL: https://github.com/apache/hive/pull/1463


   





Issue Time Tracking
---

Worklog Id: (was: 511634)
Time Spent: 0.5h  (was: 20m)

> Optimise KeyValuesInputMerger
> -
>
> Key: HIVE-24075
> URL: https://issues.apache.org/jira/browse/HIVE-24075
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Comparisons in KeyValueInputMerger can be reduced.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L165|https://github.infra.cloudera.com/CDH/hive/blob/cdpd-master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L165]
> [https://github.infra.cloudera.com/CDH/hive/blob/cdpd-master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L150|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L150]
> If the reader comparisons in the queue are the same, we could reuse 
> "{{nextKVReaders}}" in the next iteration instead of doing the 
> comparison all over again.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L178]
>  
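The idea above can be illustrated with a small k-way merge sketch. This is not Hive's KeyValuesInputMerger code; the reader class and method names are invented for illustration. The point it shows is the one from the issue: when several readers at the head of the priority queue compare equal, collect that group ("nextKVReaders") once and process it together, rather than re-running the key comparison for each value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Stand-in for a key-value reader positioned on its current key.
class KeyedReader {
    final ArrayDeque<String> keys;
    KeyedReader(List<String> ks) { keys = new ArrayDeque<>(ks); }
    String peekKey() { return keys.peek(); } // null when exhausted
    void advance() { keys.poll(); }
}

public class MergerSketch {
    // Merges readers in key order, emitting one entry per equal-key group.
    static List<String> mergeKeys(List<KeyedReader> readers) {
        PriorityQueue<KeyedReader> pq =
            new PriorityQueue<>((a, b) -> a.peekKey().compareTo(b.peekKey()));
        for (KeyedReader r : readers) {
            if (r.peekKey() != null) pq.add(r);
        }
        List<String> out = new ArrayList<>();
        List<KeyedReader> nextKVReaders = new ArrayList<>();
        while (!pq.isEmpty()) {
            // Build the group of readers whose heads compare equal, once.
            nextKVReaders.clear();
            KeyedReader first = pq.poll();
            nextKVReaders.add(first);
            while (!pq.isEmpty() && pq.peek().peekKey().equals(first.peekKey())) {
                nextKVReaders.add(pq.poll());
            }
            out.add(first.peekKey());
            // Advance the whole group together and re-insert survivors.
            for (KeyedReader r : nextKVReaders) {
                r.advance();
                if (r.peekKey() != null) pq.add(r);
            }
        }
        return out;
    }
}
```

Grouping equal readers once per key, instead of comparing each reader against the queue head on every value, is the comparison reduction the issue proposes.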





[jira] [Work logged] (HIVE-24324) Remove deprecated API usage from Avro

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24324?focusedWorklogId=511622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511622
 ]

ASF GitHub Bot logged work on HIVE-24324:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 23:37
Start Date: 13/Nov/20 23:37
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1672:
URL: https://github.com/apache/hive/pull/1672#issuecomment-727085688


   cc @szehon-ho @aihuaxu could you review this? thanks.





Issue Time Tracking
---

Worklog Id: (was: 511622)
Time Spent: 40m  (was: 0.5h)

> Remove deprecated API usage from Avro
> -
>
> Key: HIVE-24324
> URL: https://issues.apache.org/jira/browse/HIVE-24324
> Project: Hive
>  Issue Type: Improvement
>  Components: Avro
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{JsonProperties#getJsonProp}} has been marked as deprecated in Avro 1.8 and 
> removed since Avro 1.9. This replaces its usage with {{getObjectProp}}, 
> which doesn't leak a Jackson JSON node. This will help downstream apps 
> depend on Hive while using a higher version of Avro, and also help Hive 
> upgrade its own Avro version.
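The compatibility concern in this change can be sketched as follows. Avro's `getObjectProp(name)` returns a plain `Object` instead of a Jackson `JsonNode`; under the old API, calling `getIntValue()` on a non-numeric node returned 0, so code that forbade string precision/scale values got 0 for free. The helper below (a hypothetical name, not Hive's actual method) mirrors that default; the Avro call itself is stood in by a plain `Object` argument so the sketch stays dependency-free.

```java
// Sketch of the getJsonProp -> getObjectProp migration shim: preserve the
// old Jackson getIntValue() behavior of defaulting to 0 for non-integers.
public class AvroPropShim {
    /** Mimics JsonNode#getIntValue semantics on getObjectProp's result. */
    static int intPropOrZero(Object prop) {
        return (prop instanceof Integer) ? (Integer) prop : 0;
    }

    public static void main(String[] args) {
        System.out.println(intPropOrZero(10));   // precision stored as a number -> 10
        System.out.println(intPropOrZero("10")); // string precision rejected -> 0
        System.out.println(intPropOrZero(null)); // absent property -> 0
    }
}
```

This keeps the HIVE-7174 behavior (strings are not accepted as precision/scale) while removing the Jackson type from the public surface.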





[jira] [Resolved] (HIVE-24381) Compressed text input returns 0 rows if skip header/footer is mentioned

2020-11-13 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24381.

Resolution: Fixed

Pushed to master, thanks [~nareshpr]!

> Compressed text input returns 0 rows if skip header/footer is mentioned
> ---
>
> Key: HIVE-24381
> URL: https://issues.apache.org/jira/browse/HIVE-24381
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Attachments: test.q
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The attached q file returns 0 rows with hive.fetch.task.conversion=none; the 
> correct result is 2 rows.





[jira] [Work logged] (HIVE-24381) Compressed text input returns 0 rows if skip header/footer is mentioned

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24381?focusedWorklogId=511612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511612
 ]

ASF GitHub Bot logged work on HIVE-24381:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 22:54
Start Date: 13/Nov/20 22:54
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1671:
URL: https://github.com/apache/hive/pull/1671


   





Issue Time Tracking
---

Worklog Id: (was: 511612)
Time Spent: 20m  (was: 10m)

> Compressed text input returns 0 rows if skip header/footer is mentioned
> ---
>
> Key: HIVE-24381
> URL: https://issues.apache.org/jira/browse/HIVE-24381
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Attachments: test.q
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The attached q file returns 0 rows with hive.fetch.task.conversion=none; the 
> correct result is 2 rows.





[jira] [Updated] (HIVE-24381) Compressed text input returns 0 rows if skip header/footer is mentioned

2020-11-13 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24381:
---
Fix Version/s: 4.0.0

> Compressed text input returns 0 rows if skip header/footer is mentioned
> ---
>
> Key: HIVE-24381
> URL: https://issues.apache.org/jira/browse/HIVE-24381
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: test.q
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The attached q file returns 0 rows with hive.fetch.task.conversion=none; the 
> correct result is 2 rows.





[jira] [Updated] (HIVE-24381) Compressed text input returns 0 rows if skip header/footer is mentioned

2020-11-13 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24381:
---
Summary: Compressed text input returns 0 rows if skip header/footer is 
mentioned  (was: compressed text input returns 0 rows if skip header/footer is 
mentioned.)

> Compressed text input returns 0 rows if skip header/footer is mentioned
> ---
>
> Key: HIVE-24381
> URL: https://issues.apache.org/jira/browse/HIVE-24381
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Attachments: test.q
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The attached q file returns 0 rows with hive.fetch.task.conversion=none; the 
> correct result is 2 rows.





[jira] [Updated] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor

2020-11-13 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24387:
---
Status: Patch Available  (was: Open)

> Metastore access through JDBC handler does not use correct database accessor
> 
>
> Key: HIVE-24387
> URL: https://issues.apache.org/jira/browse/HIVE-24387
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are some differences in the SQL syntax generated by the database 
> accessor for each RDBMS. For the metastore, we always end up with the 
> default accessor, which leads to errors, e.g., when a limit query is 
> executed against a Postgres-backed metastore.
> {code}
> Error: java.io.IOException: java.io.IOException: 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: ERROR: syntax error at or near "{"
> Position: 200 (state=,code=0)
> SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", 
> "GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", 
> "TBL_COL_PRIV", "TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS"
> {LIMIT 1}
> {code}
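The error above shows the generic `{LIMIT 1}` JDBC escape being sent verbatim to Postgres, which rejects it. The fix's idea, choosing a dialect-specific accessor instead of always falling back to the default, can be sketched roughly as below. The class and method names are illustrative, not Hive's actual JDBC storage handler API; only the `{LIMIT n}` generic form comes from the error text above.

```java
// Hypothetical sketch: derive the SQL dialect from the JDBC URL scheme so
// each RDBMS gets its own LIMIT syntax, instead of always emitting the
// generic "{LIMIT n}" JDBC escape that Postgres rejects.
public class AccessorSketch {
    static String addLimit(String sql, int n, String jdbcUrl) {
        // e.g. "postgresql" from "jdbc:postgresql://host/metastore"
        String scheme = jdbcUrl.split(":")[1];
        switch (scheme) {
            case "postgresql":
            case "mysql":
                return sql + " LIMIT " + n;              // native LIMIT clause
            default:
                return sql + " {LIMIT " + n + "}";       // generic JDBC escape
        }
    }

    public static void main(String[] args) {
        System.out.println(addLimit("SELECT * FROM \"TBL_COL_PRIVS\"", 1,
                                    "jdbc:postgresql://host/metastore"));
        // -> SELECT * FROM "TBL_COL_PRIVS" LIMIT 1
    }
}
```

With dialect selection in place, the Postgres-backed metastore receives `LIMIT 1` rather than the escape sequence that caused the syntax error.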





[jira] [Updated] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24387:
--
Labels: pull-request-available  (was: )

> Metastore access through JDBC handler does not use correct database accessor
> 
>
> Key: HIVE-24387
> URL: https://issues.apache.org/jira/browse/HIVE-24387
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are some differences in the SQL syntax generated by the database 
> accessor for each RDBMS. For the metastore, we always end up with the 
> default accessor, which leads to errors, e.g., when a limit query is 
> executed against a Postgres-backed metastore.
> {code}
> Error: java.io.IOException: java.io.IOException: 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: ERROR: syntax error at or near "{"
> Position: 200 (state=,code=0)
> SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", 
> "GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", 
> "TBL_COL_PRIV", "TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS"
> {LIMIT 1}
> {code}





[jira] [Work logged] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24387?focusedWorklogId=511587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511587
 ]

ASF GitHub Bot logged work on HIVE-24387:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 22:11
Start Date: 13/Nov/20 22:11
Worklog Time Spent: 10m 
  Work Description: jcamachor opened a new pull request #1673:
URL: https://github.com/apache/hive/pull/1673


   





Issue Time Tracking
---

Worklog Id: (was: 511587)
Remaining Estimate: 0h
Time Spent: 10m

> Metastore access through JDBC handler does not use correct database accessor
> 
>
> Key: HIVE-24387
> URL: https://issues.apache.org/jira/browse/HIVE-24387
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are some differences in the SQL syntax generated by the database 
> accessor for each RDBMS. For the metastore, we always end up with the 
> default accessor, which leads to errors, e.g., when a limit query is 
> executed against a Postgres-backed metastore.
> {code}
> Error: java.io.IOException: java.io.IOException: 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: ERROR: syntax error at or near "{"
> Position: 200 (state=,code=0)
> SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", 
> "GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", 
> "TBL_COL_PRIV", "TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS"
> {LIMIT 1}
> {code}





[jira] [Assigned] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor

2020-11-13 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24387:
--


> Metastore access through JDBC handler does not use correct database accessor
> 
>
> Key: HIVE-24387
> URL: https://issues.apache.org/jira/browse/HIVE-24387
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> There are some differences in the SQL syntax generated by the database 
> accessor for each RDBMS. For the metastore, we always end up with the 
> default accessor, which leads to errors, e.g., when a limit query is 
> executed against a Postgres-backed metastore.
> {code}
> Error: java.io.IOException: java.io.IOException: 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: ERROR: syntax error at or near "{"
> Position: 200 (state=,code=0)
> SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", 
> "GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", 
> "TBL_COL_PRIV", "TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS"
> {LIMIT 1}
> {code}





[jira] [Work logged] (HIVE-24324) Remove deprecated API usage from Avro

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24324?focusedWorklogId=511489&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511489
 ]

ASF GitHub Bot logged work on HIVE-24324:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 18:57
Start Date: 13/Nov/20 18:57
Worklog Time Spent: 10m 
  Work Description: sunchao opened a new pull request #1672:
URL: https://github.com/apache/hive/pull/1672


   
   
   ### What changes were proposed in this pull request?
   
   
   This backports #1621 to branch-2.3.
   
   This mainly replaces `JsonProperties.getJsonProp` with 
`JsonProperties.getObjectProp`. 
   
   Note that there's one place in `SchemaToTypeInfo` where we explicitly call 
`getIntValue` to forbid strings as precision/scale values (see 
[HIVE-7174](https://issues.apache.org/jira/browse/HIVE-7174)). To retain the 
old behavior, we check whether the returned object is of integer type and, if 
not, return a default of 0, following the `JsonNode` implementation.
   
   ### Why are the changes needed?
   
   
   `JsonProperties#getJsonProp` has been marked as deprecated in Avro 1.8 and 
removed since Avro 1.9. This replaces its usage with `getObjectProp`, which 
doesn't leak a Jackson JSON node. This will help downstream apps depend on 
Hive while using a higher version of Avro, and also help Hive upgrade its own 
Avro version.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Existing tests.





Issue Time Tracking
---

Worklog Id: (was: 511489)
Time Spent: 0.5h  (was: 20m)

> Remove deprecated API usage from Avro
> -
>
> Key: HIVE-24324
> URL: https://issues.apache.org/jira/browse/HIVE-24324
> Project: Hive
>  Issue Type: Improvement
>  Components: Avro
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {{JsonProperties#getJsonProp}} has been marked as deprecated in Avro 1.8 and 
> removed since Avro 1.9. This replaces its usage with {{getObjectProp}}, 
> which doesn't leak a Jackson JSON node. This will help downstream apps 
> depend on Hive while using a higher version of Avro, and also help Hive 
> upgrade its own Avro version.





[jira] [Work logged] (HIVE-24379) Backport HIVE-19662 to branch-2.3

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24379?focusedWorklogId=511488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511488
 ]

ASF GitHub Bot logged work on HIVE-24379:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 18:52
Start Date: 13/Nov/20 18:52
Worklog Time Spent: 10m 
  Work Description: sunchao merged pull request #1669:
URL: https://github.com/apache/hive/pull/1669


   





Issue Time Tracking
---

Worklog Id: (was: 511488)
Time Spent: 0.5h  (was: 20m)

> Backport HIVE-19662 to branch-2.3
> -
>
> Key: HIVE-24379
> URL: https://issues.apache.org/jira/browse/HIVE-24379
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In order to backport HIVE-24324 to branch-2.3 to remove the deprecated Avro 
> API so that downstream applications that depend on Hive 2.3.x can upgrade 
> their Avro version, we first need to backport HIVE-19662 and bump the Avro 
> version in branch-2.3, as it is currently on 1.7.x while HIVE-24324 requires 
> Avro 1.8.x.





[jira] [Work logged] (HIVE-24379) Backport HIVE-19662 to branch-2.3

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24379?focusedWorklogId=511487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511487
 ]

ASF GitHub Bot logged work on HIVE-24379:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 18:50
Start Date: 13/Nov/20 18:50
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1669:
URL: https://github.com/apache/hive/pull/1669#issuecomment-726966704


   Thanks @szehon-ho and @aihuaxu ! merging to branch-2.3 now, also cc @iemejia





Issue Time Tracking
---

Worklog Id: (was: 511487)
Time Spent: 20m  (was: 10m)

> Backport HIVE-19662 to branch-2.3
> -
>
> Key: HIVE-24379
> URL: https://issues.apache.org/jira/browse/HIVE-24379
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In order to backport HIVE-24324 to branch-2.3 to remove the deprecated Avro 
> API so that downstream applications that depend on Hive 2.3.x can upgrade 
> their Avro version, we first need to backport HIVE-19662 and bump the Avro 
> version in branch-2.3, as it is currently on 1.7.x while HIVE-24324 requires 
> Avro 1.8.x.





[jira] [Work logged] (HIVE-24051) Hive lineage information exposed in ExecuteWithHookContext

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24051?focusedWorklogId=511477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511477
 ]

ASF GitHub Bot logged work on HIVE-24051:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 18:39
Start Date: 13/Nov/20 18:39
Worklog Time Spent: 10m 
  Work Description: szehon-ho opened a new pull request #1413:
URL: https://github.com/apache/hive/pull/1413


   The lineage information is not populated unless certain hooks are enabled.
   
   However, this is a bit fragile, and there is no way for another hook that 
we write to get this information. This proposes a flag to enable it instead.





Issue Time Tracking
---

Worklog Id: (was: 511477)
Time Spent: 40m  (was: 0.5h)

> Hive lineage information exposed in ExecuteWithHookContext
> --
>
> Key: HIVE-24051
> URL: https://issues.apache.org/jira/browse/HIVE-24051
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24051.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The lineage information is not populated unless certain hooks are enabled.
> However, this is a bit fragile, and there is no way for another hook that we 
> write to get this information. This proposes a flag to enable it instead.





[jira] [Resolved] (HIVE-24179) Memory leak in HS2 DbTxnManager when compiling SHOW LOCKS statement

2020-11-13 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24179.

Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Memory leak in HS2 DbTxnManager when compiling SHOW LOCKS statement
> ---
>
> Key: HIVE-24179
> URL: https://issues.apache.org/jira/browse/HIVE-24179
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: summary.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The problem can be reproduced by repeatedly executing a SHOW LOCKS statement 
> and monitoring the heap memory of HS2. With a small heap (e.g., 2g), it only 
> takes a few minutes before the server crashes with an OutOfMemoryError such 
> as the one shown below.
> {noformat}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.util.Arrays.copyOf(Arrays.java:3332)
> at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
> at java.lang.StringBuilder.append(StringBuilder.java:136)
> at 
> org.apache.maven.surefire.booter.ForkedChannelEncoder.encodeMessage(ForkedChannelEncoder.j
> at 
> org.apache.maven.surefire.booter.ForkedChannelEncoder.setOutErr(ForkedChannelEncoder.java:
> at 
> org.apache.maven.surefire.booter.ForkedChannelEncoder.stdErr(ForkedChannelEncoder.java:166
> at 
> org.apache.maven.surefire.booter.ForkingRunListener.writeTestOutput(ForkingRunListener.jav
> at 
> org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleO
> at 
> org.apache.logging.log4j.core.util.CloseShieldOutputStream.write(CloseShieldOutputStream.j
> at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStream
> at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.flushBuffer(OutputStreamManager
> at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.flush(OutputStreamManager.java:
> at 
> org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(Abst
> at 
> org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutp
> at 
> org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputS
> at 
> org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:
> at 
> org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:12
> at 
> org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(Appender
> at 
> org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
> at 
> org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:543)
> at 
> org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:502)
> at 
> org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:485)
> at 
> org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:460)
> at 
> org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletio
> at org.apache.logging.log4j.core.Logger.log(Logger.java:162)
> at 
> org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2190)
> at 
> org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2
> at 
> org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2127)
> at 
> org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2008)
> at 
> org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1867)
> at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:179)
> {noformat}
> The heap dump (summary.png) shows that most of the memory is consumed by 
> {{Hashtable$Entry}} and {{ConcurrentHashMap$Node}} objects coming from Hive 
> configurations referenced by {{DbTxnManager}}. 
> The latter are not eligible for garbage collection since at 
> [construction|https://github.com/apache/hive/blob/975c832b6d069559c5b406a4aa8def3180fe4e75/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java#L212]
>  time they are passed implicitly in a callback stored inside 
> ShutdownHookManager.
> When the {{DbTxnManager}} is closed properly 
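The leak pattern described above can be modeled in a few lines. This is a deliberately simplified sketch, not Hive's code: the list stands in for ShutdownHookManager, the byte array for a heavy HiveConf, and the class names are invented. It shows why a shutdown callback that captures the configuration keeps it reachable until JVM exit, and how deregistering the callback on close fixes that.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of the DbTxnManager leak: a manager registers a shutdown
// callback that captures its configuration; a global hook registry keeps the
// callback (and thus the configuration) reachable until JVM exit unless
// close() deregisters it.
public class ShutdownLeakSketch {
    static final List<Runnable> SHUTDOWN_HOOKS = new ArrayList<>(); // stands in for ShutdownHookManager

    static class TxnManager implements AutoCloseable {
        final byte[] conf = new byte[1 << 20]; // stands in for a heavy HiveConf
        final Runnable hook = () -> System.out.println("cleanup, conf size=" + conf.length);

        TxnManager() { SHUTDOWN_HOOKS.add(hook); }  // conf now reachable from the registry
        @Override
        public void close() { SHUTDOWN_HOOKS.remove(hook); } // fix: deregister on close
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            try (TxnManager tm = new TxnManager()) {
                // compile one SHOW LOCKS statement
            }
        }
        // With close() deregistering the hook, nothing accumulates; without
        // the remove() call, 1000 conf-sized arrays would stay reachable.
        System.out.println(SHUTDOWN_HOOKS.size());
    }
}
```

If `close()` is never called, or never deregisters the hook, each compiled statement leaks one configuration, which matches the heap growth observed in the issue.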

[jira] [Work logged] (HIVE-24179) Memory leak in HS2 DbTxnManager when compiling SHOW LOCKS statement

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24179?focusedWorklogId=511446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511446
 ]

ASF GitHub Bot logged work on HIVE-24179:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 17:24
Start Date: 13/Nov/20 17:24
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1509:
URL: https://github.com/apache/hive/pull/1509


   





Issue Time Tracking
---

Worklog Id: (was: 511446)
Time Spent: 2h 10m  (was: 2h)

> Memory leak in HS2 DbTxnManager when compiling SHOW LOCKS statement
> ---
>
> Key: HIVE-24179
> URL: https://issues.apache.org/jira/browse/HIVE-24179
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: summary.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The problem can be reproduced by repeatedly executing a SHOW LOCKS statement 
> and monitoring the heap memory of HS2. With a small heap (e.g., 2g), it only 
> takes a few minutes before the server crashes with an OutOfMemoryError such 
> as the one shown below.
> {noformat}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.util.Arrays.copyOf(Arrays.java:3332)
> at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
> at java.lang.StringBuilder.append(StringBuilder.java:136)
> at 
> org.apache.maven.surefire.booter.ForkedChannelEncoder.encodeMessage(ForkedChannelEncoder.j
> at 
> org.apache.maven.surefire.booter.ForkedChannelEncoder.setOutErr(ForkedChannelEncoder.java:
> at 
> org.apache.maven.surefire.booter.ForkedChannelEncoder.stdErr(ForkedChannelEncoder.java:166
> at 
> org.apache.maven.surefire.booter.ForkingRunListener.writeTestOutput(ForkingRunListener.jav
> at 
> org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleO
> at 
> org.apache.logging.log4j.core.util.CloseShieldOutputStream.write(CloseShieldOutputStream.j
> at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStream
> at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.flushBuffer(OutputStreamManager
> at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.flush(OutputStreamManager.java:
> at 
> org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(Abst
> at 
> org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutp
> at 
> org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputS
> at 
> org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:
> at 
> org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:12
> at 
> org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(Appender
> at 
> org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
> at 
> org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:543)
> at 
> org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:502)
> at 
> org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:485)
> at 
> org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:460)
> at 
> org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletio
> at org.apache.logging.log4j.core.Logger.log(Logger.java:162)
> at 
> org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2190)
> at 
> org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2
> at 
> org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2127)
> at 
> org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2008)
> at 
> 

[jira] [Updated] (HIVE-24370) Make the GetPartitionsProjectionSpec generic and add builder methods for tables and partitions in HiveMetaStoreClient

2020-11-13 Thread Narayanan Venkateswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narayanan Venkateswaran updated HIVE-24370:
---
Parent: HIVE-24369
Issue Type: Sub-task  (was: Task)

> Make the GetPartitionsProjectionSpec generic and add builder methods for 
> tables and partitions in HiveMetaStoreClient
> -
>
> Key: HIVE-24370
> URL: https://issues.apache.org/jira/browse/HIVE-24370
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> HIVE-20306 defines a projection struct called GetPartitionsProjectionSpec. 
> While it has Partition in its name, this is a fairly generic struct with 
> nothing specific to partitions. It should be renamed to a more generic name 
> (GetProjectionSpec?), and builder methods of this class for tables and 
> partitions must be added to HiveMetaStoreClient.
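A builder of the kind proposed might look like the sketch below. Every name here (GetProjectionSpec, forTables, forPartitions, the field names) is hypothetical, chosen only to illustrate the fluent-builder shape being requested — it is not the actual HiveMetaStoreClient API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic projection spec with entity-specific factory
// methods, as suggested in the issue description above.
public class GetProjectionSpec {
    private final List<String> fieldList = new ArrayList<>();
    private String includePattern;

    public GetProjectionSpec addField(String field) {
        fieldList.add(field);
        return this;                       // fluent: calls can be chained
    }
    public GetProjectionSpec include(String pattern) {
        includePattern = pattern;
        return this;
    }
    public List<String> fields() { return fieldList; }

    // Factory methods pre-populate fields commonly needed per entity.
    public static GetProjectionSpec forTables() {
        return new GetProjectionSpec().addField("tableName").addField("owner");
    }
    public static GetProjectionSpec forPartitions() {
        return new GetProjectionSpec().addField("values").addField("sd.location");
    }
}
```

The point of the factories is that callers requesting table or partition projections no longer hand-assemble the same field lists at every call site.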



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient

2020-11-13 Thread Narayanan Venkateswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narayanan Venkateswaran reassigned HIVE-24386:
--


> Add builder methods for GetTablesRequest and GetPartitionsRequest to 
> HiveMetaStoreClient
> 
>
> Key: HIVE-24386
> URL: https://issues.apache.org/jira/browse/HIVE-24386
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>
> Builder methods for GetTablesRequest and GetPartitionsRequest should be added 
> to the HiveMetaStoreClient class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24353) performance: Refactor TimestampTZ parsing

2020-11-13 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-24353:
-

Assignee: Vincenz Priesnitz

> performance: Refactor TimestampTZ parsing
> -
>
> Key: HIVE-24353
> URL: https://issues.apache.org/jira/browse/HIVE-24353
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vincenz Priesnitz
>Assignee: Vincenz Priesnitz
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that for datasets that contain many timestamps (without 
> timezones) Hive spends the majority of its time in TimestampTZUtil.parse, in 
> particular constructing stack traces for the try-catch blocks. 
> When parsing TimestampTZ we currently use a fallback chain with several 
> try-catch blocks. For a common timestamp string without a timezone, we 
> currently throw and catch two exceptions, and actually parse the string twice. 
> I propose a refactor that parses the string once and then expresses the 
> fallback chain as queries against the parsed TemporalAccessor. 
>  
> Update: I added a PR that resolves this issue: 
> [https://github.com/apache/hive/pull/1650] 
>  
>  
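The proposed approach — parse once, then query the TemporalAccessor for the optional parts instead of catching exceptions and re-parsing — can be sketched as below. This is a minimal illustration under assumed formats (ISO date, optional time, optional offset), not the actual patch in the linked PR:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

public class SingleParseDemo {
    // One formatter with optional sections covers date-only, date-time,
    // and date-time-with-offset inputs in a single parse.
    private static final DateTimeFormatter F = new DateTimeFormatterBuilder()
        .append(DateTimeFormatter.ISO_LOCAL_DATE)
        .optionalStart().appendLiteral(' ')
        .append(DateTimeFormatter.ISO_LOCAL_TIME).optionalEnd()
        .optionalStart().appendOffsetId().optionalEnd()
        .toFormatter();

    static ZonedDateTime parse(String s, ZoneId defaultZone) {
        TemporalAccessor t = F.parse(s);  // parse exactly once
        LocalDate date = LocalDate.from(t);
        // Query the parsed result for optional parts instead of
        // throwing and re-parsing when a part is missing.
        LocalTime time = t.isSupported(ChronoField.HOUR_OF_DAY)
            ? LocalTime.from(t) : LocalTime.MIDNIGHT;
        ZoneId zone = t.isSupported(ChronoField.OFFSET_SECONDS)
            ? ZoneId.from(t) : defaultZone;
        return LocalDateTime.of(date, time).atZone(zone);
    }

    public static void main(String[] args) {
        System.out.println(parse("2017-01-01 13:33:00", ZoneId.of("UTC")));
        System.out.println(parse("2017-01-01", ZoneId.of("UTC")));
    }
}
```

The common no-timezone case now takes the same single-parse path as every other case; no exception is ever constructed for control flow.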



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24353) performance: Refactor TimestampTZ parsing

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24353?focusedWorklogId=511359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511359
 ]

ASF GitHub Bot logged work on HIVE-24353:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 13:32
Start Date: 13/Nov/20 13:32
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1650:
URL: https://github.com/apache/hive/pull/1650#discussion_r522950225



##
File path: 
common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java
##
@@ -84,6 +86,18 @@ public void testVariations() {
 TimestampTZUtil.parse("2017-05-08 07:45:00-3:00");
   }
 
+  @Test
+  public void testPerformance() {
+    for (int i = 0; i < 100; i++) {

Review comment:
   First loop redundant?

##
File path: 
common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java
##
@@ -84,6 +86,18 @@ public void testVariations() {
 TimestampTZUtil.parse("2017-05-08 07:45:00-3:00");
   }
 
+  @Test
+  public void testPerformance() {
+    for (int i = 0; i < 100; i++) {
+      TimestampTZUtil.parse("2017-01-01 13:33:00", ZoneId.of("UTC"));
+    }
+    Stopwatch sw = Stopwatch.createStarted();
+    for (int i = 0; i < 1; i++) {
+      TimestampTZUtil.parse("2017-01-01 13:33:00", ZoneId.of("UTC"));

Review comment:
   Can we add some randomness to the parsed timestamp? Maybe use i as part 
of the time?
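For context on the two review comments above: the first loop is a JIT warmup (not redundant — without it the timed loop measures interpretation and compilation rather than steady-state parsing), and varying the input with `i` stops the JIT from hoisting a constant parse. A hand-rolled sketch of that pattern follows; this is an illustration (JMH would be the robust choice), not the test under review:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class WarmupBenchDemo {
    static final DateTimeFormatter F =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static LocalDateTime parseOnce(int i) {
        // Vary the input with i so the JIT cannot treat the parse as a
        // loop-invariant constant (the second review comment's point).
        return LocalDateTime.parse(
            String.format("2017-01-01 13:%02d:00", i % 60), F);
    }

    public static void main(String[] args) {
        // Warmup: run until the JIT has compiled the hot path, so the
        // timed loop below measures steady-state cost only.
        for (int i = 0; i < 10_000; i++) parseOnce(i);
        long start = System.nanoTime();
        for (int i = 0; i < 100_000; i++) parseOnce(i);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("parse x100k: " + elapsedMs + " ms");
    }
}
```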





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 511359)
Remaining Estimate: 0h
Time Spent: 10m

> performance: Refactor TimestampTZ parsing
> -
>
> Key: HIVE-24353
> URL: https://issues.apache.org/jira/browse/HIVE-24353
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vincenz Priesnitz
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that for datasets that contain many timestamps (without 
> timezones) Hive spends the majority of its time in TimestampTZUtil.parse, in 
> particular constructing stack traces for the try-catch blocks. 
> When parsing TimestampTZ we currently use a fallback chain with several 
> try-catch blocks. For a common timestamp string without a timezone, we 
> currently throw and catch two exceptions, and actually parse the string twice. 
> I propose a refactor that parses the string once and then expresses the 
> fallback chain as queries against the parsed TemporalAccessor. 
>  
> Update: I added a PR that resolves this issue: 
> [https://github.com/apache/hive/pull/1650] 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24353) performance: Refactor TimestampTZ parsing

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24353:
--
Labels: pull-request-available  (was: )

> performance: Refactor TimestampTZ parsing
> -
>
> Key: HIVE-24353
> URL: https://issues.apache.org/jira/browse/HIVE-24353
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vincenz Priesnitz
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that for datasets that contain many timestamps (without 
> timezones) Hive spends the majority of its time in TimestampTZUtil.parse, in 
> particular constructing stack traces for the try-catch blocks. 
> When parsing TimestampTZ we currently use a fallback chain with several 
> try-catch blocks. For a common timestamp string without a timezone, we 
> currently throw and catch two exceptions, and actually parse the string twice. 
> I propose a refactor that parses the string once and then expresses the 
> fallback chain as queries against the parsed TemporalAccessor. 
>  
> Update: I added a PR that resolves this issue: 
> [https://github.com/apache/hive/pull/1650] 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files

2020-11-13 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24224:
--
Parent: HIVE-22769
Issue Type: Sub-task  (was: Bug)

> Fix skipping header/footer for Hive on Tez on compressed files
> --
>
> Key: HIVE-24224
> URL: https://issues.apache.org/jira/browse/HIVE-24224
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A compressed file with Hive on Tez returns headers and footers for both 
> select * and select count ( * ):
> {noformat}
> printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 
> 1357\",123\nrst,rst,rst" > data.csv
> hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/
> bzip2 -f data.csv 
> hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/
> beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 (
>   sequence   int,
>   id string,
>   other  string) 
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' 
> TBLPROPERTIES (
>   'skip.header.line.count'='1',
>   'skip.footer.line.count'='1');"
> beeline -e "
>   SET hive.fetch.task.conversion = none;
>   SELECT * FROM default.bz2tst2;"
> +------------------+-------------------+---------------+
> | bz2tst2.sequence | bz2tst2.id        | bz2tst2.other |
> +------------------+-------------------+---------------+
> | offset           | id                | other         |
> | 9                | 20200315 X00 1356 | 123           |
> | 17               | 20200315 X00 1357 | 123           |
> | rst              | rst               | rst           |
> +------------------+-------------------+---------------+
> {noformat}
> PS: HIVE-22769 addressed the issue for Hive on LLAP.
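The expected behavior — with skip.header.line.count=1 and skip.footer.line.count=1, only the two data rows survive — can be sketched generically. This is plain Java over an in-memory list, not Hive's record reader; the key observation in the comment is the assumption that a non-splittable compressed file arrives as a single split, which makes dropping leading and trailing lines safe:

```java
import java.util.List;

public class SkipHeaderFooterDemo {
    // Drop `header` leading and `footer` trailing lines from one split.
    // With splittable inputs only the first/last split may skip lines;
    // a compressed (non-splittable) file is always a single split.
    static List<String> skip(List<String> lines, int header, int footer) {
        int from = Math.min(header, lines.size());
        int to = Math.max(from, lines.size() - footer);
        return lines.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
            "offset,id,other",
            "9,\"20200315 X00 1356\",123",
            "17,\"20200315 X00 1357\",123",
            "rst,rst,rst");
        System.out.println(skip(rows, 1, 1)); // only the two data rows remain
    }
}
```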



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24381) compressed text input returns 0 rows if skip header/footer is mentioned.

2020-11-13 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24381:
--
Parent: HIVE-22769
Issue Type: Sub-task  (was: Bug)

> compressed text input returns 0 rows if skip header/footer is mentioned.
> 
>
> Key: HIVE-24381
> URL: https://issues.apache.org/jira/browse/HIVE-24381
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Attachments: test.q
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The attached q file returns 0 rows with hive.fetch.task.conversion=none, 
> whereas the correct result is 2 rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24341) Sweep phase for proactive cache eviction

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24341?focusedWorklogId=511323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511323
 ]

ASF GitHub Bot logged work on HIVE-24341:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 12:03
Start Date: 13/Nov/20 12:03
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1665:
URL: https://github.com/apache/hive/pull/1665#discussion_r522907947



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##
@@ -151,8 +151,10 @@ private LlapIoImpl(Configuration conf) throws IOException {
   LowLevelCachePolicy
   realCachePolicy =
   useLrfu ? new LowLevelLrfuCachePolicy(minAllocSize, totalMemorySize, 
conf) : new LowLevelFifoCachePolicy();
-  // TODO: if realCachePolicy is not something that supports proactive 
caching
-  // turn the feature off (LLAP_IO_PROACTIVE_EVICTION_ENABLED) and log it
+  if (!(realCachePolicy instanceof ProactiveEvictingCachePolicy.Impl)) {

Review comment:
   It would be very cumbersome to check this on the HS2 side due to potentially 
different config files being used on the HS2 and LLAP sides. Also, these 
services might get restarted independently of each other with different configs.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 511323)
Time Spent: 1.5h  (was: 1h 20m)

> Sweep phase for proactive cache eviction
> 
>
> Key: HIVE-24341
> URL: https://issues.apache.org/jira/browse/HIVE-24341
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24341) Sweep phase for proactive cache eviction

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24341?focusedWorklogId=511322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511322
 ]

ASF GitHub Bot logged work on HIVE-24341:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 12:02
Start Date: 13/Nov/20 12:02
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1665:
URL: https://github.com/apache/hive/pull/1665#discussion_r522907503



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/EvictionListener.java
##
@@ -20,4 +20,5 @@
 
 public interface EvictionListener {
   void notifyEvicted(LlapCacheableBuffer buffer);
+  void notifyEvictedBytes(long size);

Review comment:
   Good point, reworked in follow-up commit as discussed offline.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 511322)
Time Spent: 1h 20m  (was: 1h 10m)

> Sweep phase for proactive cache eviction
> 
>
> Key: HIVE-24341
> URL: https://issues.apache.org/jira/browse/HIVE-24341
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24349) Client connection count is not printed correctly in HiveMetastoreClient

2020-11-13 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24349:
---
Attachment: HIVE-24349.02.patch

> Client connection count is not printed correctly in HiveMetastoreClient
> ---
>
> Key: HIVE-24349
> URL: https://issues.apache.org/jira/browse/HIVE-24349
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24349.01.patch, HIVE-24349.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24378) Leading and trailing spaces are not removed before decimal conversion

2020-11-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24378:
---
Attachment: HIVE-24378-1.patch

> Leading and trailing spaces are not removed before decimal conversion
> -
>
> Key: HIVE-24378
> URL: https://issues.apache.org/jira/browse/HIVE-24378
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24378-1.patch, HIVE-24378.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The decimal conversion does not remove extra spaces in some scenarios. 
> Because of this, the numbers are converted to null.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24373) Wrong predicate is pushed down for view with constant value projection.

2020-11-13 Thread mahesh kumar behera (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231395#comment-17231395
 ] 

mahesh kumar behera commented on HIVE-24373:


Committed to master. Thanks [~jcamachorodriguez] for review.

> Wrong predicate is pushed down for view with constant value projection.
> ---
>
> Key: HIVE-24373
> URL: https://issues.apache.org/jira/browse/HIVE-24373
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24373-explain-paln.txt, HIVE-24373.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the query below, the predicate pushed down for one of the table scans is 
> not correct.
>  
> {code:java}
> set hive.explain.user=false;
> set hive.cbo.enable=false;
> set hive.optimize.ppd=true;DROP TABLE arc;
> CREATE table arc(`dt_from` string, `dt_to` string);
> CREATE table loc1(`dt_from` string, `dt_to` string);
> CREATE
>  VIEW view AS
>  SELECT
> '' as DT_FROM,
> uuid() as DT_TO
>  FROM
>loc1
>  UNION ALL
>  SELECT
> dt_from as DT_FROM,
> uuid() as DT_TO
>  FROM
>arc;
> EXPLAIN
> SELECT
>   dt_from, dt_to
> FROM
>   view
> WHERE
>   '2020'  between dt_from and dt_to;
> {code}
>  
> For table loc1, DT_FROM is projected as '' so the predicate "predicate: 
> '2020' BETWEEN '' AND _col1 (type: boolean)" is correct. But for table 
> arc, the column itself is projected, so the predicate should be "predicate: 
> '2020' BETWEEN _col0 (type: boolean) AND _col1 (type: boolean)".
> This is because the predicates are stored in a map keyed by expression. Here 
> the expression is "_col0". When the predicate is pushed down through the 
> union, the same predicate object is used to create the filter expression in 
> both branches. Later, when constant replacement is done, the first filter 
> overwrites the second one.
> So we should create a clone (as done elsewhere) before using the cached 
> predicate for the filter. This way the overwrite is avoided.
>  
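The aliasing bug described above — two filters built from the same cached, mutable predicate object, so a constant-replacement pass on one branch silently rewrites the other — reduces to a small sketch. ExprNode here is an invented stand-in, not Hive's expression classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CloneDemo {
    // Invented stand-in for a mutable expression node.
    static class ExprNode implements Cloneable {
        String column;
        ExprNode(String c) { column = c; }
        @Override public ExprNode clone() { return new ExprNode(column); }
    }

    public static void main(String[] args) {
        Map<String, ExprNode> cache = new HashMap<>();
        cache.put("_col0", new ExprNode("_col0"));

        // Shared: both union branches reuse the cached node directly,
        // so constant replacement in one branch clobbers the other.
        List<ExprNode> shared = new ArrayList<>();
        shared.add(cache.get("_col0"));
        shared.add(cache.get("_col0"));
        shared.get(0).column = "''";              // rewrite branch 1
        System.out.println(shared.get(1).column); // branch 2 changed too

        // Cloned: each branch gets its own copy; the rewrite stays local.
        cache.put("_col0", new ExprNode("_col0"));
        List<ExprNode> cloned = new ArrayList<>();
        cloned.add(cache.get("_col0").clone());
        cloned.add(cache.get("_col0").clone());
        cloned.get(0).column = "''";
        System.out.println(cloned.get(1).column); // still "_col0"
    }
}
```

Cloning at the point where the cached predicate is turned into a filter is exactly the fix the description proposes.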



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24373) Wrong predicate is pushed down for view with constant value projection.

2020-11-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-24373.

Resolution: Fixed

> Wrong predicate is pushed down for view with constant value projection.
> ---
>
> Key: HIVE-24373
> URL: https://issues.apache.org/jira/browse/HIVE-24373
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24373-explain-paln.txt, HIVE-24373.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the query below, the predicate pushed down for one of the table scans is 
> not correct.
>  
> {code:java}
> set hive.explain.user=false;
> set hive.cbo.enable=false;
> set hive.optimize.ppd=true;DROP TABLE arc;
> CREATE table arc(`dt_from` string, `dt_to` string);
> CREATE table loc1(`dt_from` string, `dt_to` string);
> CREATE
>  VIEW view AS
>  SELECT
> '' as DT_FROM,
> uuid() as DT_TO
>  FROM
>loc1
>  UNION ALL
>  SELECT
> dt_from as DT_FROM,
> uuid() as DT_TO
>  FROM
>arc;
> EXPLAIN
> SELECT
>   dt_from, dt_to
> FROM
>   view
> WHERE
>   '2020'  between dt_from and dt_to;
> {code}
>  
> For table loc1, DT_FROM is projected as '' so the predicate "predicate: 
> '2020' BETWEEN '' AND _col1 (type: boolean)" is correct. But for table 
> arc, the column itself is projected, so the predicate should be "predicate: 
> '2020' BETWEEN _col0 (type: boolean) AND _col1 (type: boolean)".
> This is because the predicates are stored in a map keyed by expression. Here 
> the expression is "_col0". When the predicate is pushed down through the 
> union, the same predicate object is used to create the filter expression in 
> both branches. Later, when constant replacement is done, the first filter 
> overwrites the second one.
> So we should create a clone (as done elsewhere) before using the cached 
> predicate for the filter. This way the overwrite is avoided.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24373) Wrong predicate is pushed down for view with constant value projection.

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24373?focusedWorklogId=511312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511312
 ]

ASF GitHub Bot logged work on HIVE-24373:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 11:22
Start Date: 13/Nov/20 11:22
Worklog Time Spent: 10m 
  Work Description: maheshk114 merged pull request #1666:
URL: https://github.com/apache/hive/pull/1666


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 511312)
Time Spent: 20m  (was: 10m)

> Wrong predicate is pushed down for view with constant value projection.
> ---
>
> Key: HIVE-24373
> URL: https://issues.apache.org/jira/browse/HIVE-24373
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24373-explain-paln.txt, HIVE-24373.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the query below, the predicate pushed down for one of the table scans is 
> not correct.
>  
> {code:java}
> set hive.explain.user=false;
> set hive.cbo.enable=false;
> set hive.optimize.ppd=true;DROP TABLE arc;
> CREATE table arc(`dt_from` string, `dt_to` string);
> CREATE table loc1(`dt_from` string, `dt_to` string);
> CREATE
>  VIEW view AS
>  SELECT
> '' as DT_FROM,
> uuid() as DT_TO
>  FROM
>loc1
>  UNION ALL
>  SELECT
> dt_from as DT_FROM,
> uuid() as DT_TO
>  FROM
>arc;
> EXPLAIN
> SELECT
>   dt_from, dt_to
> FROM
>   view
> WHERE
>   '2020'  between dt_from and dt_to;
> {code}
>  
> For table loc1, DT_FROM is projected as '' so the predicate "predicate: 
> '2020' BETWEEN '' AND _col1 (type: boolean)" is correct. But for table 
> arc, the column itself is projected, so the predicate should be "predicate: 
> '2020' BETWEEN _col0 (type: boolean) AND _col1 (type: boolean)".
> This is because the predicates are stored in a map keyed by expression. Here 
> the expression is "_col0". When the predicate is pushed down through the 
> union, the same predicate object is used to create the filter expression in 
> both branches. Later, when constant replacement is done, the first filter 
> overwrites the second one.
> So we should create a clone (as done elsewhere) before using the cached 
> predicate for the filter. This way the overwrite is avoided.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=511305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511305
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 10:44
Start Date: 13/Nov/20 10:44
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522870090



##
File path: 
itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##
@@ -17,30 +17,32 @@
  */
 package org.apache.hadoop.hive.cli;
 
-import java.io.File;
-import java.util.Comparator;
-import java.util.List;
-
 import org.apache.hadoop.hive.cli.control.CliAdapter;
 import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
 import org.junit.ClassRule;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.TestRule;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
 
 @RunWith(Parameterized.class)
-public class TestTezPerfCliDriver {
+public class TestTezTPCDS30TBPerfCliDriver {
 
-  static CliAdapter adapter = new 
CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new 
CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();
 
   @Parameters(name = "{0}")
   public static List getParameters() throws Exception {
 List parameters = adapter.getParameters();
 parameters.sort(new C1());
-return parameters;
+return SplitSupport.process(parameters, 
TestTezTPCDS30TBPerfCliDriver.class, 10);

Review comment:
   20 minutes is also fine... we have quite a few [replication tests above 
30 minutes right 
now](http://ci.hive.apache.org/job/hive-precommit/job/master/337/testReport/org.apache.hadoop.hive.ql.parse/)
   
   Note that in its current form it would not work, because you need to have a 
separate "N_SPLITS" integer in the class source (like in other CliDriver 
classes) for the splitter to work.
   
   I think we can leave this out for now - 10 splits is a bit too many for a 
20-minute test (2 or 3 would fit better). In the case of this test there is a 
bit more overhead per execution, because it will probably need to download and 
launch the metastore.
   
   
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 511305)
Time Spent: 2h 10m  (was: 2h)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom Hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=511304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511304
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 10:37
Start Date: 13/Nov/20 10:37
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522866016



##
File path: 
itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##
@@ -56,12 +58,22 @@ public int compare(Object[] o1, Object[] o2) {
   public static TestRule cliClassRule = adapter.buildClassRule();
 
   @Rule
-  public TestRule cliTestRule = adapter.buildTestRule();
+  public TestRule cliTestRule = (statement, description) -> new Statement() {

Review comment:
   okay... then it's not needed? :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 511304)
Time Spent: 2h  (was: 1h 50m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom Hive configurations. 





[jira] [Updated] (HIVE-14928) Analyze table no scan mess up schema

2020-11-13 Thread skwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

skwang updated HIVE-14928:
--
Summary: Analyze table no scan mess up schema  (was: Analyze table no scan 
mess up schema112121212121)

> Analyze table no scan mess up schema
> 
>
> Key: HIVE-14928
> URL: https://issues.apache.org/jira/browse/HIVE-14928
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: HIVE-14928.1.patch, HIVE-14928.2.patch
>
>
> StatsNoJobTask uses static variables partUpdates and  table to track stats 
> changes. If multiple analyze no scan tasks run at the same time, then 
> table/partition schema could mess up.
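The failure mode described above can be reproduced with a minimal sketch (illustrative Python, not Hive's actual StatsNoJobTask code): because the accumulator is class-level (static) state, every concurrently running analyze task writes into the same map, so entries from different tables clobber each other.

```python
class StatsTask:
    # Class-level (static) field shared by ALL instances -- the bug pattern:
    # concurrent analyze tasks accumulate into one shared map.
    part_updates = {}

    def __init__(self, table):
        self.table = table

    def record(self, partition, row_count):
        # Every instance writes into the same shared map.
        StatsTask.part_updates[partition] = (self.table, row_count)

# Two "concurrent" analyze-no-scan tasks on different tables:
t1 = StatsTask("orders")
t2 = StatsTask("lineitem")
t1.record("p=2020-11-13", 100)
t2.record("p=2020-11-13", 200)  # silently overwrites the orders entry

print(StatsTask.part_updates["p=2020-11-13"])  # ('lineitem', 200)
```

Making the accumulator per-instance state avoids the interference; whether that is exactly what the attached patches do is not shown here.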





[jira] [Commented] (HIVE-24269) In SharedWorkOptimizer run simplification after merging TS filter expressions

2020-11-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231358#comment-17231358
 ] 

Zoltan Haindrich commented on HIVE-24269:
-

HIVE-24365 has reduced redundant expression creation a lot - this might not be 
needed

> In SharedWorkOptimizer run simplification after merging TS filter expressions
> -
>
> Key: HIVE-24269
> URL: https://issues.apache.org/jira/browse/HIVE-24269
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> https://github.com/apache/hive/pull/1553#discussion_r503837757





[jira] [Updated] (HIVE-14928) Analyze table no scan mess up schema112121212121

2020-11-13 Thread skwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

skwang updated HIVE-14928:
--
Summary: Analyze table no scan mess up schema112121212121  (was: Analyze 
table no scan mess up schema)

> Analyze table no scan mess up schema112121212121
> 
>
> Key: HIVE-14928
> URL: https://issues.apache.org/jira/browse/HIVE-14928
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: HIVE-14928.1.patch, HIVE-14928.2.patch
>
>
> StatsNoJobTask uses static variables partUpdates and  table to track stats 
> changes. If multiple analyze no scan tasks run at the same time, then 
> table/partition schema could mess up.





[jira] [Updated] (HIVE-14928) Analyze table no scan mess up schema1

2020-11-13 Thread skwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

skwang updated HIVE-14928:
--
Summary: Analyze table no scan mess up schema1  (was: Analyze table no scan 
mess up schema)

> Analyze table no scan mess up schema1
> -
>
> Key: HIVE-14928
> URL: https://issues.apache.org/jira/browse/HIVE-14928
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: HIVE-14928.1.patch, HIVE-14928.2.patch
>
>
> StatsNoJobTask uses static variables partUpdates and  table to track stats 
> changes. If multiple analyze no scan tasks run at the same time, then 
> table/partition schema could mess up.





[jira] [Updated] (HIVE-14928) Analyze table no scan mess up schema

2020-11-13 Thread skwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

skwang updated HIVE-14928:
--
Summary: Analyze table no scan mess up schema  (was: Analyze table no scan 
mess up schema1)

> Analyze table no scan mess up schema
> 
>
> Key: HIVE-14928
> URL: https://issues.apache.org/jira/browse/HIVE-14928
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: HIVE-14928.1.patch, HIVE-14928.2.patch
>
>
> StatsNoJobTask uses static variables partUpdates and  table to track stats 
> changes. If multiple analyze no scan tasks run at the same time, then 
> table/partition schema could mess up.





[jira] [Resolved] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24241.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus for reviewing the changes!

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=511300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511300
 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 10:26
Start Date: 13/Nov/20 10:26
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1562:
URL: https://github.com/apache/hive/pull/1562


   





Issue Time Tracking
---

Worklog Id: (was: 511300)
Time Spent: 2h 20m  (was: 2h 10m)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24360) SharedWorkOptimizer may create incorrect plans with DPPUnion

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24360:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Bug)

> SharedWorkOptimizer may create incorrect plans with DPPUnion
> 
>
> Key: HIVE-24360
> URL: https://issues.apache.org/jira/browse/HIVE-24360
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24295) Apply schema merge to all shared work optimizations

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24295:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Apply schema merge to all shared work optimizations
> ---
>
> Key: HIVE-24295
> URL: https://issues.apache.org/jira/browse/HIVE-24295
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24382) Organize replaceTabAlias methods

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24382:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Organize replaceTabAlias methods
> 
>
> Key: HIVE-24382
> URL: https://issues.apache.org/jira/browse/HIVE-24382
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> * move to the OperatorDesc / etc
> https://github.com/apache/hive/pull/1661#discussion_r522693729





[jira] [Updated] (HIVE-24357) Exchange SWO table/algorithm strategy

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24357:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Exchange SWO table/algorithm strategy
> -
>
> Key: HIVE-24357
> URL: https://issues.apache.org/jira/browse/HIVE-24357
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: swo.before.jointree.dot.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> SWO right now runs like: 
> {code}
> for every strategy s: for every table t: try s for t
> {code}
> this results in an earlier strategy possibly creating a more entangled operator 
> tree behind it - in case it is able to merge for a less prioritized table
> it would probably make more sense to do:
> {code}
> for every table t: for every strategy s: try s for t
> {code}
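The difference between the two loop orders can be made concrete with a small sketch (illustrative Python; the strategy and table names are made up, not Hive's actual SWO strategies):

```python
strategies = ["s1", "s2"]             # hypothetical strategies, in priority order
tables = ["store_sales", "date_dim"]  # hypothetical tables, in priority order

# current behaviour: strategy-outer -- an early strategy visits every table,
# including less prioritized ones, before any later strategy runs
current = [(s, t) for s in strategies for t in tables]

# proposed behaviour: table-outer -- each table sees every strategy before
# the optimizer moves on to a less prioritized table
proposed = [(s, t) for t in tables for s in strategies]

print(current)   # [('s1', 'store_sales'), ('s1', 'date_dim'), ('s2', 'store_sales'), ('s2', 'date_dim')]
print(proposed)  # [('s1', 'store_sales'), ('s2', 'store_sales'), ('s1', 'date_dim'), ('s2', 'date_dim')]
```

The attempted (strategy, table) pairs are the same; only the order changes, which decides which merge gets a chance first.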





[jira] [Updated] (HIVE-24269) In SharedWorkOptimizer run simplification after merging TS filter expressions

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24269:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> In SharedWorkOptimizer run simplification after merging TS filter expressions
> -
>
> Key: HIVE-24269
> URL: https://issues.apache.org/jira/browse/HIVE-24269
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> https://github.com/apache/hive/pull/1553#discussion_r503837757





[jira] [Updated] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24242:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Relax safety checks in SharedWorkOptimizer
> --
>
> Key: HIVE-24242
> URL: https://issues.apache.org/jira/browse/HIVE-24242
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> there are some checks to lock out problematic cases
> For UnionOperator 
> [here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571]
> This check could prevent the optimization even if the Union is visible 
> from only 1 of the TS ops.





[jira] [Updated] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24241:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24355) Implement hashCode and equals for Partition

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24355:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Bug)

> Implement hashCode and equals for Partition 
> 
>
> Key: HIVE-24355
> URL: https://issues.apache.org/jira/browse/HIVE-24355
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> this might cause some issues - it also prevents the SWO from merging TS 
> operators which have partitions in the "pruned list"
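A minimal sketch of what value-based `hashCode`/`equals` buys (illustrative Python; the field names are assumptions, not Hive's actual Partition class): without value equality, two objects describing the same partition compare unequal, so hash-based lookups over pruned partition lists cannot match and merging is blocked.

```python
class Partition:
    def __init__(self, table, values):
        self.table = table
        self.values = tuple(values)

    # Value-based equality: two descriptors of the same partition are equal.
    def __eq__(self, other):
        return (isinstance(other, Partition)
                and (self.table, self.values) == (other.table, other.values))

    # Equal objects must hash the same, or set/dict lookups still miss.
    def __hash__(self):
        return hash((self.table, self.values))

a = Partition("store_sales", ["ss_sold_date=2020-11-13"])
b = Partition("store_sales", ["ss_sold_date=2020-11-13"])
print(a == b, len({a, b}))  # True 1
```

With the default identity-based semantics, `a == b` would be False and `{a, b}` would contain two elements, so comparing two pruned lists element-wise would never report them as equal.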





[jira] [Updated] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24365:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> for q88 we have complex and mostly unreadable filter expressions; because 
> before merging 2 branches the TS filter expression is pushed into a FIL 
> operator.
> consider 3 scans with filters: (A,B,C)
> initially we have
> {code} 
> T(A)
> T(B)
> T(C)
> {code}
> after merging A,B
> {code}
> T(A || B) -> FIL(A)
>   -> FIL(B)
> T(C)
> {code}
> right now if we merge C as well:
> {code}
> T(A || B || C) -> FIL(A AND (A || B))
>-> FIL(B AND (A || B))
>-> FIL(C)
> {code}
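The redundancy is visible from the absorption law: `A AND (A || B)` is logically equivalent to plain `A`, so the merged disjunction adds nothing to the per-branch filters. A quick exhaustive check (illustrative Python):

```python
from itertools import product

# Verify A AND (A OR B) == A over all boolean inputs (absorption law),
# which is why FIL(A AND (A || B)) can be simplified back to FIL(A).
for a, b in product([False, True], repeat=2):
    assert (a and (a or b)) == a
print("absorption holds")
```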





[jira] [Updated] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24231:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-24384) SharedWorkOptimizer improvements

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24384:
---


> SharedWorkOptimizer improvements
> 
>
> Key: HIVE-24384
> URL: https://issues.apache.org/jira/browse/HIVE-24384
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this started as a small feature addition but due to the sheer volume of the 
> q.out changes - it's better to do smaller changes at a time; which means more 
> tickets...





[jira] [Resolved] (HIVE-24357) Exchange SWO table/algorithm strategy

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24357.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus for reviewing the changes!

> Exchange SWO table/algorithm strategy
> -
>
> Key: HIVE-24357
> URL: https://issues.apache.org/jira/browse/HIVE-24357
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: swo.before.jointree.dot.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> SWO right now runs like: 
> {code}
> for every strategy s: for every table t: try s for t
> {code}
> this results in an earlier strategy possibly creating a more entangled operator 
> tree behind it - in case it is able to merge for a less prioritized table
> it would probably make more sense to do:
> {code}
> for every table t: for every strategy s: try s for t
> {code}





[jira] [Resolved] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24365.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus for reviewing the changes!

> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> for q88 we have complex and mostly unreadable filter expressions; because 
> before merging 2 branches the TS filter expression is pushed into a FIL 
> operator.
> consider 3 scans with filters: (A,B,C)
> initially we have
> {code} 
> T(A)
> T(B)
> T(C)
> {code}
> after merging A,B
> {code}
> T(A || B) -> FIL(A)
>   -> FIL(B)
> T(C)
> {code}
> right now if we merge C as well:
> {code}
> T(A || B || C) -> FIL(A AND (A || B))
>-> FIL(B AND (A || B))
>-> FIL(C)
> {code}





[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=511294&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511294
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 10:10
Start Date: 13/Nov/20 10:10
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522852179



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -1743,7 +1743,9 @@ public void setLocalMapRedErrors(Map<String, List<String>> localMapRedErrors) {
 
   public String getCurrentDatabase() {
 if (currentDatabase == null) {
-  currentDatabase = DEFAULT_DATABASE_NAME;
+  currentDatabase = sessionConf.getVar(ConfVars.HIVE_CURRENT_DATABASE);

Review comment:
   Well, I didn't think of it as a hack since the property is there, but 
for sure I can pick another approach; I will update the PR.







Issue Time Tracking
---

Worklog Id: (was: 511294)
Time Spent: 1h 50m  (was: 1h 40m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=511293&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511293
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 10:06
Start Date: 13/Nov/20 10:06
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522850158



##
File path: 
itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##
@@ -56,12 +58,22 @@ public int compare(Object[] o1, Object[] o2) {
   public static TestRule cliClassRule = adapter.buildClassRule();
 
   @Rule
-  public TestRule cliTestRule = adapter.buildTestRule();
+  public TestRule cliTestRule = (statement, description) -> new Statement() {

Review comment:
   The purpose is not to do more but rather do less :)







Issue Time Tracking
---

Worklog Id: (was: 511293)
Time Spent: 1h 40m  (was: 1.5h)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=511290&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511290
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 10:04
Start Date: 13/Nov/20 10:04
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r522848692



##
File path: 
itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##
@@ -17,30 +17,32 @@
  */
 package org.apache.hadoop.hive.cli;
 
-import java.io.File;
-import java.util.Comparator;
-import java.util.List;
-
 import org.apache.hadoop.hive.cli.control.CliAdapter;
 import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
 import org.junit.ClassRule;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.TestRule;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
 
 @RunWith(Parameterized.class)
-public class TestTezPerfCliDriver {
+public class TestTezTPCDS30TBPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();

  @Parameters(name = "{0}")
  public static List<Object[]> getParameters() throws Exception {
    List<Object[]> parameters = adapter.getParameters();
    parameters.sort(new C1());
-    return parameters;
+    return SplitSupport.process(parameters, TestTezTPCDS30TBPerfCliDriver.class, 10);

Review comment:
   If I remember well, it was around 20 minutes.







Issue Time Tracking
---

Worklog Id: (was: 511290)
Time Spent: 1.5h  (was: 1h 20m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely on more or less on the default 
> configuration (hive-site.xml). In real-life scenarios though some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Work logged] (HIVE-24341) Sweep phase for proactive cache eviction

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24341?focusedWorklogId=511280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511280
 ]

ASF GitHub Bot logged work on HIVE-24341:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 09:32
Start Date: 13/Nov/20 09:32
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #1665:
URL: https://github.com/apache/hive/pull/1665#discussion_r522500974



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/EvictionListener.java
##
@@ -20,4 +20,5 @@
 
 public interface EvictionListener {
   void notifyEvicted(LlapCacheableBuffer buffer);
+  void notifyEvictedBytes(long size);

Review comment:
   Why is this required?

##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##
@@ -151,8 +151,10 @@ private LlapIoImpl(Configuration conf) throws IOException {
      LowLevelCachePolicy realCachePolicy =
          useLrfu ? new LowLevelLrfuCachePolicy(minAllocSize, totalMemorySize, conf) : new LowLevelFifoCachePolicy();
-  // TODO: if realCachePolicy is not something that supports proactive 
caching
-  // turn the feature off (LLAP_IO_PROACTIVE_EVICTION_ENABLED) and log it
+  if (!(realCachePolicy instanceof ProactiveEvictingCachePolicy.Impl)) {

Review comment:
   I think this would make sense in the mark phase as well.







Issue Time Tracking
---

Worklog Id: (was: 511280)
Time Spent: 1h 10m  (was: 1h)

> Sweep phase for proactive cache eviction
> 
>
> Key: HIVE-24341
> URL: https://issues.apache.org/jira/browse/HIVE-24341
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-24383) Add Table type to HPL/SQL

2020-11-13 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24383:



> Add Table type to HPL/SQL
> -
>
> Key: HIVE-24383
> URL: https://issues.apache.org/jira/browse/HIVE-24383
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>






[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=511264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511264
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 13/Nov/20 08:31
Start Date: 13/Nov/20 08:31
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r517278142



##
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##
@@ -17,30 +17,32 @@
  */
 package org.apache.hadoop.hive.cli;
 
-import java.io.File;
-import java.util.Comparator;
-import java.util.List;
-
 import org.apache.hadoop.hive.cli.control.CliAdapter;
 import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
 import org.junit.ClassRule;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.TestRule;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.Comparator;
+import java.util.List;
 
 @RunWith(Parameterized.class)
-public class TestTezPerfCliDriver {
+public class TestTezTPCDS30TBPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezTPCDS30TBCliConfig().getCliAdapter();
 
   @Parameters(name = "{0}")
   public static List getParameters() throws Exception {
 List parameters = adapter.getParameters();
 parameters.sort(new C1());
-return parameters;
+return SplitSupport.process(parameters, TestTezTPCDS30TBPerfCliDriver.class, 10);

Review comment:
   I don't think this is really necessary - does this test case run for more than 15 minutes?
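   For context, `SplitSupport.process(parameters, clazz, n)` shards a parameterized driver's test cases across n split classes. The exact mechanics are Hive-internal; a plausible modulo-sharding sketch (hypothetical signature, not the real one) looks like:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class SplitSketch {

       // Hypothetical sharding: case i belongs to split (i % numSplits), so
       // the numSplits split drivers together cover every case exactly once.
       static <T> List<T> process(List<T> parameters, int splitIndex, int numSplits) {
           List<T> mine = new ArrayList<>();
           for (int i = 0; i < parameters.size(); i++) {
               if (i % numSplits == splitIndex) {
                   mine.add(parameters.get(i));
               }
           }
           return mine;
       }

       public static void main(String[] args) {
           List<Integer> all = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
           System.out.println(process(all, 0, 3)); // [0, 3, 6, 9]
           System.out.println(process(all, 2, 3)); // [2, 5, 8]
       }
   }
   ```

   Splitting only pays off when a driver's total runtime exceeds one CI slot, which is the reviewer's point.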
   

##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -1743,7 +1743,9 @@ public void setLocalMapRedErrors(Map<String, List<String>> localMapRedErrors) {
 
   public String getCurrentDatabase() {
 if (currentDatabase == null) {
-  currentDatabase = DEFAULT_DATABASE_NAME;
+  currentDatabase = sessionConf.getVar(ConfVars.HIVE_CURRENT_DATABASE);

Review comment:
   instead of hacking the system - can't we just put the data inside the docker image under `default`?
   ...or add a `use xxx` to the init sql - but please don't add something like this to `SessionState`
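   The diff above replaces the hard-coded default with a config lookup. Stripped of Hive's `SessionState` machinery, the lazy-fallback pattern under review is roughly the following (simplified types; the config key is an assumption for illustration):

   ```java
   import java.util.Map;

   public class CurrentDatabaseSketch {
       static final String DEFAULT_DATABASE_NAME = "default";

       private final Map<String, String> sessionConf;
       private String currentDatabase;

       CurrentDatabaseSketch(Map<String, String> sessionConf) {
           this.sessionConf = sessionConf;
       }

       // Lazily resolve the current database: prefer a configured value
       // ("hive.current.database" is an assumed key here), fall back to the
       // hard default, and cache the result for the session.
       public String getCurrentDatabase() {
           if (currentDatabase == null) {
               currentDatabase = sessionConf.getOrDefault(
                       "hive.current.database", DEFAULT_DATABASE_NAME);
           }
           return currentDatabase;
       }
   }
   ```

   The reviewer's alternative - issuing `use xxx` from the init SQL, or shipping the data under `default` in the docker image - avoids touching this session-state path entirely.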

##
File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java
##
@@ -56,12 +58,22 @@ public int compare(Object[] o1, Object[] o2) {
   public static TestRule cliClassRule = adapter.buildClassRule();
 
   @Rule
-  public TestRule cliTestRule = adapter.buildTestRule();
+  public TestRule cliTestRule = (statement, description) -> new Statement() {

Review comment:
   I don't think this does anything more than `adapter.buildTestRule`







Issue Time Tracking
---

Worklog Id: (was: 511264)
Time Spent: 1h 20m  (was: 1h 10m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> * The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset.
> * The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> * The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive 
[jira] [Updated] (HIVE-24328) Run distcp in parallel for all file entries in repl load.

2020-11-13 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-24328:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the patch [~aasha] and the review [~pkumarsinha].

> Run distcp in parallel for all file entries in repl load.
> -
>
> Key: HIVE-24328
> URL: https://issues.apache.org/jira/browse/HIVE-24328
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24328.01.patch, HIVE-24328.02.patch, 
> HIVE-24328.03.patch, HIVE-24328.04.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>



