date:20220121

[jira] [Commented] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster

2022-01-21 Thread zhangbutao (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480361#comment-17480361
 ] 

zhangbutao commented on HIVE-25401:
---

I think you can try this parameter; its value is the list of defaultFS URIs of the clusters involved:

    <property>
      <name>mapreduce.job.hdfs-servers</name>
      <value>hdfs://cluster1,hdfs://cluster2</value>
    </property>

> Insert overwrite  a table which location is on other cluster fail  in 
> kerberos cluster
> --
>
> Key: HIVE-25401
> URL: https://issues.apache.org/jira/browse/HIVE-25401
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0, 3.1.2
> Environment: hive 2.3 
> hadoop3 cluster with kerberos 
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-25401.patch, image-2021-07-29-14-25-23-418.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We have two HDFS clusters with Kerberos security, which means that MapReduce 
> tasks need delegation tokens to authenticate to the NameNode when Hive runs 
> on MapReduce.
> INSERT OVERWRITE into a table whose location is on the other cluster fails in 
> a Kerberos cluster. For example:
>  # The YARN cluster's default FS is hdfs://cluster1
>  # tb1's location is hdfs://cluster1/tb1
>  # tb2's location is hdfs://cluster2/tb2
>  # The SQL `INSERT OVERWRITE TABLE tb2 SELECT * FROM tb1`, run on the YARN 
> cluster, will fail
>  
> Reduce task error log:
> !image-2021-07-29-14-25-23-418.png!
> How to fix:
> After digging into it, we found that the MapReduce job only obtains delegation 
> tokens for input files in FileInputFormat. But the Hive Context derives the 
> external scratchDir from the table's location, so if the table's location is 
> on another cluster, the delegation token for that cluster is not obtained.
> So we need to obtain delegation tokens for the Hive scratchDirs before Hive 
> submits the MapReduce job.
>  
> How to test:
> no test
>  
>  
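
For illustration only: the fix described above amounts to finding every filesystem, other than the default one, that the job will touch, so that a delegation token can be requested for each. Below is a minimal sketch using only java.net.URI. The class name, method name and staging path are hypothetical, and an actual patch would presumably work with Hadoop's path and credentials APIs rather than plain strings.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TokenTargetsSketch {
    // Returns scheme://authority for every path that lives on a filesystem
    // other than the default one. Paths without a scheme or authority are
    // treated as relative to the default filesystem and skipped.
    static Set<String> remoteFileSystems(String defaultFs, List<String> paths) {
        URI def = URI.create(defaultFs);
        Set<String> remote = new LinkedHashSet<>();
        for (String p : paths) {
            URI u = URI.create(p);
            if (u.getScheme() == null || u.getAuthority() == null) {
                continue;
            }
            boolean sameFs = u.getScheme().equalsIgnoreCase(def.getScheme())
                    && u.getAuthority().equalsIgnoreCase(def.getAuthority());
            if (!sameFs) {
                remote.add(u.getScheme() + "://" + u.getAuthority());
            }
        }
        return remote;
    }
}
```

Given the default FS hdfs://cluster1 and the paths hdfs://cluster1/tb1 and hdfs://cluster2/tb2/.hive-staging, the method returns only hdfs://cluster2, which is the cluster the job has no token for.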



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Resolved] (HIVE-25783) Refine standalone-metastore module pom.xml files

2022-01-21 Thread Zhihua Deng (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-25783.

Resolution: Fixed

> Refine standalone-metastore module pom.xml files
> 
>
> Key: HIVE-25783
> URL: https://issues.apache.org/jira/browse/HIVE-25783
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In HIVE-25774, we added the ASF license to newly created files in 
> standalone-metastore, but we may face the same issue later on. This Jira 
> investigates whether we can provide a common way to make sure that 
> newly added source files contain the ASF license information. 




[jira] [Commented] (HIVE-25783) Refine standalone-metastore module pom.xml files

2022-01-21 Thread Zhihua Deng (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480310#comment-17480310
 ] 

Zhihua Deng commented on HIVE-25783:


Merged to master. Thank you for the feedback and review, [~pvary]!

> Refine standalone-metastore module pom.xml files
> 
>
> Key: HIVE-25783
> URL: https://issues.apache.org/jira/browse/HIVE-25783
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In HIVE-25774, we added the ASF license to newly created files in 
> standalone-metastore, but we may face the same issue later on. This Jira 
> investigates whether we can provide a common way to make sure that 
> newly added source files contain the ASF license information. 




[jira] [Updated] (HIVE-25783) Refine standalone-metastore module pom.xml files

2022-01-21 Thread Zhihua Deng (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-25783:
---
Fix Version/s: 4.0.0

> Refine standalone-metastore module pom.xml files
> 
>
> Key: HIVE-25783
> URL: https://issues.apache.org/jira/browse/HIVE-25783
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In HIVE-25774, we added the ASF license to newly created files in 
> standalone-metastore, but we may face the same issue later on. This Jira 
> investigates whether we can provide a common way to make sure that 
> newly added source files contain the ASF license information. 




[jira] [Work logged] (HIVE-25783) Refine standalone-metastore module pom.xml files

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25783?focusedWorklogId=713147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713147
 ]

ASF GitHub Bot logged work on HIVE-25783:
-

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:49
Start Date: 22/Jan/22 00:49
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 merged pull request #2852:
URL: https://github.com/apache/hive/pull/2852


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 713147)
Time Spent: 1h 50m  (was: 1h 40m)

> Refine standalone-metastore module pom.xml files
> 
>
> Key: HIVE-25783
> URL: https://issues.apache.org/jira/browse/HIVE-25783
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In HIVE-25774, we added the ASF license to newly created files in 
> standalone-metastore, but we may face the same issue later on. This Jira 
> investigates whether we can provide a common way to make sure that 
> newly added source files contain the ASF license information. 




[jira] [Work logged] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25401?focusedWorklogId=713138=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713138
 ]

ASF GitHub Bot logged work on HIVE-25401:
-

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:11
Start Date: 22/Jan/22 00:11
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2544:
URL: https://github.com/apache/hive/pull/2544#issuecomment-1018984793


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 713138)
Time Spent: 1.5h  (was: 1h 20m)

> Insert overwrite  a table which location is on other cluster fail  in 
> kerberos cluster
> --
>
> Key: HIVE-25401
> URL: https://issues.apache.org/jira/browse/HIVE-25401
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0, 3.1.2
> Environment: hive 2.3 
> hadoop3 cluster with kerberos 
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-25401.patch, image-2021-07-29-14-25-23-418.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We have two HDFS clusters with Kerberos security, which means that MapReduce 
> tasks need delegation tokens to authenticate to the NameNode when Hive runs 
> on MapReduce.
> INSERT OVERWRITE into a table whose location is on the other cluster fails in 
> a Kerberos cluster. For example:
>  # The YARN cluster's default FS is hdfs://cluster1
>  # tb1's location is hdfs://cluster1/tb1
>  # tb2's location is hdfs://cluster2/tb2
>  # The SQL `INSERT OVERWRITE TABLE tb2 SELECT * FROM tb1`, run on the YARN 
> cluster, will fail
>  
> Reduce task error log:
> !image-2021-07-29-14-25-23-418.png!
> How to fix:
> After digging into it, we found that the MapReduce job only obtains delegation 
> tokens for input files in FileInputFormat. But the Hive Context derives the 
> external scratchDir from the table's location, so if the table's location is 
> on another cluster, the delegation token for that cluster is not obtained.
> So we need to obtain delegation tokens for the Hive scratchDirs before Hive 
> submits the MapReduce job.
>  
> How to test:
> no test
>  
>  




[jira] [Work logged] (HIVE-24830) Revise RowSchema mutability usage

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24830?focusedWorklogId=713136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713136
 ]

ASF GitHub Bot logged work on HIVE-24830:
-

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:11
Start Date: 22/Jan/22 00:11
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2019:
URL: https://github.com/apache/hive/pull/2019#issuecomment-1018984833


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 713136)
Time Spent: 1h 10m  (was: 1h)

> Revise RowSchema mutability usage
> -
>
> Key: HIVE-24830
> URL: https://issues.apache.org/jira/browse/HIVE-24830
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> RowSchema is essentially a container class for a list of fields.
> * it can be constructed from a "list"
> * the list can be set
> * the list can be accessed
> None of the above methods tries to protect the data inside; hence the 
> following could easily happen:
> {code}
> s = o1.getSchema();
> col = s.getCol("favourite");
> col.setInternalName("asd"); // will modify o1 schema
> newSchema.add(col);
> o2.setSchema(newSchema);
> o2.getSchema().get("asd").setInternalName("xxx"); // will modify o1 and o2 schema
> [...]
> {code}
> Not sure how much of this is actually crucial; an exploratory test run 
> revealed some cases:
> https://github.com/apache/hive/pull/2019
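
The aliasing problem in the description can be reproduced outside Hive. The sketch below uses hypothetical stand-in classes (`Col`, `SharedSchema`, `CopyingSchema`), not Hive's real RowSchema or ColumnInfo, and shows defensive copying as one possible remedy; it is not necessarily the approach taken in the pull request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a column descriptor; not Hive's ColumnInfo.
final class Col {
    private String internalName;
    Col(String internalName) { this.internalName = internalName; }
    Col(Col other) { this.internalName = other.internalName; } // copy constructor
    String getInternalName() { return internalName; }
    void setInternalName(String name) { this.internalName = name; }
}

// Mutable variant: hands out its internal list and the column objects themselves.
final class SharedSchema {
    private final List<Col> cols;
    SharedSchema(List<Col> cols) { this.cols = cols; }
    List<Col> getCols() { return cols; }
}

// Defensive variant: copies every column on the way in and on the way out.
final class CopyingSchema {
    private final List<Col> cols = new ArrayList<>();
    CopyingSchema(List<Col> source) {
        for (Col c : source) {
            cols.add(new Col(c));
        }
    }
    List<Col> getCols() {
        List<Col> copy = new ArrayList<>();
        for (Col c : cols) {
            copy.add(new Col(c));
        }
        return Collections.unmodifiableList(copy);
    }
}

final class RowSchemaAliasingDemo {
    // Renames a column through a second schema and reports what the first one sees.
    static String renameThroughSharedSchema() {
        List<Col> cols = new ArrayList<>();
        cols.add(new Col("favourite"));
        SharedSchema s1 = new SharedSchema(cols);
        SharedSchema s2 = new SharedSchema(s1.getCols()); // same list, same Col objects
        s2.getCols().get(0).setInternalName("xxx");
        return s1.getCols().get(0).getInternalName();
    }

    static String renameThroughCopyingSchema() {
        List<Col> cols = new ArrayList<>();
        cols.add(new Col("favourite"));
        CopyingSchema s1 = new CopyingSchema(cols);
        CopyingSchema s2 = new CopyingSchema(s1.getCols());
        s2.getCols().get(0).setInternalName("xxx"); // only mutates a throwaway copy
        return s1.getCols().get(0).getInternalName();
    }
}
```

With the shared variant the first schema reports the new name "xxx"; with the copying variant it still reports "favourite". The copies cost allocations on every access, which is the trade-off any such change would have to weigh.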




[jira] [Work logged] (HIVE-25352) Optimise DBTokenStore for RDBMS

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25352?focusedWorklogId=713135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713135
 ]

ASF GitHub Bot logged work on HIVE-25352:
-

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:11
Start Date: 22/Jan/22 00:11
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2499:
URL: https://github.com/apache/hive/pull/2499#issuecomment-1018984814


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 713135)
Time Spent: 20m  (was: 10m)

> Optimise DBTokenStore for RDBMS
> ---
>
> Key: HIVE-25352
> URL: https://issues.apache.org/jira/browse/HIVE-25352
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahana Bhat
>Assignee: Sahana Bhat
>Priority: Major
>  Labels: pull-request-available, pull_request_available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The existing DBTokenStore implementation is poorly optimised when an 
> RDBMS is used.
>  * All available tokens are fetched from the DB. The validity of each token 
> is determined based on its max date and renew date, and the token is deleted 
> if required. For a relational database like MySQL, a *query to fetch all rows 
> with no filters or pagination* can be costly and impact the performance of 
> the database and the server. 
>  * From the token identifiers fetched, if the token hasn’t breached its max 
> date, the token information is fetched from the database again to validate 
> its renew date.  
>  * The token expiration daemon is part of the Hive system. In a cluster of 
> tens or hundreds of Hive servers, the daemon runs on each of the servers. 
> This means that the flow of fetching tokens, validating them for expiration 
> and deleting them is duplicated on each of the servers. The 
> *duplication of the functionality in every server*, along with the problems 
> discussed in points 1 & 2, can severely degrade the performance of the 
> database.
> This issue will address the problems mentioned in points 1 & 2.
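
As a rough illustration of points 1 & 2, the expiry check can be expressed as a single predicate and applied as a filter with a row limit, instead of fetching every token and then re-reading each one. The sketch below is an in-memory stand-in for such a query. `TokenRow`, its fields and the method names are hypothetical, and the expiry rule is an assumption taken from the description above, not from the DBTokenStore code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row shape; the issue does not show the real token table or its columns.
final class TokenRow {
    final String id;
    final long maxDate;   // epoch millis after which the token is no longer valid
    final long renewDate; // epoch millis by which the token must be renewed
    TokenRow(String id, long maxDate, long renewDate) {
        this.id = id;
        this.maxDate = maxDate;
        this.renewDate = renewDate;
    }
}

final class ExpiredTokenSketch {
    // Assumed rule: a token is removable once its max date or its renew date has passed.
    static boolean isExpired(TokenRow t, long now) {
        return now > t.maxDate || now > t.renewDate;
    }

    // Stand-in for a filtered, limited query: returns at most batchSize expired ids,
    // so the caller never materialises the whole table.
    static List<String> nextExpiredBatch(List<TokenRow> table, long now, int batchSize) {
        List<String> ids = new ArrayList<>();
        for (TokenRow t : table) {
            if (ids.size() == batchSize) {
                break;
            }
            if (isExpired(t, now)) {
                ids.add(t.id);
            }
        }
        return ids;
    }
}
```

In a real store the same predicate would sit in the query's filter, so both dates are checked in one round trip and only expired rows leave the database.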




[jira] [Work logged] (HIVE-25621) Alter table partition compact/concatenate commands should send HivePrivilegeObjects for Authz

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25621?focusedWorklogId=713134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713134
 ]

ASF GitHub Bot logged work on HIVE-25621:
-

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:11
Start Date: 22/Jan/22 00:11
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2731:
URL: https://github.com/apache/hive/pull/2731


   




Issue Time Tracking
---

Worklog Id: (was: 713134)
Time Spent: 0.5h  (was: 20m)

> Alter table partition compact/concatenate commands should send 
> HivePrivilegeObjects for Authz
> -
>
> Key: HIVE-25621
> URL: https://issues.apache.org/jira/browse/HIVE-25621
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> # Run the following queries 
> Create table temp(c0 int) partitioned by (c1 int);
> Insert into temp values(1,1);
> ALTER TABLE temp PARTITION (c1=1) COMPACT 'minor';
> ALTER TABLE temp PARTITION (c1=1) CONCATENATE;
> Insert into temp values(1,1);
>  # The above compact/concatenate commands currently do not send any Hive 
> privilege objects for authorization. Hive needs to send these objects so 
> that unauthorized users cannot perform these operations.




[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=713004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713004
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 17:36
Start Date: 21/Jan/22 17:36
Worklog Time Spent: 10m 
  Work Description: boroknagyz commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018721560


   FYI I've uploaded a PR to Iceberg: 
https://github.com/apache/iceberg/pull/3947
   
   It only contains the 1-based indexing of this PR as the table migration code 
is not present in the Iceberg repo.




Issue Time Tracking
---

Worklog Id: (was: 713004)
Time Spent: 2.5h  (was: 2h 20m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Hive should  set the name-mapping table property during table migration.
> It would be useful for [column 
> projection|https://iceberg.apache.org/#spec/#column-projection] for files 
> without field ids.




[jira] [Updated] (HIVE-25889) Increase default value of "metastore.thread.pool.size"

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25889:
--
Labels: pull-request-available  (was: )

> Increase default value of "metastore.thread.pool.size"
> --
>
> Key: HIVE-25889
> URL: https://issues.apache.org/jira/browse/HIVE-25889
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveMetastore uses a threadpool to execute tasks listed under 
> "metastore.task.threads.remote" and "metastore.task.threads.always" configs. 
> The size of this threadpool is controlled by "metastore.thread.pool.size" 
> config which by default is set to 10. The number of tasks in the two lists 
> has grown significantly in the last two years, but the size of the pool 
> remained the same. 
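
To see why the pool size matters once the task lists outgrow it, the sketch below submits one more blocking task than a fixed-size pool has threads and counts how many have started. It is a generic java.util.concurrent illustration. The class and method names are hypothetical, and it assumes long-running tasks; it does not reproduce the metastore's own scheduling code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolSaturationSketch {
    // Submits poolSize + 1 blocking tasks to a pool of poolSize threads and
    // returns how many tasks had started while all of them were still blocked.
    static int startedWhileSaturated(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger started = new AtomicInteger();
        CountDownLatch allThreadsBusy = new CountDownLatch(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < poolSize + 1; i++) {
                pool.submit(() -> {
                    started.incrementAndGet();
                    allThreadsBusy.countDown();
                    try {
                        release.await(); // stand-in for a long-running background task
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Wait until every pool thread is occupied, then read the counter.
            if (!allThreadsBusy.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool threads did not start in time");
            }
            return started.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            release.countDown(); // unblock the tasks so the pool can drain
            pool.shutdown();
        }
    }
}
```

The extra task stays queued until a thread frees up, so the count equals the pool size rather than the number of submitted tasks.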




[jira] [Work logged] (HIVE-25889) Increase default value of "metastore.thread.pool.size"

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25889?focusedWorklogId=712940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712940
 ]

ASF GitHub Bot logged work on HIVE-25889:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 15:28
Start Date: 21/Jan/22 15:28
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request #2962:
URL: https://github.com/apache/hive/pull/2962


   
   
   ### What changes were proposed in this pull request?
   Increase default value of "metastore.thread.pool.size" from 10 to 15.
   
   
   
   ### Why are the changes needed?
   HiveMetastore uses a threadpool to execute tasks listed under 
"metastore.task.threads.remote" and "metastore.task.threads.always" configs. 
The size of this threadpool is controlled by "metastore.thread.pool.size" 
config which by default is set to 10. The number of tasks in the two lists has 
grown significantly in the last two years, but the size of the pool remained 
the same.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 712940)
Remaining Estimate: 0h
Time Spent: 10m

> Increase default value of "metastore.thread.pool.size"
> --
>
> Key: HIVE-25889
> URL: https://issues.apache.org/jira/browse/HIVE-25889
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveMetastore uses a threadpool to execute tasks listed under 
> "metastore.task.threads.remote" and "metastore.task.threads.always" configs. 
> The size of this threadpool is controlled by "metastore.thread.pool.size" 
> config which by default is set to 10. The number of tasks in the two lists 
> has grown significantly in the last two years, but the size of the pool 
> remained the same. 




[jira] [Assigned] (HIVE-25889) Increase default value of "metastore.thread.pool.size"

2022-01-21 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-25889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-25889:



> Increase default value of "metastore.thread.pool.size"
> --
>
> Key: HIVE-25889
> URL: https://issues.apache.org/jira/browse/HIVE-25889
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> HiveMetastore uses a threadpool to execute tasks listed under 
> "metastore.task.threads.remote" and "metastore.task.threads.always" configs. 
> The size of this threadpool is controlled by "metastore.thread.pool.size" 
> config which by default is set to 10. The number of tasks in the two lists 
> has grown significantly in the last two years, but the size of the pool 
> remained the same. 




[jira] [Commented] (HIVE-25842) Reimplement delta file metric collection

2022-01-21 Thread Jira



[ 
https://issues.apache.org/jira/browse/HIVE-25842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480121#comment-17480121
 ] 

László Pintér commented on HIVE-25842:
--

Submitted to master. Thanks [~klcopp] and [~dkuzmenko] for the review

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table 
> (select * and select count( * ) don't update the metrics)
> Metrics aren't updated after compaction or cleaning after compaction, so 
> users will probably see "issues" with compaction (like many active or 
> obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch 
> around each method in DeltaFilesMetricsReporter but of course this isn't 
> foolproof. This is a HUGE performance and functionality liability. Tests 
> caught some issues, but our tests aren't perfect.




[jira] [Resolved] (HIVE-25842) Reimplement delta file metric collection

2022-01-21 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-25842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér resolved HIVE-25842.
--
Resolution: Fixed

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table 
> (select * and select count( * ) don't update the metrics)
> Metrics aren't updated after compaction or cleaning after compaction, so 
> users will probably see "issues" with compaction (like many active or 
> obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch 
> around each method in DeltaFilesMetricsReporter but of course this isn't 
> foolproof. This is a HUGE performance and functionality liability. Tests 
> caught some issues, but our tests aren't perfect.




[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=712923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712923
 ]

ASF GitHub Bot logged work on HIVE-25842:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 15:00
Start Date: 21/Jan/22 15:00
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #2916:
URL: https://github.com/apache/hive/pull/2916


   




Issue Time Tracking
---

Worklog Id: (was: 712923)
Time Spent: 7h  (was: 6h 50m)

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table 
> (select * and select count( * ) don't update the metrics)
> Metrics aren't updated after compaction or cleaning after compaction, so 
> users will probably see "issues" with compaction (like many active or 
> obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch 
> around each method in DeltaFilesMetricsReporter but of course this isn't 
> foolproof. This is a HUGE performance and functionality liability. Tests 
> caught some issues, but our tests aren't perfect.




[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712922
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:59
Start Date: 21/Jan/22 14:59
Worklog Time Spent: 10m 
  Work Description: boroknagyz commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018586068


   Ah right, things are currently being duplicated between Hive and Iceberg. 
Sure, I'll happily add these changes to Iceberg as well!




Issue Time Tracking
---

Worklog Id: (was: 712922)
Time Spent: 2h 20m  (was: 2h 10m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Hive should  set the name-mapping table property during table migration.
> It would be useful for [column 
> projection|https://iceberg.apache.org/#spec/#column-projection] for files 
> without field ids.




[jira] [Updated] (HIVE-25888) Improve RuleEventLogger to also print input rels in FULL_PLAN mode

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25888:
--
Labels: pull-request-available  (was: )

> Improve RuleEventLogger to also print input rels in FULL_PLAN mode
> --
>
> Key: HIVE-25888
> URL: https://issues.apache.org/jira/browse/HIVE-25888
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive porting of CALCITE-4991, refer to that ticket for more details.




[jira] [Work logged] (HIVE-25888) Improve RuleEventLogger to also print input rels in FULL_PLAN mode

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25888?focusedWorklogId=712919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712919
 ]

ASF GitHub Bot logged work on HIVE-25888:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:51
Start Date: 21/Jan/22 14:51
Worklog Time Spent: 10m 
  Work Description: asolimando opened a new pull request #2961:
URL: https://github.com/apache/hive/pull/2961


   …PLAN mode
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 712919)
Remaining Estimate: 0h
Time Spent: 10m

> Improve RuleEventLogger to also print input rels in FULL_PLAN mode
> --
>
> Key: HIVE-25888
> URL: https://issues.apache.org/jira/browse/HIVE-25888
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive porting of CALCITE-4991, refer to that ticket for more details.




[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712913=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712913
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:43
Start Date: 21/Jan/22 14:43
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018568724


   Thanks for the contribution @boroknagyz! As Peter mentioned, it would be 
great to get the relevant parts into the upstream Iceberg code base as well. 
Is this something you would fancy doing too?




Issue Time Tracking
---

Worklog Id: (was: 712913)
Time Spent: 2h 10m  (was: 2h)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Hive should  set the name-mapping table property during table migration.
> It would be useful for [column 
> projection|https://iceberg.apache.org/#spec/#column-projection] for files 
> without field ids.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712911
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:42
Start Date: 21/Jan/22 14:42
Worklog Time Spent: 10m 
  Work Description: marton-bod merged pull request #2948:
URL: https://github.com/apache/hive/pull/2948


   




Issue Time Tracking
---

Worklog Id: (was: 712911)
Time Spent: 2h  (was: 1h 50m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Hive should  set the name-mapping table property during table migration.
> It would be useful for [column 
> projection|https://iceberg.apache.org/#spec/#column-projection] for files 
> without field ids.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712908=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712908
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:40
Start Date: 21/Jan/22 14:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018564045


   LGTM +1.
   I think this change should go into the Iceberg repo as well. What do you 
think?




Issue Time Tracking
---

Worklog Id: (was: 712908)
Time Spent: 1h 50m  (was: 1h 40m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Hive should  set the name-mapping table property during table migration.
> It would be useful for [column 
> projection|https://iceberg.apache.org/#spec/#column-projection] for files 
> without field ids.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712896
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:28
Start Date: 21/Jan/22 14:28
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2948:
URL: https://github.com/apache/hive/pull/2948#discussion_r789702644



##
File path: 
iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q
##
@@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform;
 CREATE EXTERNAL TABLE ice_t_transform (year_field date, month_field date, 
day_field date, hour_field timestamp, truncate_field string, bucket_field int, 
identity_field int) PARTITIONED BY SPEC (year(year_field), month(month_field), 
day(day_field), hour(hour_field), truncate(2, truncate_field), bucket(2, 
bucket_field), identity_field) STORED BY ICEBERG;
 
 DROP TABLE IF EXISTS ice_t_transform_prop;
-CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, 
month_field date, day_field date, hour_field timestamp, truncate_field string, 
bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES 
('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":1,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":2,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":3,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":4,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":5,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":6,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":7,"field-id":1006}]}');
+CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, 
month_field date, day_field date, hour_field timestamp, truncate_field string, 
bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES 
('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":2,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":3,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":4,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":5,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":6,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":7,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":8,"field-id":1006}]}');

Review comment:
   Makes sense, thx!






Issue Time Tracking
---

Worklog Id: (was: 712896)
Time Spent: 1.5h  (was: 1h 20m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Hive should  set the name-mapping table property during table migration.
> It would be useful for [column 
> projection|https://iceberg.apache.org/#spec/#column-projection] for files 
> without field ids.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712899
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:28
Start Date: 21/Jan/22 14:28
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018551920


   LGTM, will merge this today unless @pvary has further comments




Issue Time Tracking
---

Worklog Id: (was: 712899)
Time Spent: 1h 40m  (was: 1.5h)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Hive should  set the name-mapping table property during table migration.
> It would be useful for [column 
> projection|https://iceberg.apache.org/#spec/#column-projection] for files 
> without field ids.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712874
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 13:45
Start Date: 21/Jan/22 13:45
Worklog Time Spent: 10m 
  Work Description: boroknagyz commented on a change in pull request #2948:
URL: https://github.com/apache/hive/pull/2948#discussion_r789669255



##
File path: 
iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q
##
@@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform;
 CREATE EXTERNAL TABLE ice_t_transform (year_field date, month_field date, 
day_field date, hour_field timestamp, truncate_field string, bucket_field int, 
identity_field int) PARTITIONED BY SPEC (year(year_field), month(month_field), 
day(day_field), hour(hour_field), truncate(2, truncate_field), bucket(2, 
bucket_field), identity_field) STORED BY ICEBERG;
 
 DROP TABLE IF EXISTS ice_t_transform_prop;
-CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, 
month_field date, day_field date, hour_field timestamp, truncate_field string, 
bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES 
('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":1,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":2,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":3,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":4,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":5,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":6,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":7,"field-id":1006}]}');
+CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, 
month_field date, day_field date, hour_field timestamp, truncate_field string, 
bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES 
('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":2,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":3,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":4,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":5,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":6,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":7,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":8,"field-id":1006}]}');

Review comment:
   Prior to this patch `HiveSchemaConverter` used 0-based indexing when it 
assigned the field ids. E.g. in the above statement it would assign field id 0 
to `id`, field id 1 to `year_field`, and so on. Hence in 
'iceberg.mr.table.partition.spec' the source-id 1 referred to the `year_field`. 
Everything was fine, but when Iceberg creates a table it reassigns the field 
ids using 1-based indexing (field id 1 is `id`, field id 2 is `year_field`). 
Iceberg is smart enough to use the correct ids in the partition spec, i.e. 
it replaces source id 1 with source id 2, and so on.
   
   So everything worked OK, but you had to specify different field/source ids 
in Hive than the actual field/source ids assigned by Iceberg.
   
   With this change, you need to use the same 1-based indexing in the partition 
spec that Iceberg will use later.
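
To make the id shift described above concrete, here is a minimal Java sketch. It is not `HiveSchemaConverter` or Iceberg code; the class and method names are hypothetical. It assigns consecutive ids to the columns of `ice_t_transform_prop`, once starting at 0 and once starting at 1, and prints the source-id each partition source column gets under both schemes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not HiveSchemaConverter or Iceberg code.
class FieldIdShift {
  // Columns of ice_t_transform_prop, in declaration order.
  static final List<String> COLUMNS = List.of(
      "id", "year_field", "month_field", "day_field",
      "hour_field", "truncate_field", "bucket_field", "identity_field");

  // Assign consecutive field ids starting at firstId
  // (0 models the behaviour before the patch, 1 the behaviour after it).
  static Map<String, Integer> assignIds(List<String> columns, int firstId) {
    Map<String, Integer> ids = new LinkedHashMap<>();
    for (int i = 0; i < columns.size(); i++) {
      ids.put(columns.get(i), firstId + i);
    }
    return ids;
  }

  public static void main(String[] args) {
    Map<String, Integer> zeroBased = assignIds(COLUMNS, 0);
    Map<String, Integer> oneBased = assignIds(COLUMNS, 1);
    // Skip "id": it is not a partition source column in the test query.
    for (String column : COLUMNS.subList(1, COLUMNS.size())) {
      System.out.println(column + ": source-id " + zeroBased.get(column)
          + " (0-based) -> " + oneBased.get(column) + " (1-based)");
    }
  }
}
```

Under these assumptions the 1-based ids for `year_field` through `identity_field` run from 2 to 8, which matches the source-id values on the `+` side of the diff.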






Issue Time Tracking
---

Worklog Id: (was: 712874)
Time Spent: 1h 20m  (was: 1h 10m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hive should  set the name-mapping table property during table migration.
> It would be useful for [column 
> projection|https://iceberg.apache.org/#spec/#column-projection] for files 
> without field ids.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=712778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712778
 ]

ASF GitHub Bot logged work on HIVE-25842:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 11:26
Start Date: 21/Jan/22 11:26
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2916:
URL: https://github.com/apache/hive/pull/2916#discussion_r789579851



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -412,7 +421,11 @@ private boolean removeFiles(String location, 
ValidWriteIdList writeIdList, Compa
 }
 StringBuilder extraDebugInfo = new 
StringBuilder("[").append(obsoleteDirs.stream()
 .map(Path::getName).collect(Collectors.joining(",")));
-return remove(location, ci, obsoleteDirs, true, fs, extraDebugInfo);
+boolean success = remove(location, ci, obsoleteDirs, true, fs, 
extraDebugInfo);
+if (dir.getObsolete().size() > 0) {
+  updateDeltaFilesMetrics(ci.dbname, ci.tableName, ci.partName, 
obsoleteDirs);

Review comment:
   Reverted.






Issue Time Tracking
---

Worklog Id: (was: 712778)
Time Spent: 6h 50m  (was: 6h 40m)

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table 
> (select * and select count( * ) don't update the metrics)
> Metrics aren't updated after compaction or cleaning after compaction, so 
> users will probably see "issues" with compaction (like many active or 
> obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch 
> around each method in DeltaFilesMetricsReporter but of course this isn't 
> foolproof. This is a HUGE performance and functionality liability. Tests 
> caught some issues, but our tests aren't perfect.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Assigned] (HIVE-25888) Improve RuleEventLogger to also print input rels in FULL_PLAN mode

2022-01-21 Thread Alessandro Solimando (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-25888:
---


> Improve RuleEventLogger to also print input rels in FULL_PLAN mode
> --
>
> Key: HIVE-25888
> URL: https://issues.apache.org/jira/browse/HIVE-25888
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> Hive porting of CALCITE-4991, refer to that ticket for more details.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712720
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 10:29
Start Date: 21/Jan/22 10:29
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2948:
URL: https://github.com/apache/hive/pull/2948#discussion_r789540515



##
File path: 
iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q
##
@@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform;
 CREATE EXTERNAL TABLE ice_t_transform (year_field date, month_field date, 
day_field date, hour_field timestamp, truncate_field string, bucket_field int, 
identity_field int) PARTITIONED BY SPEC (year(year_field), month(month_field), 
day(day_field), hour(hour_field), truncate(2, truncate_field), bucket(2, 
bucket_field), identity_field) STORED BY ICEBERG;
 
 DROP TABLE IF EXISTS ice_t_transform_prop;
-CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, 
month_field date, day_field date, hour_field timestamp, truncate_field string, 
bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES 
('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":1,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":2,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":3,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":4,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":5,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":6,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":7,"field-id":1006}]}');
+CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, 
month_field date, day_field date, hour_field timestamp, truncate_field string, 
bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES 
('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":2,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":3,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":4,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":5,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":6,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":7,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":8,"field-id":1006}]}');

Review comment:
   I'm probably missing something obvious - can you explain why the 
source-id values had to be incremented? Thanks!






Issue Time Tracking
---

Worklog Id: (was: 712720)
Time Spent: 1h 10m  (was: 1h)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Hive should  set the name-mapping table property during table migration.
> It would be useful for [column 
> projection|https://iceberg.apache.org/#spec/#column-projection] for files 
> without field ids.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HIVE-23644) Fix FindBug issues in hive-jdbc

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23644:
--
Labels: pull-request-available  (was: )

> Fix FindBug issues in hive-jdbc
> ---
>
> Key: HIVE-23644
> URL: https://issues.apache.org/jira/browse/HIVE-23644
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-23644) Fix FindBug issues in hive-jdbc

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23644?focusedWorklogId=712712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712712
 ]

ASF GitHub Bot logged work on HIVE-23644:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 10:14
Start Date: 21/Jan/22 10:14
Worklog Time Spent: 10m 
  Work Description: mbathori-cloudera opened a new pull request #2960:
URL: https://github.com/apache/hive/pull/2960


   ### What changes were proposed in this pull request?
   Fixing FindBug issues in hive-jdbc module.
   
   ### Why are the changes needed?
   Get rid of violations and issues detected by findBug, and enforce these 
rules on precommit.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   `mvn -Pspotbugs -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO -pl :hive-jdbc test-compile com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check`
   




Issue Time Tracking
---

Worklog Id: (was: 712712)
Remaining Estimate: 0h
Time Spent: 10m

> Fix FindBug issues in hive-jdbc
> ---
>
> Key: HIVE-23644
> URL: https://issues.apache.org/jira/browse/HIVE-23644
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: David Mollitor
>Priority: Major
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=712682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712682
 ]

ASF GitHub Bot logged work on HIVE-25842:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 08:53
Start Date: 21/Jan/22 08:53
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2916:
URL: https://github.com/apache/hive/pull/2916#discussion_r789465573



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -412,7 +421,11 @@ private boolean removeFiles(String location, 
ValidWriteIdList writeIdList, Compa
 }
 StringBuilder extraDebugInfo = new 
StringBuilder("[").append(obsoleteDirs.stream()
 .map(Path::getName).collect(Collectors.joining(",")));
-return remove(location, ci, obsoleteDirs, true, fs, extraDebugInfo);
+boolean success = remove(location, ci, obsoleteDirs, true, fs, 
extraDebugInfo);
+if (dir.getObsolete().size() > 0) {
+  updateDeltaFilesMetrics(ci.dbname, ci.tableName, ci.partName, 
obsoleteDirs);

Review comment:
   I regret suggesting that we include aborted directories in the obsolete 
count. 
   1. There are other metrics about aborted directories.
   2. previouslyActiveDeltas - (obsolete + aborted) != currentlyActiveDeltas, 
so the active delta count would be off.
   
   My bad :/
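
A small worked example of point 2, as a Java sketch with made-up counts. The numbers, the class name, and the assumed bookkeeping rule (only cleaned obsolete deltas reduce the active count) are illustrative assumptions, not taken from Hive.

```java
// Hypothetical counts for illustration; not taken from Hive.
class ActiveDeltaCount {
  // Assumed bookkeeping: only cleaned obsolete deltas reduce the active count.
  static int currentlyActive(int previouslyActive, int obsolete) {
    return previouslyActive - obsolete;
  }

  public static void main(String[] args) {
    int previouslyActive = 10;
    int obsolete = 3;
    int aborted = 2;

    int expected = currentlyActive(previouslyActive, obsolete);
    // Folding aborted directories into the obsolete count subtracts too much.
    int withAborted = previouslyActive - (obsolete + aborted);

    System.out.println("active deltas, obsolete only: " + expected);
    System.out.println("active deltas, obsolete + aborted: " + withAborted);
  }
}
```

With these numbers the two results differ by exactly the aborted count, which is the drift in the active delta metric that the comment warns about.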






Issue Time Tracking
---

Worklog Id: (was: 712682)
Time Spent: 6h 40m  (was: 6.5h)

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table 
> (select * and select count( * ) don't update the metrics)
> Metrics aren't updated after compaction or cleaning after compaction, so 
> users will probably see "issues" with compaction (like many active or 
> obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch 
> around each method in DeltaFilesMetricsReporter but of course this isn't 
> foolproof. This is a HUGE performance and functionality liability. Tests 
> caught some issues, but our tests aren't perfect.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

2022-01-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=712681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712681
 ]

ASF GitHub Bot logged work on HIVE-25842:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 08:38
Start Date: 21/Jan/22 08:38
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2916:
URL: https://github.com/apache/hive/pull/2916#discussion_r789454143



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##
@@ -139,157 +92,37 @@ public static DeltaFilesMetricReporter getInstance() {
 return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf) throws Exception {
-getInstance().configure(conf);
+  public static synchronized void init(Configuration conf, TxnStore 
txnHandler) throws Exception {
+if (!initialized) {
+  getInstance().configure(conf, txnHandler);
+  initialized = true;
+}
   }
 
-  private void configure(HiveConf conf) throws Exception {
+  private void configure(Configuration conf, TxnStore txnHandler) throws 
Exception {
 long reportingInterval =
-HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS);
-hiveEntitySeparator = conf.getVar(HiveConf.ConfVars.HIVE_ENTITY_SEPARATOR);
+MetastoreConf.getTimeVar(conf, 
MetastoreConf.ConfVars.METASTORE_DELTAMETRICS_REPORTING_INTERVAL, 
TimeUnit.SECONDS);
+
+maxCacheSize = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_DELTAMETRICS_MAX_CACHE_SIZE);
 
-initCachesForMetrics(conf);
 initObjectsForMetrics();
 
 ThreadFactory threadFactory =
 new 
ThreadFactoryBuilder().setDaemon(true).setNameFormat("DeltaFilesMetricReporter 
%d").build();
-executorService = 
Executors.newSingleThreadScheduledExecutor(threadFactory);
-executorService.scheduleAtFixedRate(new ReportingTask(), 0, 
reportingInterval, TimeUnit.SECONDS);
+reporterExecutorService = 
Executors.newSingleThreadScheduledExecutor(threadFactory);
+reporterExecutorService.scheduleAtFixedRate(new ReportingTask(txnHandler), 
0, reportingInterval, TimeUnit.SECONDS);
 
 LOG.info("Started DeltaFilesMetricReporter thread");

Review comment:
   Never mind, I had reading problems :D 






Issue Time Tracking
---

Worklog Id: (was: 712681)
Time Spent: 6.5h  (was: 6h 20m)

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table 
> (select * and select count( * ) don't update the metrics)
> Metrics aren't updated after compaction or cleaning after compaction, so 
> users will probably see "issues" with compaction (like many active or 
> obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch 
> around each method in DeltaFilesMetricsReporter but of course this isn't 
> foolproof. This is a HUGE performance and functionality liability. Tests 
> caught some issues, but our tests aren't perfect.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
