[jira] [Commented] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster
[ https://issues.apache.org/jira/browse/HIVE-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480361#comment-17480361 ]

zhangbutao commented on HIVE-25401:
---

I think you can try this parameter, setting its value to the defaultFS of each of the clusters:
mapreduce.job.hdfs-servers=hdfs://cluster1,hdfs://cluster2

> Insert overwrite a table which location is on other cluster fail in kerberos cluster
> ---
>
> Key: HIVE-25401
> URL: https://issues.apache.org/jira/browse/HIVE-25401
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.3.0, 3.1.2
> Environment: hive 2.3
> hadoop3 cluster with kerberos
> Reporter: Max Xie
> Assignee: Max Xie
> Priority: Minor
> Labels: pull-request-available
> Attachments: HIVE-25401.patch, image-2021-07-29-14-25-23-418.png
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> We have two HDFS clusters with Kerberos security, which means that MapReduce tasks need delegation tokens to authenticate with the NameNode when Hive on MapReduce runs.
> An INSERT OVERWRITE into a table whose location is on another cluster fails in a Kerberized cluster. For example:
> # the YARN cluster's default FS is hdfs://cluster1
> # tb1's location is hdfs://cluster1/tb1
> # tb2's location is hdfs://cluster2/tb2
> # the SQL `INSERT OVERWRITE TABLE tb2 SELECT * from tb1`, run on the YARN cluster, will fail
>
> Reduce task error log:
> !image-2021-07-29-14-25-23-418.png!
>
> How to fix:
> After digging into it, we found that the MapReduce job only obtains delegation tokens for the input files in FileInputFormat. However, the Hive Context derives the external scratchDir from the table's location, so if the table's location is on another cluster, no delegation token is obtained for it.
> So we need to obtain delegation tokens for Hive's scratchDirs before Hive submits the MapReduce job.
>
> How to test:
> no test

--
This message was sent by Atlassian Jira (v8.20.1#820001)
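Both the comment and the proposed fix come down to the same idea: the job needs a delegation token for every HDFS namespace it writes to, including the one holding the external scratch directory, not only for the input paths. The following is a minimal illustration under stated assumptions, not the patch attached to this issue: it takes fully qualified paths (the example table locations plus a hypothetical staging directory), collects their distinct scheme://authority values, and joins them into the comma-separated form suggested for mapreduce.job.hdfs-servers.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not Hive's patch): find every HDFS namespace a job
 * touches, so tokens can be requested for all of them, not just the inputs.
 */
public class HdfsServersSketch {

    /** Returns "scheme://authority" for a fully qualified path. */
    static String namespaceOf(String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null || uri.getAuthority() == null) {
            throw new IllegalArgumentException("Path is not fully qualified: " + path);
        }
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    /** Joins the distinct namespaces of all paths, keeping first-seen order. */
    static String hdfsServers(List<String> paths) {
        Set<String> namespaces = new LinkedHashSet<>();
        for (String p : paths) {
            namespaces.add(namespaceOf(p));
        }
        return String.join(",", namespaces);
    }

    public static void main(String[] args) {
        // Input table on cluster1; target table and its (hypothetical) staging dir on cluster2.
        List<String> paths = List.of(
                "hdfs://cluster1/tb1",
                "hdfs://cluster2/tb2",
                "hdfs://cluster2/tb2/.hive-staging_hypothetical");
        System.out.println(hdfsServers(paths));
    }
}
```

Run as-is, it prints hdfs://cluster1,hdfs://cluster2. In a real job, Hadoop's TokenCache.obtainTokensForNamenodes(Credentials, Path[], Configuration) is the call that fetches tokens for a set of paths; the sketch only models which namespaces need one.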
[jira] [Resolved] (HIVE-25783) Refine standalone-metastore module pom.xml files
[ https://issues.apache.org/jira/browse/HIVE-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng resolved HIVE-25783.
---
Resolution: Fixed

> Refine standalone-metastore module pom.xml files
> ---
>
> Key: HIVE-25783
> URL: https://issues.apache.org/jira/browse/HIVE-25783
> Project: Hive
> Issue Type: Improvement
> Components: Build Infrastructure
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> In HIVE-25774, we added the ASF license header to newly created files in standalone-metastore, but we may face the same issue later on. This Jira investigates whether we can provide a common way to make sure that newly added source files contain the ASF license information.
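The issue does not say which mechanism was adopted; in Maven builds this kind of check is commonly done with the Apache RAT plugin. As a rough, hypothetical illustration of what such a check does, the following scans a source tree and reports .java files whose first 20 lines lack the opening phrase of the standard ASF header. The class name, the line limit and the .java-only filter are all assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: list .java files that lack the ASF license header. */
public class LicenseHeaderCheck {
    // Opening phrase of the standard ASF source header; only this phrase is checked.
    static final String ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)";
    // The header normally sits at the top of the file, so only these lines are read.
    static final int HEADER_LINES = 20;

    static boolean hasAsfHeader(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.limit(HEADER_LINES).anyMatch(l -> l.contains(ASF_MARKER));
        }
    }

    static List<Path> filesMissingHeader(Path root) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths
                    .filter(f -> Files.isRegularFile(f) && f.toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> missing = new ArrayList<>();
        for (Path p : javaFiles) {
            if (!hasAsfHeader(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        List<Path> missing = filesMissingHeader(root);
        missing.forEach(p -> System.out.println("Missing ASF header: " + p));
        if (!missing.isEmpty()) {
            System.exit(1); // non-zero exit lets a build step fail
        }
    }
}
```

A dedicated tool is usually preferable to a hand-rolled scan, since it also understands other file types and configured exclusions.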
[jira] [Commented] (HIVE-25783) Refine standalone-metastore module pom.xml files
[ https://issues.apache.org/jira/browse/HIVE-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480310#comment-17480310 ]

Zhihua Deng commented on HIVE-25783:
---

Merged to master. Thank you for the feedback and review, [~pvary]!
[jira] [Updated] (HIVE-25783) Refine standalone-metastore module pom.xml files
[ https://issues.apache.org/jira/browse/HIVE-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng updated HIVE-25783:
---
Fix Version/s: 4.0.0
[jira] [Work logged] (HIVE-25783) Refine standalone-metastore module pom.xml files
[ https://issues.apache.org/jira/browse/HIVE-25783?focusedWorklogId=713147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713147 ]

ASF GitHub Bot logged work on HIVE-25783:
---

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:49
Start Date: 22/Jan/22 00:49
Worklog Time Spent: 10m
Work Description: dengzhhu653 merged pull request #2852:
URL: https://github.com/apache/hive/pull/2852

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 713147)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster
[ https://issues.apache.org/jira/browse/HIVE-25401?focusedWorklogId=713138=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713138 ]

ASF GitHub Bot logged work on HIVE-25401:
---

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:11
Start Date: 22/Jan/22 00:11
Worklog Time Spent: 10m
Work Description: github-actions[bot] commented on pull request #2544:
URL: https://github.com/apache/hive/pull/2544#issuecomment-1018984793

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
---
Worklog Id: (was: 713138)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (HIVE-24830) Revise RowSchema mutability usage
[ https://issues.apache.org/jira/browse/HIVE-24830?focusedWorklogId=713136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713136 ]

ASF GitHub Bot logged work on HIVE-24830:
---

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:11
Start Date: 22/Jan/22 00:11
Worklog Time Spent: 10m
Work Description: github-actions[bot] commented on pull request #2019:
URL: https://github.com/apache/hive/pull/2019#issuecomment-1018984833

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
---
Worklog Id: (was: 713136)
Time Spent: 1h 10m (was: 1h)

> Revise RowSchema mutability usage
> ---
>
> Key: HIVE-24830
> URL: https://issues.apache.org/jira/browse/HIVE-24830
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> RowSchema is essentially a container class for a list of fields:
> * it can be constructed from a "list"
> * the list can be set
> * the list can be accessed
> None of the above methods try to protect the data inside; hence the following could easily happen:
> {code}
> s = o1.getSchema();
> col = s.getCol("favourite");
> col.setInternalName("asd"); // will modify o1's schema
> newSchema.add(col);
> o2.setSchema(newSchema);
> o2.getSchema().get("asd").setInternalName("xxx"); // will modify both o1's and o2's schema
> [...]
> {code}
> Not sure how much of this is actually crucial; an exploratory test run revealed some cases:
> https://github.com/apache/hive/pull/2019
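The {code} snippet in this issue shows the root cause: callers receive the very column objects the schema stores, so renaming one silently renames it everywhere it is shared. One common remedy is defensive copying on the way in and on the way out. This is a minimal sketch using hypothetical Column and DefensiveSchema classes rather than Hive's RowSchema and ColumnInfo; the issue does not state that this is the approach taken.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a column descriptor with a mutable internal name. */
final class Column {
    private String internalName;

    Column(String internalName) { this.internalName = internalName; }

    /** Copy constructor used for defensive copies. */
    Column(Column other) { this(other.internalName); }

    String getInternalName() { return internalName; }
    void setInternalName(String name) { this.internalName = name; }
}

/** Hypothetical schema that never hands out the Column objects it stores. */
final class DefensiveSchema {
    private final List<Column> columns;

    DefensiveSchema(List<Column> columns) {
        this.columns = deepCopy(columns); // copy on the way in
    }

    /** Returns copies, so mutating the result cannot change this schema. */
    List<Column> getColumns() {
        return Collections.unmodifiableList(deepCopy(columns));
    }

    /** Returns a copy of the matching column, or null if none matches. */
    Column getColumn(String internalName) {
        for (Column c : columns) {
            if (c.getInternalName().equals(internalName)) {
                return new Column(c); // copy on the way out
            }
        }
        return null;
    }

    private static List<Column> deepCopy(List<Column> source) {
        List<Column> copy = new ArrayList<>(source.size());
        for (Column c : source) {
            copy.add(new Column(c));
        }
        return copy;
    }
}

public class SchemaAliasingDemo {
    public static void main(String[] args) {
        DefensiveSchema s1 = new DefensiveSchema(List.of(new Column("favourite")));
        Column col = s1.getColumn("favourite");
        col.setInternalName("asd");                 // only the copy is renamed
        DefensiveSchema s2 = new DefensiveSchema(List.of(col));
        s2.getColumn("asd").setInternalName("xxx"); // s2 keeps "asd"
        System.out.println(s1.getColumns().get(0).getInternalName()); // favourite
        System.out.println(s2.getColumns().get(0).getInternalName()); // asd
    }
}
```

The cost is an extra allocation per access; making the column objects immutable avoids the copies, but would require changing every caller that renames columns in place.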
[jira] [Work logged] (HIVE-25352) Optimise DBTokenStore for RDBMS
[ https://issues.apache.org/jira/browse/HIVE-25352?focusedWorklogId=713135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713135 ]

ASF GitHub Bot logged work on HIVE-25352:
---

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:11
Start Date: 22/Jan/22 00:11
Worklog Time Spent: 10m
Work Description: github-actions[bot] commented on pull request #2499:
URL: https://github.com/apache/hive/pull/2499#issuecomment-1018984814

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
---
Worklog Id: (was: 713135)
Time Spent: 20m (was: 10m)

> Optimise DBTokenStore for RDBMS
> ---
>
> Key: HIVE-25352
> URL: https://issues.apache.org/jira/browse/HIVE-25352
> Project: Hive
> Issue Type: Improvement
> Reporter: Sahana Bhat
> Assignee: Sahana Bhat
> Priority: Major
> Labels: pull-request-available, pull_request_available
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The existing DBTokenStore implementation is poorly optimised when an RDBMS is used.
> * All available tokens are fetched from the DB. The validity of each token is determined based on its max date and renew date, and the token is deleted if required. For a relational database like MySQL, a *query to fetch all rows with no filters or pagination* can be costly and impact the performance of the database and the server.
> * For each token identifier fetched, if the token hasn't breached its max date, the token information is fetched from the database again to validate its renew date.
> * The token expiration daemon is part of the Hive system. In a cluster of tens or hundreds of Hive servers, the daemon runs on each of the servers. This means that the flow of fetching tokens, validating them for expiration and deleting them is duplicated in each of the servers. The *duplication of the functionality in every server*, along with the problems discussed in points 1 & 2, can severely degrade the performance of the database.
> This issue will address the problems mentioned in points 1 & 2.
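Point 2 of this issue describes an N+1 access pattern: token identifiers are read first, then each token is fetched again to check its renew date. A minimal sketch of the single-pass alternative, assuming (hypothetically) that each row already carries both the max date and the renew date; TokenRow and its fields are invented for illustration and are not DBTokenStore's schema.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: decide expiry in one pass over rows that already carry
 * both dates, instead of re-fetching each token to read its renew date.
 */
public class TokenExpirySketch {

    /** Assumed row shape; not Hive's actual table layout. */
    static final class TokenRow {
        final String identifier;
        final long maxDateMillis;
        final long renewDateMillis;

        TokenRow(String identifier, long maxDateMillis, long renewDateMillis) {
            this.identifier = identifier;
            this.maxDateMillis = maxDateMillis;
            this.renewDateMillis = renewDateMillis;
        }
    }

    /** A token is expired once either its max date or its renew date has passed. */
    static boolean isExpired(TokenRow t, long nowMillis) {
        return t.maxDateMillis < nowMillis || t.renewDateMillis < nowMillis;
    }

    /** Returns the identifiers to delete; each row is inspected exactly once. */
    static List<String> expiredIdentifiers(List<TokenRow> rows, long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (TokenRow t : rows) {
            if (isExpired(t, nowMillis)) {
                expired.add(t.identifier);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        List<TokenRow> rows = List.of(
                new TokenRow("a", 2_000L, 1_500L), // still valid
                new TokenRow("b", 500L, 1_500L),   // past its max date
                new TokenRow("c", 2_000L, 900L));  // past its renew date
        System.out.println(expiredIdentifiers(rows, now)); // [b, c]
    }
}
```

If both dates were stored as columns (again an assumption), the same predicate could be pushed into a WHERE clause so the database returns only expired rows; the issue does not say which approach the patch takes.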
[jira] [Work logged] (HIVE-25621) Alter table partition compact/concatenate commands should send HivePrivilegeObjects for Authz
[ https://issues.apache.org/jira/browse/HIVE-25621?focusedWorklogId=713134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713134 ]

ASF GitHub Bot logged work on HIVE-25621:
---

Author: ASF GitHub Bot
Created on: 22/Jan/22 00:11
Start Date: 22/Jan/22 00:11
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #2731:
URL: https://github.com/apache/hive/pull/2731

Issue Time Tracking
---
Worklog Id: (was: 713134)
Time Spent: 0.5h (was: 20m)

> Alter table partition compact/concatenate commands should send HivePrivilegeObjects for Authz
> ---
>
> Key: HIVE-25621
> URL: https://issues.apache.org/jira/browse/HIVE-25621
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> # Run the following queries:
> Create table temp(c0 int) partitioned by (c1 int);
> Insert into temp values(1,1);
> ALTER TABLE temp PARTITION (c1=1) COMPACT 'minor';
> ALTER TABLE temp PARTITION (c1=1) CONCATENATE;
> Insert into temp values(1,1);
> # The compact/concatenate commands above currently do not send any Hive privilege objects for authorization. Hive needs to send these objects to prevent malicious users from performing these operations.
[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=713004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713004 ]

ASF GitHub Bot logged work on HIVE-25871:
---

Author: ASF GitHub Bot
Created on: 21/Jan/22 17:36
Start Date: 21/Jan/22 17:36
Worklog Time Spent: 10m
Work Description: boroknagyz commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018721560

FYI, I've uploaded a PR to Iceberg: https://github.com/apache/iceberg/pull/3947
It only contains the 1-based indexing part of this PR, as the table migration code is not present in the Iceberg repo.

Issue Time Tracking
---
Worklog Id: (was: 713004)
Time Spent: 2.5h (was: 2h 20m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Hive should set the name-mapping table property during table migration.
> It would be useful for [column projection|https://iceberg.apache.org/#spec/#column-projection] for files without field ids.
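A name mapping tells Iceberg readers which field id to use for a column found by name in data files that carry no field ids, such as files written before migration. As a minimal sketch, assuming a flat schema whose field ids are assigned 1-based in column order (in line with the 1-based indexing mentioned in the PR comment), the following builds the JSON form of such a mapping by hand. Real code would rather use Iceberg's MappingUtil.create(Schema) and NameMappingParser.toJson(...) and store the result in the schema.name-mapping.default table property; nested types are not handled here.

```java
import java.util.List;

/**
 * Hypothetical sketch: JSON name mapping for a flat schema with 1-based
 * field ids in column order. Not Hive's or Iceberg's implementation.
 */
public class NameMappingSketch {

    /** Quotes a string for JSON, escaping only backslashes and double quotes. */
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String nameMappingJson(List<String> columnNames) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < columnNames.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            int fieldId = i + 1; // assumed: ids start at 1, in column order
            sb.append("{\"field-id\":").append(fieldId)
              .append(",\"names\":[").append(jsonString(columnNames.get(i))).append("]}");
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // The value would be stored under the table property "schema.name-mapping.default".
        System.out.println(nameMappingJson(List.of("id", "name")));
        // [{"field-id":1,"names":["id"]},{"field-id":2,"names":["name"]}]
    }
}
```

Only quotes and backslashes are escaped; column names containing control characters would need a proper JSON library.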
[jira] [Updated] (HIVE-25889) Increase default value of "metastore.thread.pool.size"
[ https://issues.apache.org/jira/browse/HIVE-25889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25889:
---
Labels: pull-request-available (was: )

> Increase default value of "metastore.thread.pool.size"
> ---
>
> Key: HIVE-25889
> URL: https://issues.apache.org/jira/browse/HIVE-25889
> Project: Hive
> Issue Type: Improvement
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> HiveMetastore uses a threadpool to execute tasks listed under "metastore.task.threads.remote" and "metastore.task.threads.always" configs. The size of this threadpool is controlled by "metastore.thread.pool.size" config which by default is set to 10. The number of tasks in the two lists has grown significantly in the last two years, but the size of the pool remained the same.
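The argument here is that the pool size, not the number of configured tasks, bounds how many metastore background tasks can run at once. A minimal sketch of that effect with a standard java.util.concurrent scheduled thread pool; it assumes long-running tasks and does not model how HiveMetaStore actually schedules its task threads, which this issue does not describe.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: with a fixed-size pool, at most poolSize tasks run at
 * once and the rest wait for a free thread.
 */
public class PoolSizeSketch {

    /** Runs taskCount tasks that each sleep taskMillis; returns the peak number running at once. */
    static int peakConcurrency(int poolSize, int taskCount, long taskMillis) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(taskCount);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record the highest concurrency seen
                try {
                    Thread.sleep(taskMillis);          // stands in for a long-running task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 12 long-running tasks: with 10 threads, at least 2 must wait for a free thread.
        System.out.println("pool=10 peak=" + peakConcurrency(10, 12, 200));
        System.out.println("pool=15 peak=" + peakConcurrency(15, 12, 200));
    }
}
```

The exact peak depends on timing, but it can never exceed the pool size, which is why a list of tasks that outgrows the pool leaves some of them waiting.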
[jira] [Work logged] (HIVE-25889) Increase default value of "metastore.thread.pool.size"
[ https://issues.apache.org/jira/browse/HIVE-25889?focusedWorklogId=712940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712940 ]

ASF GitHub Bot logged work on HIVE-25889:
---

Author: ASF GitHub Bot
Created on: 21/Jan/22 15:28
Start Date: 21/Jan/22 15:28
Worklog Time Spent: 10m
Work Description: lcspinter opened a new pull request #2962:
URL: https://github.com/apache/hive/pull/2962

### What changes were proposed in this pull request?
Increase default value of "metastore.thread.pool.size" from 10 to 15.

### Why are the changes needed?
HiveMetastore uses a threadpool to execute tasks listed under "metastore.task.threads.remote" and "metastore.task.threads.always" configs. The size of this threadpool is controlled by "metastore.thread.pool.size" config which by default is set to 10. The number of tasks in the two lists has grown significantly in the last two years, but the size of the pool remained the same.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Issue Time Tracking
---
Worklog Id: (was: 712940)
Remaining Estimate: 0h
Time Spent: 10m
[jira] [Assigned] (HIVE-25889) Increase default value of "metastore.thread.pool.size"
[ https://issues.apache.org/jira/browse/HIVE-25889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Pintér reassigned HIVE-25889:
---

> Increase default value of "metastore.thread.pool.size"
> ---
>
> Key: HIVE-25889
> URL: https://issues.apache.org/jira/browse/HIVE-25889
> Project: Hive
> Issue Type: Improvement
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
[jira] [Commented] (HIVE-25842) Reimplement delta file metric collection
[ https://issues.apache.org/jira/browse/HIVE-25842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480121#comment-17480121 ]

László Pintér commented on HIVE-25842:
---

Submitted to master. Thanks [~klcopp] and [~dkuzmenko] for the review.

> Reimplement delta file metric collection
> ---
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
> Issue Type: Improvement
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 7h
> Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs against a table (select * and select count( * ) don't update the metrics).
> Metrics aren't updated after compaction or after the cleaning that follows compaction, so users will probably see compaction "issues" (like many active, obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch around each method in DeltaFilesMetricsReporter, but of course this isn't foolproof. This is a HUGE performance and functionality liability. Tests caught some issues, but our tests aren't perfect.
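The RISK paragraph mentions wrapping each metrics method in a try-catch. A minimal sketch of that pattern, with a hypothetical helper that is not part of Hive's DeltaFilesMetricsReporter: a collector failure is logged and replaced with a fallback value instead of failing the query. As the issue itself notes, this is not foolproof: it catches only RuntimeExceptions (not Errors) and does nothing about the time that metric collection adds to the query.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: best-effort metric collection that never fails the caller. */
public class BestEffortMetrics {
    private static final Logger LOG = Logger.getLogger(BestEffortMetrics.class.getName());

    /** Runs the collector; on any RuntimeException, logs it and returns the fallback. */
    static <T> T collectSafely(String metricName, Supplier<T> collector, T fallback) {
        try {
            return collector.get();
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Failed to collect metric " + metricName + "; continuing", e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        int ok = collectSafely("numDeltas", () -> 3, -1);
        int failed = collectSafely("numObsoleteDeltas",
                () -> { throw new IllegalStateException("simulated failure"); }, -1);
        System.out.println(ok + " " + failed); // 3 -1
    }
}
```

It prints 3 -1 on standard output, while the warning for the failed collector goes to the log.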
[jira] [Resolved] (HIVE-25842) Reimplement delta file metric collection
[ https://issues.apache.org/jira/browse/HIVE-25842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Pintér resolved HIVE-25842.
---
Resolution: Fixed
[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection
[ https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=712923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712923 ]

ASF GitHub Bot logged work on HIVE-25842:
---

Author: ASF GitHub Bot
Created on: 21/Jan/22 15:00
Start Date: 21/Jan/22 15:00
Worklog Time Spent: 10m
Work Description: lcspinter merged pull request #2916:
URL: https://github.com/apache/hive/pull/2916

Issue Time Tracking
---
Worklog Id: (was: 712923)
Time Spent: 7h (was: 6h 50m)
[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712922 ]

ASF GitHub Bot logged work on HIVE-25871:
---

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:59
Start Date: 21/Jan/22 14:59
Worklog Time Spent: 10m
Work Description: boroknagyz commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018586068

Ah right, things are currently being duplicated between Hive and Iceberg. Sure, I'll happily add these changes to Iceberg as well!

Issue Time Tracking
---
Worklog Id: (was: 712922)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Updated] (HIVE-25888) Improve RuleEventLogger to also print input rels in FULL_PLAN mode
[ https://issues.apache.org/jira/browse/HIVE-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25888:
---
Labels: pull-request-available (was: )

> Improve RuleEventLogger to also print input rels in FULL_PLAN mode
> ---
>
> Key: HIVE-25888
> URL: https://issues.apache.org/jira/browse/HIVE-25888
> Project: Hive
> Issue Type: Improvement
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive port of CALCITE-4991; refer to that ticket for more details.
[jira] [Work logged] (HIVE-25888) Improve RuleEventLogger to also print input rels in FULL_PLAN mode
[ https://issues.apache.org/jira/browse/HIVE-25888?focusedWorklogId=712919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712919 ]

ASF GitHub Bot logged work on HIVE-25888:
---

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:51
Start Date: 21/Jan/22 14:51
Worklog Time Spent: 10m
Work Description: asolimando opened a new pull request #2961:
URL: https://github.com/apache/hive/pull/2961

…PLAN mode

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Issue Time Tracking
---
Worklog Id: (was: 712919)
Remaining Estimate: 0h
Time Spent: 10m
[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712913=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712913 ]

ASF GitHub Bot logged work on HIVE-25871:
---

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:43
Start Date: 21/Jan/22 14:43
Worklog Time Spent: 10m
Work Description: marton-bod commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018568724

Thanks for the contribution @boroknagyz! As Peter mentioned, it would be great to get the relevant parts into the upstream Iceberg code base as well. Is this something you would fancy doing too?

Issue Time Tracking
---
Worklog Id: (was: 712913)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712911 ] ASF GitHub Bot logged work on HIVE-25871: - Author: ASF GitHub Bot Created on: 21/Jan/22 14:42 Start Date: 21/Jan/22 14:42 Worklog Time Spent: 10m Work Description: marton-bod merged pull request #2948: URL: https://github.com/apache/hive/pull/2948 Issue Time Tracking --- Worklog Id: (was: 712911) Time Spent: 2h (was: 1h 50m) > Hive should set name mapping table property for migrated Iceberg tables > --- > > Key: HIVE-25871 > URL: https://issues.apache.org/jira/browse/HIVE-25871 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > Hive should set the name-mapping table property during table migration. > It would be useful for [column > projection|https://iceberg.apache.org/#spec/#column-projection] for files > without field ids.
[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712908=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712908 ] ASF GitHub Bot logged work on HIVE-25871: - Author: ASF GitHub Bot Created on: 21/Jan/22 14:40 Start Date: 21/Jan/22 14:40 Worklog Time Spent: 10m Work Description: pvary commented on pull request #2948: URL: https://github.com/apache/hive/pull/2948#issuecomment-1018564045 LGTM +1. I think this change should go into the Iceberg repo as well. What do you think? Issue Time Tracking --- Worklog Id: (was: 712908) Time Spent: 1h 50m (was: 1h 40m) > Hive should set name mapping table property for migrated Iceberg tables > --- > > Key: HIVE-25871 > URL: https://issues.apache.org/jira/browse/HIVE-25871 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Hive should set the name-mapping table property during table migration. > It would be useful for [column > projection|https://iceberg.apache.org/#spec/#column-projection] for files > without field ids.
[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712896 ] ASF GitHub Bot logged work on HIVE-25871: - Author: ASF GitHub Bot Created on: 21/Jan/22 14:28 Start Date: 21/Jan/22 14:28 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2948: URL: https://github.com/apache/hive/pull/2948#discussion_r789702644 ## File path: iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q ## @@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform; CREATE EXTERNAL TABLE ice_t_transform (year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, identity_field int) PARTITIONED BY SPEC (year(year_field), month(month_field), day(day_field), hour(hour_field), truncate(2, truncate_field), bucket(2, bucket_field), identity_field) STORED BY ICEBERG; DROP TABLE IF EXISTS ice_t_transform_prop; -CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES ('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":1,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":2,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":3,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":4,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":5,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":6,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":7,"field-id":1006}]}'); +CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, 
identity_field int) STORED BY ICEBERG TBLPROPERTIES ('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":2,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":3,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":4,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":5,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":6,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":7,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":8,"field-id":1006}]}'); Review comment: Makes sense, thx! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 712896) Time Spent: 1.5h (was: 1h 20m) > Hive should set name mapping table property for migrated Iceberg tables > --- > > Key: HIVE-25871 > URL: https://issues.apache.org/jira/browse/HIVE-25871 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Hive should set the name-mapping table property during table migration. > It would be useful for [column > projection|https://iceberg.apache.org/#spec/#column-projection] for files > without field ids. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712899 ] ASF GitHub Bot logged work on HIVE-25871: - Author: ASF GitHub Bot Created on: 21/Jan/22 14:28 Start Date: 21/Jan/22 14:28 Worklog Time Spent: 10m Work Description: marton-bod commented on pull request #2948: URL: https://github.com/apache/hive/pull/2948#issuecomment-1018551920 LGTM, will merge this today unless @pvary has further comments Issue Time Tracking --- Worklog Id: (was: 712899) Time Spent: 1h 40m (was: 1.5h) > Hive should set name mapping table property for migrated Iceberg tables > --- > > Key: HIVE-25871 > URL: https://issues.apache.org/jira/browse/HIVE-25871 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Hive should set the name-mapping table property during table migration. > It would be useful for [column > projection|https://iceberg.apache.org/#spec/#column-projection] for files > without field ids.
[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712874 ] ASF GitHub Bot logged work on HIVE-25871: - Author: ASF GitHub Bot Created on: 21/Jan/22 13:45 Start Date: 21/Jan/22 13:45 Worklog Time Spent: 10m Work Description: boroknagyz commented on a change in pull request #2948: URL: https://github.com/apache/hive/pull/2948#discussion_r789669255 ## File path: iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q ## @@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform; CREATE EXTERNAL TABLE ice_t_transform (year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, identity_field int) PARTITIONED BY SPEC (year(year_field), month(month_field), day(day_field), hour(hour_field), truncate(2, truncate_field), bucket(2, bucket_field), identity_field) STORED BY ICEBERG; DROP TABLE IF EXISTS ice_t_transform_prop; -CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES ('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":1,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":2,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":3,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":4,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":5,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":6,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":7,"field-id":1006}]}'); +CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, 
identity_field int) STORED BY ICEBERG TBLPROPERTIES ('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":2,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":3,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":4,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":5,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":6,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":7,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":8,"field-id":1006}]}'); Review comment: Prior to this patch, `HiveSchemaConverter` used 0-based indexing when it assigned the field ids. E.g. in the above statement it would assign field id 0 to `id`, field id 1 to `year_field`, and so on. Hence in 'iceberg.mr.table.partition.spec' the source-id 1 referred to `year_field`. That worked, but when Iceberg creates a table it reassigns the field ids using 1-based indexing (field id 1 is `id`, field id 2 is `year_field`). Iceberg is smart enough to use the correct ids in the partition spec, i.e. it replaces source id 1 with source id 2 and so on. So everything worked OK, but you had to specify different field/source ids in Hive than the actual field/source ids assigned by Iceberg. With this change, you need to use the same 1-based indexing in the partition spec that Iceberg will use later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 712874) Time Spent: 1h 20m (was: 1h 10m) > Hive should set name mapping table property for migrated Iceberg tables > --- > > Key: HIVE-25871 > URL: https://issues.apache.org/jira/browse/HIVE-25871 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Hive should set the name-mapping table property during table migration. > It would be useful for [column > projection|https://iceberg.apache.org/#spec/#column-projection] for files > without field ids. -- This message was sent by Atlassian Jira (v8.20.1#820001)
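To make the re-indexing described in the review comment concrete, here is a minimal sketch assuming a flat schema and a partition spec that refers to columns by `source-id`. It contrasts the old 0-based numbering (attributed in the comment to `HiveSchemaConverter` before the patch) with the 1-based numbering Iceberg applies on table creation, and remaps a source id by resolving it back to its column name, which is the kind of translation the comment says Iceberg performs. All class and method names are hypothetical; this is not Hive's or Iceberg's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of 0-based vs 1-based field-id assignment. */
public class FieldIdReassignmentSketch {

    /** Maps each column name to an id, starting from {@code base} in column order. */
    static Map<String, Integer> assignIds(List<String> columns, int base) {
        Map<String, Integer> ids = new HashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            ids.put(columns.get(i), base + i);
        }
        return ids;
    }

    /**
     * Translates a partition-spec source id written against the old numbering
     * into the id the same column receives under the new numbering.
     */
    static int remapSourceId(int oldSourceId, List<String> columns, int oldBase, int newBase) {
        String column = columns.get(oldSourceId - oldBase); // resolve the id back to its column
        return assignIds(columns, newBase).get(column);
    }

    public static void main(String[] args) {
        List<String> columns = List.of("id", "year_field", "month_field", "day_field");
        Map<String, Integer> before = assignIds(columns, 0); // pre-patch numbering
        Map<String, Integer> after = assignIds(columns, 1);  // Iceberg's numbering
        System.out.println(before.get("year_field") + " -> " + after.get("year_field")); // prints 1 -> 2
        System.out.println(remapSourceId(1, columns, 0, 1)); // prints 2
    }
}
```

Under this model every partition field shifts by one, which matches the updated `describe_iceberg_table.q` statement: the seven source ids 1 through 7 become 2 through 8.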
[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection
[ https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=712778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712778 ] ASF GitHub Bot logged work on HIVE-25842: - Author: ASF GitHub Bot Created on: 21/Jan/22 11:26 Start Date: 21/Jan/22 11:26 Worklog Time Spent: 10m Work Description: lcspinter commented on a change in pull request #2916: URL: https://github.com/apache/hive/pull/2916#discussion_r789579851 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -412,7 +421,11 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa } StringBuilder extraDebugInfo = new StringBuilder("[").append(obsoleteDirs.stream() .map(Path::getName).collect(Collectors.joining(","))); -return remove(location, ci, obsoleteDirs, true, fs, extraDebugInfo); +boolean success = remove(location, ci, obsoleteDirs, true, fs, extraDebugInfo); +if (dir.getObsolete().size() > 0) { + updateDeltaFilesMetrics(ci.dbname, ci.tableName, ci.partName, obsoleteDirs); Review comment: Reverted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 712778) Time Spent: 6h 50m (was: 6h 40m) > Reimplement delta file metric collection > > > Key: HIVE-25842 > URL: https://issues.apache.org/jira/browse/HIVE-25842 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 6h 50m > Remaining Estimate: 0h > > FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table > (select * and select count( * ) don't update the metrics) > Metrics aren't updated after compaction or cleaning after compaction, so > users will probably see "issues" with compaction (like many active or > obsolete or small deltas) that don't exist. > RISK: Metrics are collected during queries – we tried to put a try-catch > around each method in DeltaFilesMetricReporter but of course this isn't > foolproof. This is a HUGE performance and functionality liability. Tests > caught some issues, but our tests aren't perfect. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25888) Improve RuleEventLogger to also print input rels in FULL_PLAN mode
[ https://issues.apache.org/jira/browse/HIVE-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-25888: --- > Improve RuleEventLogger to also print input rels in FULL_PLAN mode > -- > > Key: HIVE-25888 > URL: https://issues.apache.org/jira/browse/HIVE-25888 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 4.0.0 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > Hive porting of CALCITE-4991, refer to that ticket for more details.
[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712720 ] ASF GitHub Bot logged work on HIVE-25871: - Author: ASF GitHub Bot Created on: 21/Jan/22 10:29 Start Date: 21/Jan/22 10:29 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2948: URL: https://github.com/apache/hive/pull/2948#discussion_r789540515 ## File path: iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q ## @@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform; CREATE EXTERNAL TABLE ice_t_transform (year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, identity_field int) PARTITIONED BY SPEC (year(year_field), month(month_field), day(day_field), hour(hour_field), truncate(2, truncate_field), bucket(2, bucket_field), identity_field) STORED BY ICEBERG; DROP TABLE IF EXISTS ice_t_transform_prop; -CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES ('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":1,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":2,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":3,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":4,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":5,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":6,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":7,"field-id":1006}]}'); +CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, 
identity_field int) STORED BY ICEBERG TBLPROPERTIES ('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":2,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":3,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":4,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":5,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":6,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":7,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":8,"field-id":1006}]}'); Review comment: I'm probably missing something obvious - can you explain why the source-id values had to be incremented? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 712720) Time Spent: 1h 10m (was: 1h) > Hive should set name mapping table property for migrated Iceberg tables > --- > > Key: HIVE-25871 > URL: https://issues.apache.org/jira/browse/HIVE-25871 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Hive should set the name-mapping table property during table migration. > It would be useful for [column > projection|https://iceberg.apache.org/#spec/#column-projection] for files > without field ids. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-23644) Fix FindBug issues in hive-jdbc
[ https://issues.apache.org/jira/browse/HIVE-23644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23644: -- Labels: pull-request-available (was: ) > Fix FindBug issues in hive-jdbc > --- > > Key: HIVE-23644 > URL: https://issues.apache.org/jira/browse/HIVE-23644 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Work logged] (HIVE-23644) Fix FindBug issues in hive-jdbc
[ https://issues.apache.org/jira/browse/HIVE-23644?focusedWorklogId=712712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712712 ] ASF GitHub Bot logged work on HIVE-23644: - Author: ASF GitHub Bot Created on: 21/Jan/22 10:14 Start Date: 21/Jan/22 10:14 Worklog Time Spent: 10m Work Description: mbathori-cloudera opened a new pull request #2960: URL: https://github.com/apache/hive/pull/2960 ### What changes were proposed in this pull request? Fixing FindBugs issues in the hive-jdbc module. ### Why are the changes needed? To get rid of the violations and issues detected by FindBugs, and to enforce these rules in precommit. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `mvn -Pspotbugs -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO -pl :hive-jdbc test-compile com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check` Issue Time Tracking --- Worklog Id: (was: 712712) Remaining Estimate: 0h Time Spent: 10m > Fix FindBug issues in hive-jdbc > --- > > Key: HIVE-23644 > URL: https://issues.apache.org/jira/browse/HIVE-23644 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: David Mollitor >Priority: Major > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection
[ https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=712682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712682 ] ASF GitHub Bot logged work on HIVE-25842: - Author: ASF GitHub Bot Created on: 21/Jan/22 08:53 Start Date: 21/Jan/22 08:53 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2916: URL: https://github.com/apache/hive/pull/2916#discussion_r789465573 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -412,7 +421,11 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa } StringBuilder extraDebugInfo = new StringBuilder("[").append(obsoleteDirs.stream() .map(Path::getName).collect(Collectors.joining(","))); -return remove(location, ci, obsoleteDirs, true, fs, extraDebugInfo); +boolean success = remove(location, ci, obsoleteDirs, true, fs, extraDebugInfo); +if (dir.getObsolete().size() > 0) { + updateDeltaFilesMetrics(ci.dbname, ci.tableName, ci.partName, obsoleteDirs); Review comment: I regret suggesting that we include aborted directories in the obsolete count. 1. There are other metrics about aborted directories. 2. previouslyActiveDeltas - (obsolete + aborted) != currentlyActiveDeltas, so the active delta count would be off. My bad :/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 712682) Time Spent: 6h 40m (was: 6.5h) > Reimplement delta file metric collection > > > Key: HIVE-25842 > URL: https://issues.apache.org/jira/browse/HIVE-25842 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 6h 40m > Remaining Estimate: 0h > > FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table > (select * and select count( * ) don't update the metrics) > Metrics aren't updated after compaction or cleaning after compaction, so > users will probably see "issues" with compaction (like many active or > obsolete or small deltas) that don't exist. > RISK: Metrics are collected during queries – we tried to put a try-catch > around each method in DeltaFilesMetricReporter but of course this isn't > foolproof. This is a HUGE performance and functionality liability. Tests > caught some issues, but our tests aren't perfect. -- This message was sent by Atlassian Jira (v8.20.1#820001)
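The arithmetic behind klcopp's second point can be sketched as follows. This is a hypothetical model, not Hive's metrics code: it takes from the comment the relation previouslyActiveDeltas - obsolete = currentlyActiveDeltas, and assumes that aborted directories were never part of the active count because they are tracked by separate metrics. Under those assumptions, folding aborted directories into the obsolete count makes the subtraction overshoot. The class name, method name and numbers are made up for illustration.

```java
/** Hypothetical model of updating an active-delta count after the Cleaner runs. */
public class DeltaCountSketch {

    /** Returns the active-delta count after the Cleaner removes some directories. */
    static int activeAfterClean(int previouslyActive, int removedObsolete, int removedAborted,
                                boolean countAbortedAsObsolete) {
        // Only obsolete deltas are assumed to have been part of the active count,
        // so only they should be subtracted; aborted dirs have their own metrics.
        int subtracted = countAbortedAsObsolete ? removedObsolete + removedAborted : removedObsolete;
        return previouslyActive - subtracted;
    }

    public static void main(String[] args) {
        int previouslyActive = 10; // made-up example numbers
        int obsolete = 4;
        int aborted = 3;
        System.out.println(activeAfterClean(previouslyActive, obsolete, aborted, false)); // prints 6
        System.out.println(activeAfterClean(previouslyActive, obsolete, aborted, true));  // prints 3 (undercount)
    }
}
```

In this model the error equals the number of aborted directories removed, which is why the comment concludes that including them in the obsolete count would throw the active delta count off.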
[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection
[ https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=712681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712681 ] ASF GitHub Bot logged work on HIVE-25842: - Author: ASF GitHub Bot Created on: 21/Jan/22 08:38 Start Date: 21/Jan/22 08:38 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2916: URL: https://github.com/apache/hive/pull/2916#discussion_r789454143 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -139,157 +92,37 @@ public static DeltaFilesMetricReporter getInstance() { return InstanceHolder.instance; } - public static synchronized void init(HiveConf conf) throws Exception { -getInstance().configure(conf); + public static synchronized void init(Configuration conf, TxnStore txnHandler) throws Exception { +if (!initialized) { + getInstance().configure(conf, txnHandler); + initialized = true; +} } - private void configure(HiveConf conf) throws Exception { + private void configure(Configuration conf, TxnStore txnHandler) throws Exception { long reportingInterval = -HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); -hiveEntitySeparator = conf.getVar(HiveConf.ConfVars.HIVE_ENTITY_SEPARATOR); +MetastoreConf.getTimeVar(conf, MetastoreConf.ConfVars.METASTORE_DELTAMETRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); + +maxCacheSize = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.METASTORE_DELTAMETRICS_MAX_CACHE_SIZE); -initCachesForMetrics(conf); initObjectsForMetrics(); ThreadFactory threadFactory = new ThreadFactoryBuilder().setDaemon(true).setNameFormat("DeltaFilesMetricReporter %d").build(); -executorService = Executors.newSingleThreadScheduledExecutor(threadFactory); -executorService.scheduleAtFixedRate(new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS); +reporterExecutorService = Executors.newSingleThreadScheduledExecutor(threadFactory); 
+reporterExecutorService.scheduleAtFixedRate(new ReportingTask(txnHandler), 0, reportingInterval, TimeUnit.SECONDS); LOG.info("Started DeltaFilesMetricReporter thread"); Review comment: Never mind, I had reading problems :D Issue Time Tracking --- Worklog Id: (was: 712681) Time Spent: 6.5h (was: 6h 20m) > Reimplement delta file metric collection > > > Key: HIVE-25842 > URL: https://issues.apache.org/jira/browse/HIVE-25842 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 6.5h > Remaining Estimate: 0h > > FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table > (select * and select count( * ) don't update the metrics) > Metrics aren't updated after compaction or cleaning after compaction, so > users will probably see "issues" with compaction (like many active or > obsolete or small deltas) that don't exist. > RISK: Metrics are collected during queries – we tried to put a try-catch > around each method in DeltaFilesMetricReporter but of course this isn't > foolproof. This is a HUGE performance and functionality liability. Tests > caught some issues, but our tests aren't perfect.
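The reworked `init` in the `DeltaFilesMetricReporter` diff above combines two patterns: a `synchronized` static method guarded by an `initialized` flag, so that repeated calls configure the reporter only once, and a single-threaded daemon `ScheduledExecutorService` that runs the reporting task at a fixed rate with no initial delay. A minimal, self-contained sketch of that shape follows. It assumes a plain `ThreadFactory` lambda in place of Guava's `ThreadFactoryBuilder` and a counter-incrementing `Runnable` in place of `ReportingTask(txnHandler)`; the class name, counters and millisecond interval are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reporter showing idempotent init plus fixed-rate scheduling. */
public class PeriodicReporterSketch {

    private static boolean initialized = false;
    private static ScheduledExecutorService reporterExecutorService;
    static final AtomicInteger initCount = new AtomicInteger();
    static final AtomicInteger reportCount = new AtomicInteger();

    /** Safe to call many times; only the first call starts the reporting thread. */
    public static synchronized void init(long reportingIntervalMillis) {
        if (!initialized) {
            configure(reportingIntervalMillis);
            initialized = true;
        }
    }

    private static void configure(long reportingIntervalMillis) {
        initCount.incrementAndGet();
        AtomicInteger threadNumber = new AtomicInteger();
        // Daemon threads do not keep the JVM alive, like setDaemon(true) in the diff.
        ThreadFactory threadFactory = runnable -> {
            Thread t = new Thread(runnable, "PeriodicReporterSketch " + threadNumber.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
        reporterExecutorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
        // Initial delay 0, then one run per interval; stands in for ReportingTask.
        reporterExecutorService.scheduleAtFixedRate(
            reportCount::incrementAndGet, 0, reportingIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the reporting thread (the flag stays set, so init() will not restart it). */
    public static synchronized void shutdown() {
        if (reporterExecutorService != null) {
            reporterExecutorService.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        init(10);
        init(10); // second call is a no-op
        Thread.sleep(100);
        shutdown();
        System.out.println("configured " + initCount.get() + " time(s), reported " + reportCount.get() + " time(s)");
    }
}
```

Keeping the guard inside a `synchronized` method means two threads racing to call `init` cannot both pass the `!initialized` check, so only one executor is ever created.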