Re: [PR] HIVE-28188:Upgrade Postgres to 42.7.3 to target CVE-2024-1597 [hive]
zhangbutao commented on code in PR #5185: URL: https://github.com/apache/hive/pull/5185#discussion_r1577298939

## packaging/src/docker/README.md ##
@@ -117,7 +117,7 @@ export POSTGRES_LOCAL_PATH=your_local_path_to_postgres_driver
 ```
 Example:
 ```shell
-mvn dependency:copy -Dartifact="org.postgresql:postgresql:42.5.1" && \
+mvn dependency:copy -Dartifact="org.postgresql:postgresql:42.7.3" && \

Review Comment: Could you also please fix the version in these lines? Thanks.
https://github.com/apache/hive/blob/1a969f6642d4081027040f97e6a689bf2e687db3/packaging/src/docker/README.md?plain=1#L63
https://github.com/apache/hive/blob/1a969f6642d4081027040f97e6a689bf2e687db3/packaging/src/docker/README.md?plain=1#L72
https://github.com/apache/hive/blob/1a969f6642d4081027040f97e6a689bf2e687db3/packaging/src/docker/README.md?plain=1#L121
Re: [PR] HIVE-28121. (2.3) Use direct SQL for transactional altering table parameter [hive]
pan3793 commented on PR #5204: URL: https://github.com/apache/hive/pull/5204#issuecomment-2074014828

@sunchao based on the CI results of this PR branch, I think the failing tests are flaky cases: http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/activity?branch=PR-5204
Re: [PR] HIVE-27995: Fix inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables [hive]
chinnaraolalam commented on code in PR #5000: URL: https://github.com/apache/hive/pull/5000#discussion_r1577238980

## ql/src/test/results/clientnegative/load_wrong_noof_part.q.out ##
@@ -6,4 +6,4 @@ POSTHOOK: query: CREATE TABLE loadpart1(a STRING, b STRING) PARTITIONED BY (ds S
 POSTHOOK: type: CREATETABLE
 POSTHOOK: Output: database:default
 POSTHOOK: Output: default@loadpart1
-FAILED: SemanticException [Error 10006]: Line 2:82 Partition not found ''2009-05-05''
+#### A masked pattern was here ####

Review Comment: How is this related to this change?
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
sonarcloud[bot] commented on PR #5123: URL: https://github.com/apache/hive/pull/5123#issuecomment-2073992773

Quality Gate passed.
Issues: 11 New issues, 0 Accepted issues.
Measures: 0 Security Hotspots, no data about Coverage, no data about Duplication.
See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5123
Re: [PR] HIVE-28121. (2.3) Use direct SQL for transactional altering table parameter [hive]
sunchao commented on PR #5204: URL: https://github.com/apache/hive/pull/5204#issuecomment-2073980462

> that this is the current state of the tests on this branch

Yes, that's unfortunately the current state of branch-2.3. We do know a fixed set of tests are failing though, and are using that as the baseline for testing against PRs.
Re: [PR] HIVE-28121. (2.3) Use direct SQL for transactional altering table parameter [hive]
sunchao commented on PR #5204: URL: https://github.com/apache/hive/pull/5204#issuecomment-2073979665

Thanks @pan3793 @lirui-apache and @pvary. It looks like this PR has more tests failing than the previous PR (screenshot: https://github.com/apache/hive/assets/506679/ddd8cf84-193c-4f7b-8b68-5e1248407972).

In the previous PR #4892, there were only 27 tests failing (normally there would only be 26, but 1 was flaky in that PR). Could you double check whether the test failures are related?
Re: [PR] HIVE-24167: TPC-DS query 14 fails while generating plan for the filter [hive]
okumin commented on PR #5077: URL: https://github.com/apache/hive/pull/5077#issuecomment-207383

I also think the current approach could fail somewhere, especially if we follow the minor options. I have since gained some additional experience and knowledge, and may be able to come up with the intended approach. I will work on it tomorrow (I am unavailable today because a tech event for Hive 4 is taking place in my city!) if I don't see very viable options here.
[PR] [WIP] Calcite upgrade 1.33 without SEARCH operators [hive]
soumyakanti3578 opened a new pull request, #5210: URL: https://github.com/apache/hive/pull/5210

### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### Is the change a dependency upgrade?
### How was this patch tested?
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
sonarcloud[bot] commented on PR #5123: URL: https://github.com/apache/hive/pull/5123#issuecomment-2073591787

Quality Gate passed.
Issues: 12 New issues, 0 Accepted issues.
Measures: 0 Security Hotspots, no data about Coverage, no data about Duplication.
See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5123
Re: [PR] HIVE-26220: Restore hive-exec-core jar [hive]
sonarcloud[bot] commented on PR #5209: URL: https://github.com/apache/hive/pull/5209#issuecomment-2073523132

Quality Gate passed.
Issues: 1 New issue, 0 Accepted issues.
Measures: 0 Security Hotspots, no data about Coverage, no data about Duplication.
See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5209
Re: [PR] HIVE-28188:Upgrade Postgres to 42.7.3 to target CVE-2024-1597 [hive]
devaspatikrishnatri commented on PR #5185: URL: https://github.com/apache/hive/pull/5185#issuecomment-2073483153

@zhangbutao Could you please merge it if this is okay? I do not have merge access yet in Hive.
Re: [PR] WIP: Dont review [hive]
sonarcloud[bot] commented on PR #5012: URL: https://github.com/apache/hive/pull/5012#issuecomment-2073421910

Quality Gate passed.
Issues: 19 New issues, 0 Accepted issues.
Measures: 0 Security Hotspots, no data about Coverage, no data about Duplication.
See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5012
Re: [PR] HIVE-28173: Fixed staging dir issues on materialized views on HDFS en… [hive]
scarlin-cloudera commented on PR #5180: URL: https://github.com/apache/hive/pull/5180#issuecomment-2073399509

> LGTM, @scarlin-cloudera could we add tests for MV? Maybe @kasakrisz can suggest if there are some similar tests that we could extend

Thanks for the review! Yeah, I'm looking to add a unit test. I've been trying to add one for MV, but the real change involves the MV daemon thread and there are no existing integration or unit tests for that. Having said that, it looks like there might be something I can add, and I'll hopefully be adding that within the next day or two.
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
difin commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576860570

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java ##
@@ -44,22 +55,51 @@ public boolean run(CompactorContext context) throws IOException, HiveException,
     Map tblProperties = context.getTable().getParameters();
     LOG.debug("Initiating compaction for the {} table", compactTableName);
-    String compactionQuery = String.format("insert overwrite table %s select * from %
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
difin commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576859775

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java ##
@@ -44,22 +55,51 @@ public boolean run(CompactorContext context) throws IOException, HiveException,
     Map tblProperties = context.getTable().getParameters();
     LOG.debug("Initiating compaction for the {} table", compactTableName);
-    String compactionQuery = String.format("insert overwrite table %s select * from %
+      Map partSpecMap = new LinkedHashMap<>();
+      Warehouse.makeSpecFromName(partSpecMap, new Path(partSpec), null);
+
+      List partitionKeys = table.getStorageHandler().getPartitionKeys(table);
+      List partValues = partitionKeys.stream().map(
+          fs -> String.join("=", HiveUtils.unparseIdentifier(fs.getName()),
+              TypeInfoUtils.convertStringToLiteralForSQL(partSpecMap.get(fs.getName()),
+                  ((PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromTypeString(fs.getType())).getPrimitiveCategory())
+          )
+      ).collect(Collectors.toList());
+
+      String queryFields = table.getCols().stream()
+          .map(FieldSchema::getName)
+          .filter(col -> !partSpecMap.containsKey(col))
+          .collect(Collectors.joining(","));
+
+      compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s",
+          compactTableName,
+          StringUtils.join(partValues, ","),
+          StringUtils.join(partValues, " and "),
+          queryFields);
+    }
+
+    SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties);
-    SessionState sessionState = setupQueryCompactionSession(context.getConf(),
-        context.getCompactionInfo(), tblProperties);
-    HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name());
     try {
-      DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery);
+      DriverUtils.runOnDriver(conf, sessionState, compactionQuery);
       LOG.info("Completed compaction for table {}", compactTableName);
+      return true;
     } catch (HiveException e) {
-      LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e);
-      throw new RuntimeException(e);
+      LOG.error("Failed compacting table {}", compactTableName, e);

Review Comment: Done
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
difin commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576852118

## ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java ##
@@ -777,6 +783,16 @@ default List getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.T
         "for a table.");
   }
+  default List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table,
+      Map partialPartSpec) throws SemanticException {

Review Comment: Done
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576721296

## ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java ##
@@ -777,6 +783,16 @@ default List getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.T
         "for a table.");
   }
+  default List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table,
+      Map partialPartSpec) throws SemanticException {

Review Comment: `partitionSpec`
Re: [PR] HIVE-26220: Restore hive-exec-core jar [hive]
sonarcloud[bot] commented on PR #5209: URL: https://github.com/apache/hive/pull/5209#issuecomment-2073114772

Quality Gate passed.
Issues: 1 New issue, 0 Accepted issues.
Measures: 0 Security Hotspots, no data about Coverage, no data about Duplication.
See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5209
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576709834

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java ##
@@ -44,22 +55,51 @@ public boolean run(CompactorContext context) throws IOException, HiveException,
     } catch (HiveException e) {
-      LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e);
-      throw new RuntimeException(e);
+      LOG.error("Failed compacting table {}", compactTableName, e);

Review Comment: "Failed to compact table {}, partition {}"
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576703781

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java ##
@@ -44,22 +55,51 @@ public boolean run(CompactorContext context) throws IOException, HiveException,
     Map tblProperties = context.getTable().getParameters();
     LOG.debug("Initiating compaction for the {} table", compactTableName);
-    String compactionQuery = String.format("insert overwrite table %s select * from %
[PR] HIVE-26220: Restore hive-exec-core jar [hive]
simhadri-g opened a new pull request, #5209: URL: https://github.com/apache/hive/pull/5209

### What changes were proposed in this pull request?
The hive-exec-core jar is used by Spark, Oozie, Hudi and many other projects. Removal of the hive-exec-core jar has caused the following issues.
Spark: https://lists.apache.org/list?d...@hive.apache.org:lte=1M:joda
Oozie: https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg
Hudi: https://github.com/apache/hudi/issues/8147
Until we shade & relocate dependencies in hive-exec, we should restore the hive-exec-core jar.

### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### Is the change a dependency upgrade?
### How was this patch tested?
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
difin commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576572891

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java ##
@@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) {
         .anyMatch(id -> id < table.spec().specId());
   }
+  private boolean isIdentityPartitionTable(org.apache.hadoop.hive.ql.metadata.Table table) {
+    return getPartitionTransformSpec(table).stream().map(TransformSpec::getTransformType)
+        .allMatch(type -> type == TransformSpec.TransformType.IDENTITY);
+  }
+
+  @Override
+  public org.apache.commons.lang3.tuple.Pair isEligibleForCompaction(
+      org.apache.hadoop.hive.ql.metadata.Table table, Map partitionSpec) {
+    if (partitionSpec != null) {
+      Table icebergTable = IcebergTableUtil.getTable(conf, table.getTTable());
+      if (hasUndergonePartitionEvolution(icebergTable)) {
+        return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_PARTITION_EVOLUTION);
+      }
+      if (!isIdentityPartitionTable(table)) {
+        return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_NON_IDENTITY_PARTITION_SPEC);
+      }
+    }
+    return org.apache.commons.lang3.tuple.Pair.of(true, null);
+  }
+
+  @Override
+  public List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table,
+      Map partitionSpec, short limit) throws SemanticException {
+    return table.getStorageHandler().getPartitionNames(table, partitionSpec).stream()
+        .limit(limit > 0 ? limit : Long.MAX_VALUE)
+        .map(partName -> new DummyPartition(table, partName, partitionSpec)).collect(Collectors.toList());
+  }
+
+  @Override
+  public Partition getPartition(org.apache.hadoop.hive.ql.metadata.Table table,
+      Map partitionSpec) throws SemanticException {
+    validatePartSpec(table, partitionSpec);
+    try {
+      String partName = Warehouse.makePartName(partitionSpec, false);
+      return new DummyPartition(table, partName, partitionSpec);
+    } catch (MetaException e) {

Review Comment: The line above also throws SemanticException (as discussed)
Re: [PR] HIVE-28197: Add deserializer to convert JSON plans to RelNodes [hive]
soumyakanti3578 commented on code in PR #5131: URL: https://github.com/apache/hive/pull/5131#discussion_r1576566058

## ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ##
@@ -1745,9 +1748,72 @@ public RelNode apply(RelOptCluster cluster, RelOptSchema relOptSchema, SchemaPlu
       if (LOG.isDebugEnabled()) {
         LOG.debug("Plan after post-join transformations:\n" + RelOptUtil.toString(calcitePlan));
       }
+      perfLogger.perfLogEnd(this.getClass().getName(), PerfLogger.OPTIMIZER);
+
+      if (conf.getBoolVar(ConfVars.TEST_CBO_PLAN_SERIALIZATION_DESERIALIZATION_ENABLED)) {
+        calcitePlan = testSerializationAndDeserialization(perfLogger, calcitePlan);
+      }
+
+      return calcitePlan;
+    }
+
+    @Nullable
+    private RelNode testSerializationAndDeserialization(PerfLogger perfLogger, RelNode calcitePlan) {
+      if (!isSerializable(calcitePlan)) {
+        return calcitePlan;
+      }
+      perfLogger.perfLogBegin(this.getClass().getName(), "plan serializer");
+      String calcitePlanJson = serializePlan(calcitePlan);
+      perfLogger.perfLogEnd(this.getClass().getName(), "plan serializer");
+
+      if (stringSizeGreaterThan(calcitePlanJson, PLAN_SERIALIZATION_DESERIALIZATION_STR_SIZE_LIMIT)) {
+        return calcitePlan;
+      }
+
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("Size of calcite plan: {}", calcitePlanJson.getBytes(Charset.defaultCharset()).length);
+        LOG.debug("JSON plan: \n{}", calcitePlanJson);
+      }
+
+      try {
+        perfLogger.perfLogBegin(this.getClass().getName(), "plan deserializer");
+        RelNode fromJson = deserializePlan(calcitePlan.getCluster(), calcitePlanJson);
+        perfLogger.perfLogEnd(this.getClass().getName(), "plan deserializer");
+
+        if (LOG.isDebugEnabled()) {
+          LOG.debug("Base plan: \n{}", RelOptUtil.toString(calcitePlan));
+          LOG.debug("Plan from JSON: \n{}", RelOptUtil.toString(fromJson));
+        }
+
+        calcitePlan = fromJson;
+      } catch (IOException e) {
+        throw new RuntimeException(e);
+      }
+

Review Comment: The only reason I have the method `testSerializationAndDeserialization` here instead of a unit test is to enable integration tests with just a property from the qfile. My idea was to enable `TEST_CBO_PLAN_SERIALIZATION_DESERIALIZATION_ENABLED` from either each individual qfile, or from `hive-site.xml` for a whole driver like `TestTezTPCDS30TBPerfCliDriver`. That would ensure that we test this against a larger and more diverse set of queries, instead of testing a few scenarios with unit tests. Let me know if you still want me to add a unit test for this and remove this from here, and I will do it. :)
Re: [PR] HIVE-28019 Fix query type information in proto files for load queries [hive]
ramesh0201 commented on PR #5022: URL: https://github.com/apache/hive/pull/5022#issuecomment-2072807576

Thank you @zabetak and @jfsii for the review. I have modified this change to contain only the changes for the load queries. For the explain queries we will still fall back to the type information they had previously.
Re: [PR] HIVE-27995: Fix inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables [hive]
shivjha30 commented on code in PR #5000: URL: https://github.com/apache/hive/pull/5000#discussion_r1576424351

## ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java ##
@@ -344,7 +358,7 @@ private void analyzeLoad(ASTNode ast) throws SemanticException {
     }
     // make sure the arguments make sense
-    List files = applyConstraintsAndGetFiles(fromURI, ts.tableHandle);
+    List files = applyConstraintsAndGetFiles(fromURI, ts.tableHandle, srcs);

Review Comment: In the partitioned case, if the file doesn't exist the number of splits will be zero; with zero splits there are no mappers, so CombineHiveInputFormat is never invoked. After this fix we validate at the server level, so in all such cases existing applications will need to correct those queries.
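For illustration only, here is a minimal sketch of the kind of up-front validation being discussed: failing a LOAD DATA source that matches no files instead of letting the query run with zero splits. It uses the standard Hadoop FileSystem API; the class name, method name, and error message are invented for this sketch and are not the actual patch.

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadSourceCheck {
  /**
   * Returns the files matched by the LOAD DATA source URI, failing fast when
   * nothing matches instead of letting the query proceed with zero splits.
   */
  static FileStatus[] requireMatchingFiles(Configuration conf, URI fromURI) throws IOException {
    Path srcPath = new Path(fromURI);
    FileSystem fs = srcPath.getFileSystem(conf);
    // globStatus returns null or an empty array when no file matches the path/pattern
    FileStatus[] srcs = fs.globStatus(srcPath);
    if (srcs == null || srcs.length == 0) {
      throw new IOException("No files matching path " + srcPath + " for LOAD DATA");
    }
    return srcs;
  }
}
```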
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
difin commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576421625

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java ##
@@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException,
     Map tblProperties = context.getTable().getParameters();
     LOG.debug("Initiating compaction for the {} table", compactTableName);
-    String compactionQuery = String.format("insert overwrite table %s select * from %
+      List partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList());
+      String queryFields = table.schema().columns().stream().map(Types.NestedField::name)
+          .filter(f -> !partFields.contains(f)).collect(Collectors.joining(","));
+      PartitionData partitionData = DataFiles.data(table.spec(), partSpec);
+      compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s",
+          compactTableName, partDataToSQL(partitionData, partSpec, ","),
+          partDataToSQL(partitionData, partSpec, " and "), queryFields);
+    }
+
+    SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties);
-    SessionState sessionState = setupQueryCompactionSession(context.getConf(),
-        context.getCompactionInfo(), tblProperties);
-    HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name());
     try {
-      DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery);
+      DriverUtils.runOnDriver(conf, sessionState, compactionQuery);
       LOG.info("Completed compaction for table {}", compactTableName);
+      return true;
     } catch (HiveException e) {
-      LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e);
-      throw new RuntimeException(e);
+      LOG.error("Failed compacting table {}", compactTableName, e);
+      throw e;
    } finally {
      sessionState.setCompaction(false);
    }
+  }
+
+  private String partDataToSQL(PartitionData partitionData, String partSpec, String delimiter)
+      throws HiveException {
+
+    try {
+      Map partSpecMap = Warehouse.makeSpecFromName(partSpec);
+      StringBuilder sb = new StringBuilder();
+
+      for (int i = 0; i < partitionData.size(); ++i) {
+        if (i > 0) {
+          sb.append(delimiter);
+        }
+
+        String quoteOpt = "";
+        if (partitionData.getType(i).typeId() == Type.TypeID.STRING ||
+            partitionData.getType(i).typeId() == Type.TypeID.DATE ||
+            partitionData.getType(i).typeId() == Type.TypeID.TIME ||
+            partitionData.getType(i).typeId() == Type.TypeID.TIMESTAMP ||
+            partitionData.getType(i).typeId() == Type.TypeID.BINARY) {
+          quoteOpt = "'";
+        }
-    return true;
+        String fieldName = partitionData.getSchema().getFields().get(i).name();
+
+        sb.append(fieldName)
+            .append("=")
+            .append(quoteOpt)
+            .append(partSpecMap.get(fieldName))
+            .append(quoteOpt);

Review Comment: Thanks, done.
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
difin commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576401895

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java ##
@@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) {
+  public List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table,
+      Map partitionSpec, short limit) throws SemanticException {

Review Comment: We don't need limit, fixed.
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
difin commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576402233

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java ##
@@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) {
+    return table.getStorageHandler().getPartitionNames(table, partitionSpec).stream()
+        .limit(limit > 0 ? limit : Long.MAX_VALUE)

Review Comment: done
Re: [PR] HIVE-24167: TPC-DS query 14 fails while generating plan for the filter [hive]
zabetak commented on code in PR #5077: URL: https://github.com/apache/hive/pull/5077#discussion_r1576377419

## ql/src/test/queries/clientpositive/cbo_cte_materialization.q ##
@@ -0,0 +1,28 @@
+--! qt:dataset:src
+
+set hive.optimize.cte.materialize.threshold=1;
+set hive.optimize.cte.materialize.full.aggregate.only=false;

Review Comment:
```
set hive.query.planmapper.link.relnodes=false;
```
By setting this property the test fails with the same `RuntimeException: equivalence mapping violation`.
Re: [PR] HIVE-28019 Fix query type information in proto files for load and explain queries [hive]
sonarcloud[bot] commented on PR #5022: URL: https://github.com/apache/hive/pull/5022#issuecomment-2072509126

Quality Gate passed.
Issues: 2 New issues, 0 Accepted issues.
Measures: 0 Security Hotspots, no data about Coverage, no data about Duplication.
See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5022
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
difin commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1576332462

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java ##
@@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) {
+    return table.getStorageHandler().getPartitionNames(table, partitionSpec).stream()

Review Comment: Done
Re: [PR] HIVE-28188:Upgrade Postgres to 42.7.3 to target CVE-2024-1597 [hive]
sonarcloud[bot] commented on PR #5185: URL: https://github.com/apache/hive/pull/5185#issuecomment-2072151330

Quality Gate passed.
Issues: 1 New issue, 0 Accepted issues.
Measures: 0 Security Hotspots, no data about Coverage, no data about Duplication.
See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5185
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575838382

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java ##
@@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException,
+      List partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList());
+      String queryFields = table.schema().columns().stream().map(Types.NestedField::name)
+          .filter(f -> !partFields.contains(f)).collect(Collectors.joining(","));
+      PartitionData partitionData = DataFiles.data(table.spec(), partSpec);
+      compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s",
+          compactTableName, partDataToSQL(partitionData, partSpec, ","),
+          partDataToSQL(partitionData, partSpec, " and "), queryFields);

Review Comment: How about:

    Map partSpecMap = new LinkedHashMap<>();
    Warehouse.makeSpecFromName(partSpecMap, new Path(partSpec), null);

    List partitionKeys = table.getStorageHandler().getPartitionKeys(table);
    List partValues = partitionKeys.stream().map(
        fs -> String.join("=", HiveUtils.unparseIdentifier(fs.getName()),
            TypeInfoUtils.convertStringToLiteralForSQL(partSpecMap.get(fs.getName()),
                ((PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromTypeString(fs.getType())).getPrimitiveCategory())
        )
    ).collect(Collectors.toList());

    String queryFields = table.getCols().stream()
        .map(FieldSchema::getName)
        .filter(col -> !partSpecMap.containsKey(col))
        .collect(Collectors.joining(","));

    compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s",
        compactTableName,
        StringUtils.join(partValues, ","),
        StringUtils.join(partValues, " and "),
        queryFields);

table.getStorageHandler().getColumnInfo looks expensive and unnecessary.
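To make the shape of the generated statement concrete, here is a small, self-contained sketch (the table name, columns, and partition values are invented for illustration) of how an identity partition spec such as `dept=hr/year=2024` could be rendered into the `partition(...)` clause and the matching `WHERE` predicate, quoting only string-typed values:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class CompactionQuerySketch {
  public static void main(String[] args) {
    // Partition spec parsed from the compaction request, e.g. "dept=hr/year=2024".
    Map<String, String> partSpec = new LinkedHashMap<>();
    partSpec.put("dept", "hr");
    partSpec.put("year", "2024");
    // Columns whose literals need single quotes (string/date/timestamp-like types).
    Set<String> quotedCols = Set.of("dept");

    String partClause = partSpec.entrySet().stream()
        .map(e -> e.getKey() + "=" + literal(e.getValue(), quotedCols.contains(e.getKey())))
        .collect(Collectors.joining(", "));
    String predicate = partSpec.entrySet().stream()
        .map(e -> e.getKey() + "=" + literal(e.getValue(), quotedCols.contains(e.getKey())))
        .collect(Collectors.joining(" and "));
    String queryFields = "id,name";  // non-partition columns of the hypothetical table

    String compactionQuery = String.format(
        "insert overwrite table %1$s partition(%2$s) select %3$s from %1$s where %4$s",
        "db.tbl", partClause, queryFields, predicate);

    // Prints:
    // insert overwrite table db.tbl partition(dept='hr', year=2024)
    //   select id,name from db.tbl where dept='hr' and year=2024
    System.out.println(compactionQuery);
  }

  private static String literal(String value, boolean quoted) {
    return quoted ? "'" + value + "'" : value;
  }
}
```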
Re: [PR] HIVE-28197: Add deserializer to convert JSON plans to RelNodes [hive]
kasakrisz commented on code in PR #5131: URL: https://github.com/apache/hive/pull/5131#discussion_r1576115514

## ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ##
@@ -1745,9 +1748,72 @@ public RelNode apply(RelOptCluster cluster, RelOptSchema relOptSchema, SchemaPlu
+      if (conf.getBoolVar(ConfVars.TEST_CBO_PLAN_SERIALIZATION_DESERIALIZATION_ENABLED)) {
+        calcitePlan = testSerializationAndDeserialization(perfLogger, calcitePlan);
+      }

Review Comment: I think test code shouldn't be shipped to production. Could you please move this to a unit test; there are ways to build a test plan: https://github.com/apache/hive/blob/master/ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/TestHiveRowIsDeletedPropagator.java
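For reference, the round trip under discussion can be sketched with Calcite's stock JSON externalization classes, which is presumably what `serializePlan`/`deserializePlan` build on; the PR adds its own Hive-aware deserializer, so this only illustrates the general mechanism, not the PR's code:

```java
import java.io.IOException;

import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.plan.RelOptSchema;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.externalize.RelJsonReader;
import org.apache.calcite.rel.externalize.RelJsonWriter;
import org.apache.calcite.schema.Schema;

public final class PlanJsonRoundTrip {

  /** Serializes a RelNode tree to Calcite's JSON form. */
  public static String toJson(RelNode plan) {
    RelJsonWriter writer = new RelJsonWriter();
    plan.explain(writer);   // visits the tree and records every RelNode as JSON
    return writer.asString();
  }

  /** Rebuilds a RelNode tree from the JSON produced by toJson. */
  public static RelNode fromJson(RelOptCluster cluster, RelOptSchema relOptSchema,
      Schema schema, String json) throws IOException {
    // The stock reader resolves table scans through the RelOptSchema;
    // Hive operators need the Hive-specific deserializer added by this PR.
    return new RelJsonReader(cluster, relOptSchema, schema).read(json);
  }
}
```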
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575838382 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java: ## @@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException, Map tblProperties = context.getTable().getParameters(); LOG.debug("Initiating compaction for the {} table", compactTableName); -String compactionQuery = String.format("insert overwrite table %s select * from % partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList()); + String queryFields = table.schema().columns().stream().map(Types.NestedField::name) + .filter(f -> !partFields.contains(f)).collect(Collectors.joining(",")); + PartitionData partitionData = DataFiles.data(table.spec(), partSpec); + compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", + compactTableName, partDataToSQL(partitionData, partSpec, ","), + partDataToSQL(partitionData, partSpec, " and "), queryFields); +} + +SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties); -SessionState sessionState = setupQueryCompactionSession(context.getConf(), -context.getCompactionInfo(), tblProperties); -HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name()); try { - DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery); + DriverUtils.runOnDriver(conf, sessionState, compactionQuery); LOG.info("Completed compaction for table {}", compactTableName); + return true; } catch (HiveException e) { - LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e); - throw new RuntimeException(e); + LOG.error("Failed compacting table {}", compactTableName, e); + throw e; } finally { sessionState.setCompaction(false); } + } + + private String partDataToSQL(PartitionData partitionData, String partSpec, String delimiter) + throws HiveException { + +try { + Map partSpecMap = Warehouse.makeSpecFromName(partSpec); + StringBuilder sb = new StringBuilder(); + + for (int i = 0; i < partitionData.size(); ++i) { +if (i > 0) { + sb.append(delimiter); +} + +String quoteOpt = ""; +if (partitionData.getType(i).typeId() == Type.TypeID.STRING || +partitionData.getType(i).typeId() == Type.TypeID.DATE || +partitionData.getType(i).typeId() == Type.TypeID.TIME || +partitionData.getType(i).typeId() == Type.TypeID.TIMESTAMP || +partitionData.getType(i).typeId() == Type.TypeID.BINARY) { + quoteOpt = "'"; +} -return true; +String fieldName = partitionData.getSchema().getFields().get(i).name(); + +sb.append(fieldName) +.append("=") +.append(quoteOpt) +.append(partSpecMap.get(fieldName)) + .append(quoteOpt); Review Comment: how about Map partSpecMap = new LinkedHashMap<>(); Warehouse.makeSpecFromName(partSpecMap, new Path(partSpec), null); List partitionKeys = table.getStorageHandler().getPartitionKeys(table); List partValues = partitionKeys.stream().map( fs -> String.join("=", HiveUtils.unparseIdentifier(fs.getName()), TypeInfoUtils.convertStringToLiteralForSQL(partSpecMap.get(fs.getName()), ((PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromTypeString(fs.getType())).getPrimitiveCategory()) ) ).collect(Collectors.toList()); String queryFields = table.getCols().stream() .map(FieldSchema::getName) .filter(col -> !partSpecMap.containsKey(col)) .collect(Collectors.joining(",")); compactionQuery = String.format("insert 
overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", compactTableName, StringUtils.join(partValues, ","), StringUtils.join(partValues, " and "), queryFields); table.getStorageHandler().getColumnInfo looks expensive and unnecessary -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
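The suggestion above, reassembled into a compilable sketch (the inline code lost its generics in the mail digest). It assumes the APIs named in the comment — notably `HiveStorageHandler.getPartitionKeys` and `TypeInfoUtils.convertStringToLiteralForSQL` — exist with the signatures quoted there, so treat it as an illustration of the proposed `partDataToSQL` replacement rather than the final patch:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hive.metastore.Warehouse;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.ql.metadata.HiveUtils;
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

final class CompactionQuerySketch {

  private CompactionQuerySketch() {
  }

  /**
   * Builds the partition-scoped compaction query from a partition name such as
   * "p1=v1/p2=v2". Each value is quoted according to the Hive type of its key,
   * which is the point of the review suggestion.
   */
  static String buildQuery(Table hiveTable, String compactTableName, String partSpec)
      throws MetaException {
    Map<String, String> partSpecMap = Warehouse.makeSpecFromName(partSpec);

    // Assumed API from the comment: partition keys supplied by the storage handler.
    List<FieldSchema> partitionKeys = hiveTable.getStorageHandler().getPartitionKeys(hiveTable);
    List<String> partValues = partitionKeys.stream()
        .map(fs -> String.join("=",
            HiveUtils.unparseIdentifier(fs.getName()),
            TypeInfoUtils.convertStringToLiteralForSQL(
                partSpecMap.get(fs.getName()),
                ((PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromTypeString(fs.getType()))
                    .getPrimitiveCategory())))
        .collect(Collectors.toList());

    // Select only non-partition columns; the partition values are fixed by the spec.
    String queryFields = hiveTable.getCols().stream()
        .map(FieldSchema::getName)
        .filter(col -> !partSpecMap.containsKey(col))
        .collect(Collectors.joining(","));

    return String.format(
        "insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s",
        compactTableName,
        StringUtils.join(partValues, ","),
        StringUtils.join(partValues, " and "),
        queryFields);
  }
}
```

The key point is that each partition value is rendered through `convertStringToLiteralForSQL`, so quoting follows the Hive type of the partition key instead of a hand-maintained list of Iceberg type IDs.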
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575794099 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) { .anyMatch(id -> id < table.spec().specId()); } + private boolean isIdentityPartitionTable(org.apache.hadoop.hive.ql.metadata.Table table) { +return getPartitionTransformSpec(table).stream().map(TransformSpec::getTransformType) +.allMatch(type -> type == TransformSpec.TransformType.IDENTITY); + } + + @Override + public org.apache.commons.lang3.tuple.Pair isEligibleForCompaction( + org.apache.hadoop.hive.ql.metadata.Table table, Map partitionSpec) { +if (partitionSpec != null) { + Table icebergTable = IcebergTableUtil.getTable(conf, table.getTTable()); + if (hasUndergonePartitionEvolution(icebergTable)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_PARTITION_EVOLUTION); + } + if (!isIdentityPartitionTable(table)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_NON_IDENTITY_PARTITION_SPEC); + } +} +return org.apache.commons.lang3.tuple.Pair.of(true, null); + } + + @Override + public List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec, short limit) throws SemanticException { +return table.getStorageHandler().getPartitionNames(table, partitionSpec).stream() +.limit(limit > 0 ? limit : Long.MAX_VALUE) +.map(partName -> new DummyPartition(table, partName, partitionSpec)).collect(Collectors.toList()); + } + + @Override + public Partition getPartition(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec) throws SemanticException { +validatePartSpec(table, partitionSpec); +try { + String partName = Warehouse.makePartName(partitionSpec, false); + return new DummyPartition(table, partName, partitionSpec); +} catch (MetaException e) { Review Comment: can we use makeDynamicPartName to avoid exception? should be ok since we validate before -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575794099 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) { .anyMatch(id -> id < table.spec().specId()); } + private boolean isIdentityPartitionTable(org.apache.hadoop.hive.ql.metadata.Table table) { +return getPartitionTransformSpec(table).stream().map(TransformSpec::getTransformType) +.allMatch(type -> type == TransformSpec.TransformType.IDENTITY); + } + + @Override + public org.apache.commons.lang3.tuple.Pair isEligibleForCompaction( + org.apache.hadoop.hive.ql.metadata.Table table, Map partitionSpec) { +if (partitionSpec != null) { + Table icebergTable = IcebergTableUtil.getTable(conf, table.getTTable()); + if (hasUndergonePartitionEvolution(icebergTable)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_PARTITION_EVOLUTION); + } + if (!isIdentityPartitionTable(table)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_NON_IDENTITY_PARTITION_SPEC); + } +} +return org.apache.commons.lang3.tuple.Pair.of(true, null); + } + + @Override + public List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec, short limit) throws SemanticException { +return table.getStorageHandler().getPartitionNames(table, partitionSpec).stream() +.limit(limit > 0 ? limit : Long.MAX_VALUE) +.map(partName -> new DummyPartition(table, partName, partitionSpec)).collect(Collectors.toList()); + } + + @Override + public Partition getPartition(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec) throws SemanticException { +validatePartSpec(table, partitionSpec); +try { + String partName = Warehouse.makePartName(partitionSpec, false); + return new DummyPartition(table, partName, partitionSpec); +} catch (MetaException e) { Review Comment: can we use `makeDynamicPartNameNoTrailingSeperator` to avoid exception? should be ok since we validate before -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
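A sketch of what the suggested change to `getPartition` could look like, assuming the non-throwing name builder the reviewer points at behaves as hoped (the trailing-separator question is exactly what the two comment revisions are about). This is a fragment of the `HiveIcebergStorageHandler` override from the quoted diff, not standalone code:

```java
// Fragment of the override discussed above. validatePartSpec and DummyPartition
// come from the surrounding class/diff; Warehouse.makeDynamicPartName is the
// reviewer's suggested builder and is assumed not to throw MetaException, which
// is what lets the try/catch be dropped.
@Override
public Partition getPartition(org.apache.hadoop.hive.ql.metadata.Table table,
    Map<String, String> partitionSpec) throws SemanticException {
  validatePartSpec(table, partitionSpec);
  // Open question from the review: whether the produced name carries a trailing separator.
  String partName = Warehouse.makeDynamicPartName(partitionSpec);
  return new DummyPartition(table, partName, partitionSpec);
}
```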
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575838382 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java: ## @@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException, Map tblProperties = context.getTable().getParameters(); LOG.debug("Initiating compaction for the {} table", compactTableName); -String compactionQuery = String.format("insert overwrite table %s select * from % partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList()); + String queryFields = table.schema().columns().stream().map(Types.NestedField::name) + .filter(f -> !partFields.contains(f)).collect(Collectors.joining(",")); + PartitionData partitionData = DataFiles.data(table.spec(), partSpec); + compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", + compactTableName, partDataToSQL(partitionData, partSpec, ","), + partDataToSQL(partitionData, partSpec, " and "), queryFields); +} + +SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties); -SessionState sessionState = setupQueryCompactionSession(context.getConf(), -context.getCompactionInfo(), tblProperties); -HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name()); try { - DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery); + DriverUtils.runOnDriver(conf, sessionState, compactionQuery); LOG.info("Completed compaction for table {}", compactTableName); + return true; } catch (HiveException e) { - LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e); - throw new RuntimeException(e); + LOG.error("Failed compacting table {}", compactTableName, e); + throw e; } finally { sessionState.setCompaction(false); } + } + + private String partDataToSQL(PartitionData partitionData, String partSpec, String delimiter) + throws HiveException { + +try { + Map partSpecMap = Warehouse.makeSpecFromName(partSpec); + StringBuilder sb = new StringBuilder(); + + for (int i = 0; i < partitionData.size(); ++i) { +if (i > 0) { + sb.append(delimiter); +} + +String quoteOpt = ""; +if (partitionData.getType(i).typeId() == Type.TypeID.STRING || +partitionData.getType(i).typeId() == Type.TypeID.DATE || +partitionData.getType(i).typeId() == Type.TypeID.TIME || +partitionData.getType(i).typeId() == Type.TypeID.TIMESTAMP || +partitionData.getType(i).typeId() == Type.TypeID.BINARY) { + quoteOpt = "'"; +} -return true; +String fieldName = partitionData.getSchema().getFields().get(i).name(); + +sb.append(fieldName) +.append("=") +.append(quoteOpt) +.append(partSpecMap.get(fieldName)) + .append(quoteOpt); Review Comment: how about Map partSpecMap = new LinkedHashMap<>(); Warehouse.makeSpecFromName(partSpecMap, new Path(partSpec), null); List partitionKeys = hiveTable.getStorageHandler().getPartitionKeys(hiveTable); List partValues = partitionKeys.stream().map( fs -> String.join("=", HiveUtils.unparseIdentifier(fs.getName()), TypeInfoUtils.convertStringToLiteralForSQL(partSpecMap.get(fs.getName()), ((PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromTypeString(fs.getType())).getPrimitiveCategory()) ) ).collect(Collectors.toList()); String queryFields = hiveTable.getCols().stream().filter(col -> !partSpecMap.containsKey(col.getName())) .map(FieldSchema::getName) .collect(Collectors.joining(",")); compactionQuery = 
String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", compactTableName, StringUtils.join(partValues, ","), StringUtils.join(partValues, " and "), queryFields); } table.getStorageHandler().getColumnInfo looks expensive and unnecessary -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575838382 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java: ## @@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException, Map tblProperties = context.getTable().getParameters(); LOG.debug("Initiating compaction for the {} table", compactTableName); -String compactionQuery = String.format("insert overwrite table %s select * from % partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList()); + String queryFields = table.schema().columns().stream().map(Types.NestedField::name) + .filter(f -> !partFields.contains(f)).collect(Collectors.joining(",")); + PartitionData partitionData = DataFiles.data(table.spec(), partSpec); + compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", + compactTableName, partDataToSQL(partitionData, partSpec, ","), + partDataToSQL(partitionData, partSpec, " and "), queryFields); +} + +SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties); -SessionState sessionState = setupQueryCompactionSession(context.getConf(), -context.getCompactionInfo(), tblProperties); -HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name()); try { - DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery); + DriverUtils.runOnDriver(conf, sessionState, compactionQuery); LOG.info("Completed compaction for table {}", compactTableName); + return true; } catch (HiveException e) { - LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e); - throw new RuntimeException(e); + LOG.error("Failed compacting table {}", compactTableName, e); + throw e; } finally { sessionState.setCompaction(false); } + } + + private String partDataToSQL(PartitionData partitionData, String partSpec, String delimiter) + throws HiveException { + +try { + Map partSpecMap = Warehouse.makeSpecFromName(partSpec); + StringBuilder sb = new StringBuilder(); + + for (int i = 0; i < partitionData.size(); ++i) { +if (i > 0) { + sb.append(delimiter); +} + +String quoteOpt = ""; +if (partitionData.getType(i).typeId() == Type.TypeID.STRING || +partitionData.getType(i).typeId() == Type.TypeID.DATE || +partitionData.getType(i).typeId() == Type.TypeID.TIME || +partitionData.getType(i).typeId() == Type.TypeID.TIMESTAMP || +partitionData.getType(i).typeId() == Type.TypeID.BINARY) { + quoteOpt = "'"; +} -return true; +String fieldName = partitionData.getSchema().getFields().get(i).name(); + +sb.append(fieldName) +.append("=") +.append(quoteOpt) +.append(partSpecMap.get(fieldName)) + .append(quoteOpt); Review Comment: how about Map partSpecMap = new LinkedHashMap<>(); Warehouse.makeSpecFromName(partSpecMap, new Path(partSpec), null); List partitionKeys = hiveTable.getStorageHandler().getPartitionKeys(hiveTable); List partValues = partitionKeys.stream().map( fs -> String.join("=", HiveUtils.unparseIdentifier(fs.getName()), TypeInfoUtils.convertStringToLiteralForSQL(partSpecMap.get(fs.getName()), ((PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromTypeString(fs.getType())).getPrimitiveCategory()) ) ).collect(Collectors.toList()); String queryFields = hiveTable.getCols().stream().map(FieldSchema::getName) .filter(col -> !partSpecMap.containsKey(col)) .collect(Collectors.joining(",")); compactionQuery = 
String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", compactTableName, StringUtils.join(partValues, ","), StringUtils.join(partValues, " and "), queryFields); } table.getStorageHandler().getColumnInfo looks expensive and unnecessary -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575838382 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java: ## @@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException, Map tblProperties = context.getTable().getParameters(); LOG.debug("Initiating compaction for the {} table", compactTableName); -String compactionQuery = String.format("insert overwrite table %s select * from % partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList()); + String queryFields = table.schema().columns().stream().map(Types.NestedField::name) + .filter(f -> !partFields.contains(f)).collect(Collectors.joining(",")); + PartitionData partitionData = DataFiles.data(table.spec(), partSpec); + compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", + compactTableName, partDataToSQL(partitionData, partSpec, ","), + partDataToSQL(partitionData, partSpec, " and "), queryFields); +} + +SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties); -SessionState sessionState = setupQueryCompactionSession(context.getConf(), -context.getCompactionInfo(), tblProperties); -HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name()); try { - DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery); + DriverUtils.runOnDriver(conf, sessionState, compactionQuery); LOG.info("Completed compaction for table {}", compactTableName); + return true; } catch (HiveException e) { - LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e); - throw new RuntimeException(e); + LOG.error("Failed compacting table {}", compactTableName, e); + throw e; } finally { sessionState.setCompaction(false); } + } + + private String partDataToSQL(PartitionData partitionData, String partSpec, String delimiter) + throws HiveException { + +try { + Map partSpecMap = Warehouse.makeSpecFromName(partSpec); + StringBuilder sb = new StringBuilder(); + + for (int i = 0; i < partitionData.size(); ++i) { +if (i > 0) { + sb.append(delimiter); +} + +String quoteOpt = ""; +if (partitionData.getType(i).typeId() == Type.TypeID.STRING || +partitionData.getType(i).typeId() == Type.TypeID.DATE || +partitionData.getType(i).typeId() == Type.TypeID.TIME || +partitionData.getType(i).typeId() == Type.TypeID.TIMESTAMP || +partitionData.getType(i).typeId() == Type.TypeID.BINARY) { + quoteOpt = "'"; +} -return true; +String fieldName = partitionData.getSchema().getFields().get(i).name(); + +sb.append(fieldName) +.append("=") +.append(quoteOpt) +.append(partSpecMap.get(fieldName)) + .append(quoteOpt); Review Comment: how about Map partSpecMap = new LinkedHashMap<>(); Warehouse.makeSpecFromName(partSpecMap, new Path(partSpec), null); List partitionKeys = hiveTable.getStorageHandler().getPartitionKeys(hiveTable); List partValues = partitionKeys.stream().map( fs -> String.join("=", HiveUtils.unparseIdentifier(fs.getName()), TypeInfoUtils.convertStringToLiteralForSQL(partSpecMap.get(fs.getName()), ((PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromTypeString(fs.getType())).getPrimitiveCategory()) ) ).collect(Collectors.toList()); String queryFields = hiveTable.getCols().stream().filter(col -> !partSpecMap.containsKey(col.getName())) .map(FieldSchema::getName) .collect(Collectors.joining(",")); compactionQuery = 
String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", compactTableName, StringUtils.join(partValues, ","), StringUtils.join(partValues, " and "), queryFields); table.getStorageHandler().getColumnInfo looks expensive and unnecessary -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575838382 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java: ## @@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException, Map tblProperties = context.getTable().getParameters(); LOG.debug("Initiating compaction for the {} table", compactTableName); -String compactionQuery = String.format("insert overwrite table %s select * from % partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList()); + String queryFields = table.schema().columns().stream().map(Types.NestedField::name) + .filter(f -> !partFields.contains(f)).collect(Collectors.joining(",")); + PartitionData partitionData = DataFiles.data(table.spec(), partSpec); + compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", + compactTableName, partDataToSQL(partitionData, partSpec, ","), + partDataToSQL(partitionData, partSpec, " and "), queryFields); +} + +SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties); -SessionState sessionState = setupQueryCompactionSession(context.getConf(), -context.getCompactionInfo(), tblProperties); -HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name()); try { - DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery); + DriverUtils.runOnDriver(conf, sessionState, compactionQuery); LOG.info("Completed compaction for table {}", compactTableName); + return true; } catch (HiveException e) { - LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e); - throw new RuntimeException(e); + LOG.error("Failed compacting table {}", compactTableName, e); + throw e; } finally { sessionState.setCompaction(false); } + } + + private String partDataToSQL(PartitionData partitionData, String partSpec, String delimiter) + throws HiveException { + +try { + Map partSpecMap = Warehouse.makeSpecFromName(partSpec); + StringBuilder sb = new StringBuilder(); + + for (int i = 0; i < partitionData.size(); ++i) { +if (i > 0) { + sb.append(delimiter); +} + +String quoteOpt = ""; +if (partitionData.getType(i).typeId() == Type.TypeID.STRING || +partitionData.getType(i).typeId() == Type.TypeID.DATE || +partitionData.getType(i).typeId() == Type.TypeID.TIME || +partitionData.getType(i).typeId() == Type.TypeID.TIMESTAMP || +partitionData.getType(i).typeId() == Type.TypeID.BINARY) { + quoteOpt = "'"; +} -return true; +String fieldName = partitionData.getSchema().getFields().get(i).name(); + +sb.append(fieldName) +.append("=") +.append(quoteOpt) +.append(partSpecMap.get(fieldName)) + .append(quoteOpt); Review Comment: how about Map partSpecMap = Warehouse.makeSpecFromName(partSpec); List partitionKeys = hiveTable.getStorageHandler().getPartitionKeys(hiveTable); List partValues = partitionKeys.stream().map( fs -> String.join("=", HiveUtils.unparseIdentifier(fs.getName()), TypeInfoUtils.convertStringToLiteralForSQL(partSpecMap.get(fs.getName()), ((PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromTypeString(fs.getType())).getPrimitiveCategory()) ) ).collect(Collectors.toList()); String queryFields = hiveTable.getCols().stream().filter(col -> !partitionKeys.contains(col)) .map(FieldSchema::getName) .collect(Collectors.joining(",")); compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select 
%4$s from %1$s where %3$s", compactTableName, StringUtils.join(partValues, ","), StringUtils.join(partValues, " and "), queryFields); table.getStorageHandler().getColumnInfo looks expensive and unnecessary -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28143: After HIVE-27492 fix, some HPLSQL built-in functions like trim, lower are not working when used in insert statement. [hive]
kasakrisz commented on code in PR #5182: URL: https://github.com/apache/hive/pull/5182#discussion_r1575886634 ## hplsql/src/main/java/org/apache/hive/hplsql/functions/FunctionDatetime.java: ## @@ -162,6 +174,35 @@ void toTimestamp(HplsqlParser.Expr_func_paramsContext ctx) { } } + /** + * FROM_UNIXTIME() function (convert seconds since 1970-01-01 00:00:00 to timestamp) + */ + void fromUnixtime(HplsqlParser.Expr_func_paramsContext ctx) { +int cnt = BuiltinFunctions.getParamCount(ctx); +if (cnt == 0) { + evalNull(); + return; +} +Var value = evalPop(ctx.func_param(0).expr()); +if (value.type != Var.Type.BIGINT) { + Var newVar = new Var(Var.Type.BIGINT); + value = newVar.cast(value); +} +long epoch = value.longValue(); +String format = "-MM-dd HH:mm:ss"; +if (cnt > 1) { + format = Utils.unquoteString(evalPop(ctx.func_param(1).expr()).toString()); +} +evalString(Utils.quoteString(new SimpleDateFormat(format).format(new Date(epoch * 1000; Review Comment: IIUC, by reintroducing this function, consecutive calls may return different results depending on which implementation is used: Hive or HPL/SQL. IMHO it should be consistent. https://github.com/apache/hive/pull/4965/files#r1444226220 ## hplsql/src/main/java/org/apache/hive/hplsql/functions/FunctionMisc.java: ## @@ -113,7 +155,9 @@ void currentUser(HplsqlParser.Expr_spec_funcContext ctx) { } public static Var currentUser() { -return new Var("CURRENT_USER()"); +String currentUser = System.getProperty("user.name"); +currentUser = Utils.quoteString(currentUser); +return new Var(currentUser); Review Comment: Sorry, I'm missing something. IIUC, instances of `Var` are the internal representations of an SQL literal. In the case of a string, the single quote characters at the beginning and the end should be removed at parse time, and single quote characters that are part of the text should be unescaped. If the original SQL form of the literal is needed, there is https://github.com/apache/hive/blob/1a969f6642d4081027040f97e6a689bf2e687db3/hplsql/src/main/java/org/apache/hive/hplsql/Var.java#L620-L631 ## hplsql/src/main/java/org/apache/hive/hplsql/functions/FunctionMisc.java: ## @@ -59,6 +64,43 @@ public void register(BuiltinFunctions f) { void activityCount(HplsqlParser.Expr_spec_funcContext ctx) { evalInt(Long.valueOf(exec.getRowCount())); } + + /** + * CAST function + */ + void cast(HplsqlParser.Expr_spec_funcContext ctx) { +if (ctx.expr().size() != 1) { + evalNull(); + return; +} Review Comment: If it should not be removed, why not consider reverting HIVE-27492? ## hplsql/src/main/java/org/apache/hive/hplsql/Expression.java: ## @@ -565,7 +575,11 @@ public void operatorConcatSql(HplsqlParser.Expr_concatContext ctx) { sql.append("CONCAT("); int cnt = ctx.expr_concat_item().size(); for (int i = 0; i < cnt; i++) { - sql.append(evalPop(ctx.expr_concat_item(i)).toString()); + String concatStr = evalPop(ctx.expr_concat_item(i)).toString(); + if (!concatStr.startsWith("'")) { Review Comment: How about ``` SELECT '#' || TRIM(''' Hello ') || '#'; ``` The trim result is: `' Hello` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
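A small, self-contained illustration of the quoting point raised above: keep the internal value unquoted and produce the SQL literal form only when emitting SQL, escaping embedded quotes so values such as `' Hello` survive. The class and method names below are invented for the example; the real HPL/SQL code would use `Var` and its SQL-form accessor:

```java
// Illustration only: internal values stay raw; quoting and escaping happen at SQL-emission time.
public final class SqlLiteralExample {

  private SqlLiteralExample() {
  }

  /** Wraps a raw string in single quotes, doubling embedded quotes (standard SQL escaping). */
  public static String toSqlStringLiteral(String raw) {
    return "'" + raw.replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    String currentUser = System.getProperty("user.name"); // internal, unquoted value
    System.out.println(toSqlStringLiteral(currentUser));  // e.g. 'alice'
    System.out.println(toSqlStringLiteral("' Hello"));    // ''' Hello' - a leading quote is preserved
  }
}
```

With quoting done only at emission time, a check like `startsWith("'")` in `operatorConcatSql` is no longer needed, which is the failure mode the `TRIM(''' Hello ')` example points at.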
Re: [PR] HIVE-27995: Fix inconsistent behavior of LOAD DATA command for partitoned and non-partitioned tables [hive]
chinnaraolalam commented on code in PR #5000: URL: https://github.com/apache/hive/pull/5000#discussion_r1575895273 ## ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java: ## @@ -344,7 +358,7 @@ private void analyzeLoad(ASTNode ast) throws SemanticException { } // make sure the arguments make sense -List files = applyConstraintsAndGetFiles(fromURI, ts.tableHandle); +List files = applyConstraintsAndGetFiles(fromURI, ts.tableHandle, srcs); Review Comment: So earlier, in the partition case, the query was successful even though the file did not exist. Can you check why it was not throwing earlier? After the fix, the paths are validated at the server level and an error is thrown for non-existing file paths. So if any such cases are present in existing applications, those queries need to be corrected. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575794738 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) { .anyMatch(id -> id < table.spec().specId()); } + private boolean isIdentityPartitionTable(org.apache.hadoop.hive.ql.metadata.Table table) { +return getPartitionTransformSpec(table).stream().map(TransformSpec::getTransformType) +.allMatch(type -> type == TransformSpec.TransformType.IDENTITY); + } + + @Override + public org.apache.commons.lang3.tuple.Pair isEligibleForCompaction( + org.apache.hadoop.hive.ql.metadata.Table table, Map partitionSpec) { +if (partitionSpec != null) { + Table icebergTable = IcebergTableUtil.getTable(conf, table.getTTable()); + if (hasUndergonePartitionEvolution(icebergTable)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_PARTITION_EVOLUTION); + } + if (!isIdentityPartitionTable(table)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_NON_IDENTITY_PARTITION_SPEC); + } +} +return org.apache.commons.lang3.tuple.Pair.of(true, null); + } + + @Override + public List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec, short limit) throws SemanticException { Review Comment: why do we need limit here? if we need it, we should pass it to `getPartitionNames` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575838382 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java: ## @@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException, Map tblProperties = context.getTable().getParameters(); LOG.debug("Initiating compaction for the {} table", compactTableName); -String compactionQuery = String.format("insert overwrite table %s select * from % partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList()); + String queryFields = table.schema().columns().stream().map(Types.NestedField::name) + .filter(f -> !partFields.contains(f)).collect(Collectors.joining(",")); + PartitionData partitionData = DataFiles.data(table.spec(), partSpec); + compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", + compactTableName, partDataToSQL(partitionData, partSpec, ","), + partDataToSQL(partitionData, partSpec, " and "), queryFields); +} + +SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties); -SessionState sessionState = setupQueryCompactionSession(context.getConf(), -context.getCompactionInfo(), tblProperties); -HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name()); try { - DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery); + DriverUtils.runOnDriver(conf, sessionState, compactionQuery); LOG.info("Completed compaction for table {}", compactTableName); + return true; } catch (HiveException e) { - LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e); - throw new RuntimeException(e); + LOG.error("Failed compacting table {}", compactTableName, e); + throw e; } finally { sessionState.setCompaction(false); } + } + + private String partDataToSQL(PartitionData partitionData, String partSpec, String delimiter) + throws HiveException { + +try { + Map partSpecMap = Warehouse.makeSpecFromName(partSpec); + StringBuilder sb = new StringBuilder(); + + for (int i = 0; i < partitionData.size(); ++i) { +if (i > 0) { + sb.append(delimiter); +} + +String quoteOpt = ""; +if (partitionData.getType(i).typeId() == Type.TypeID.STRING || +partitionData.getType(i).typeId() == Type.TypeID.DATE || +partitionData.getType(i).typeId() == Type.TypeID.TIME || +partitionData.getType(i).typeId() == Type.TypeID.TIMESTAMP || +partitionData.getType(i).typeId() == Type.TypeID.BINARY) { + quoteOpt = "'"; +} -return true; +String fieldName = partitionData.getSchema().getFields().get(i).name(); + +sb.append(fieldName) +.append("=") +.append(quoteOpt) +.append(partSpecMap.get(fieldName)) + .append(quoteOpt); Review Comment: how about if (partitionData.getType(i).isPrimitive()) { PrimitiveObjectInspector primitiveObjInspector = IcebergObjectInspector.primitive(partitionData.getType(i)) TypeInfoUtils.convertStringToLiteralForSQL(value, primitiveObjInspector.getPrimitiveCategory()) } else { throw new ... } table.getStorageHandler().getColumnInfo looks expensive and unnecessary -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575838382 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java: ## @@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException, Map tblProperties = context.getTable().getParameters(); LOG.debug("Initiating compaction for the {} table", compactTableName); -String compactionQuery = String.format("insert overwrite table %s select * from % partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList()); + String queryFields = table.schema().columns().stream().map(Types.NestedField::name) + .filter(f -> !partFields.contains(f)).collect(Collectors.joining(",")); + PartitionData partitionData = DataFiles.data(table.spec(), partSpec); + compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", + compactTableName, partDataToSQL(partitionData, partSpec, ","), + partDataToSQL(partitionData, partSpec, " and "), queryFields); +} + +SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties); -SessionState sessionState = setupQueryCompactionSession(context.getConf(), -context.getCompactionInfo(), tblProperties); -HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name()); try { - DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery); + DriverUtils.runOnDriver(conf, sessionState, compactionQuery); LOG.info("Completed compaction for table {}", compactTableName); + return true; } catch (HiveException e) { - LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e); - throw new RuntimeException(e); + LOG.error("Failed compacting table {}", compactTableName, e); + throw e; } finally { sessionState.setCompaction(false); } + } + + private String partDataToSQL(PartitionData partitionData, String partSpec, String delimiter) + throws HiveException { + +try { + Map partSpecMap = Warehouse.makeSpecFromName(partSpec); + StringBuilder sb = new StringBuilder(); + + for (int i = 0; i < partitionData.size(); ++i) { +if (i > 0) { + sb.append(delimiter); +} + +String quoteOpt = ""; +if (partitionData.getType(i).typeId() == Type.TypeID.STRING || +partitionData.getType(i).typeId() == Type.TypeID.DATE || +partitionData.getType(i).typeId() == Type.TypeID.TIME || +partitionData.getType(i).typeId() == Type.TypeID.TIMESTAMP || +partitionData.getType(i).typeId() == Type.TypeID.BINARY) { + quoteOpt = "'"; +} -return true; +String fieldName = partitionData.getSchema().getFields().get(i).name(); + +sb.append(fieldName) +.append("=") +.append(quoteOpt) +.append(partSpecMap.get(fieldName)) + .append(quoteOpt); Review Comment: how about PrimitiveObjectInspector primitiveObjInspector = IcebergObjectInspector.primitive(partitionData.getType(i)) TypeInfoUtils.convertStringToLiteralForSQL(value, primitiveObjInspector.getPrimitiveCategory()) table.getStorageHandler().getColumnInfo looks expensive and unnecessary -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575838382 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergMajorQueryCompactor.java: ## @@ -44,22 +55,71 @@ public boolean run(CompactorContext context) throws IOException, HiveException, Map tblProperties = context.getTable().getParameters(); LOG.debug("Initiating compaction for the {} table", compactTableName); -String compactionQuery = String.format("insert overwrite table %s select * from % partFields = table.spec().fields().stream().map(PartitionField::name).collect(Collectors.toList()); + String queryFields = table.schema().columns().stream().map(Types.NestedField::name) + .filter(f -> !partFields.contains(f)).collect(Collectors.joining(",")); + PartitionData partitionData = DataFiles.data(table.spec(), partSpec); + compactionQuery = String.format("insert overwrite table %1$s partition(%2$s) select %4$s from %1$s where %3$s", + compactTableName, partDataToSQL(partitionData, partSpec, ","), + partDataToSQL(partitionData, partSpec, " and "), queryFields); +} + +SessionState sessionState = setupQueryCompactionSession(conf, context.getCompactionInfo(), tblProperties); -SessionState sessionState = setupQueryCompactionSession(context.getConf(), -context.getCompactionInfo(), tblProperties); -HiveConf.setVar(context.getConf(), ConfVars.REWRITE_POLICY, RewritePolicy.ALL_PARTITIONS.name()); try { - DriverUtils.runOnDriver(context.getConf(), sessionState, compactionQuery); + DriverUtils.runOnDriver(conf, sessionState, compactionQuery); LOG.info("Completed compaction for table {}", compactTableName); + return true; } catch (HiveException e) { - LOG.error("Error doing query based {} compaction", RewritePolicy.ALL_PARTITIONS.name(), e); - throw new RuntimeException(e); + LOG.error("Failed compacting table {}", compactTableName, e); + throw e; } finally { sessionState.setCompaction(false); } + } + + private String partDataToSQL(PartitionData partitionData, String partSpec, String delimiter) + throws HiveException { + +try { + Map partSpecMap = Warehouse.makeSpecFromName(partSpec); + StringBuilder sb = new StringBuilder(); + + for (int i = 0; i < partitionData.size(); ++i) { +if (i > 0) { + sb.append(delimiter); +} + +String quoteOpt = ""; +if (partitionData.getType(i).typeId() == Type.TypeID.STRING || +partitionData.getType(i).typeId() == Type.TypeID.DATE || +partitionData.getType(i).typeId() == Type.TypeID.TIME || +partitionData.getType(i).typeId() == Type.TypeID.TIMESTAMP || +partitionData.getType(i).typeId() == Type.TypeID.BINARY) { + quoteOpt = "'"; +} -return true; +String fieldName = partitionData.getSchema().getFields().get(i).name(); + +sb.append(fieldName) +.append("=") +.append(quoteOpt) +.append(partSpecMap.get(fieldName)) + .append(quoteOpt); Review Comment: how about PrimitiveObjectInspector primitiveObjInspector = IcebergObjectInspector.primitive(partitionData.getType(i)) TypeInfoUtils.convertStringToLiteralForSQL(value, primitiveObjInspector.getPrimitiveCategory()) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
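Spelled out, the last revision of the suggestion amounts to deriving the Hive primitive category straight from the Iceberg partition type. `IcebergObjectInspector.primitive` and `convertStringToLiteralForSQL` are taken from the comment itself and are not verified against a particular Hive release, so read this as a sketch of the quoting step inside `partDataToSQL`:

```java
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
import org.apache.iceberg.PartitionData;
import org.apache.iceberg.types.Type;

final class PartitionLiteralSketch {

  private PartitionLiteralSketch() {
  }

  /**
   * Converts one raw partition value into a SQL literal, using the Hive primitive
   * category derived from the Iceberg partition field type to decide on quoting.
   */
  static String toSqlLiteral(PartitionData partitionData, int pos, String rawValue) {
    Type type = partitionData.getType(pos);
    if (!type.isPrimitiveType()) {
      throw new UnsupportedOperationException("Non-primitive partition type: " + type);
    }
    // Assumed API from the comment: maps an Iceberg primitive type to a Hive object inspector.
    PrimitiveObjectInspector inspector =
        org.apache.iceberg.mr.hive.serde.objectinspector.IcebergObjectInspector.primitive(type);
    return TypeInfoUtils.convertStringToLiteralForSQL(rawValue, inspector.getPrimitiveCategory());
  }
}
```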
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575801527 ## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/partition/PartitionUtils.java: ## @@ -95,7 +95,7 @@ public static Partition getPartition(Hive db, Table table, Map p throws SemanticException { Partition partition; try { - partition = db.getPartition(table, partitionSpec, false); + partition = db.getPartition(table, partitionSpec); Review Comment: 3rd param, `forceCreate` was omitted, why? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28121. (2.3) Use direct SQL for transactional altering table parameter [hive]
pan3793 commented on PR #5204: URL: https://github.com/apache/hive/pull/5204#issuecomment-2071642313 kindly ping @sunchao, are we good to go? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575794738 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) { .anyMatch(id -> id < table.spec().specId()); } + private boolean isIdentityPartitionTable(org.apache.hadoop.hive.ql.metadata.Table table) { +return getPartitionTransformSpec(table).stream().map(TransformSpec::getTransformType) +.allMatch(type -> type == TransformSpec.TransformType.IDENTITY); + } + + @Override + public org.apache.commons.lang3.tuple.Pair isEligibleForCompaction( + org.apache.hadoop.hive.ql.metadata.Table table, Map partitionSpec) { +if (partitionSpec != null) { + Table icebergTable = IcebergTableUtil.getTable(conf, table.getTTable()); + if (hasUndergonePartitionEvolution(icebergTable)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_PARTITION_EVOLUTION); + } + if (!isIdentityPartitionTable(table)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_NON_IDENTITY_PARTITION_SPEC); + } +} +return org.apache.commons.lang3.tuple.Pair.of(true, null); + } + + @Override + public List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec, short limit) throws SemanticException { Review Comment: why do we need limit here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575794099 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) { .anyMatch(id -> id < table.spec().specId()); } + private boolean isIdentityPartitionTable(org.apache.hadoop.hive.ql.metadata.Table table) { +return getPartitionTransformSpec(table).stream().map(TransformSpec::getTransformType) +.allMatch(type -> type == TransformSpec.TransformType.IDENTITY); + } + + @Override + public org.apache.commons.lang3.tuple.Pair isEligibleForCompaction( + org.apache.hadoop.hive.ql.metadata.Table table, Map partitionSpec) { +if (partitionSpec != null) { + Table icebergTable = IcebergTableUtil.getTable(conf, table.getTTable()); + if (hasUndergonePartitionEvolution(icebergTable)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_PARTITION_EVOLUTION); + } + if (!isIdentityPartitionTable(table)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_NON_IDENTITY_PARTITION_SPEC); + } +} +return org.apache.commons.lang3.tuple.Pair.of(true, null); + } + + @Override + public List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec, short limit) throws SemanticException { +return table.getStorageHandler().getPartitionNames(table, partitionSpec).stream() +.limit(limit > 0 ? limit : Long.MAX_VALUE) +.map(partName -> new DummyPartition(table, partName, partitionSpec)).collect(Collectors.toList()); + } + + @Override + public Partition getPartition(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec) throws SemanticException { +validatePartSpec(table, partitionSpec); +try { + String partName = Warehouse.makePartName(partitionSpec, false); + return new DummyPartition(table, partName, partitionSpec); +} catch (MetaException e) { Review Comment: you should throw `MetaException` and catch it in the orig place - Analyzer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575779868 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) { .anyMatch(id -> id < table.spec().specId()); } + private boolean isIdentityPartitionTable(org.apache.hadoop.hive.ql.metadata.Table table) { +return getPartitionTransformSpec(table).stream().map(TransformSpec::getTransformType) +.allMatch(type -> type == TransformSpec.TransformType.IDENTITY); + } + + @Override + public org.apache.commons.lang3.tuple.Pair isEligibleForCompaction( + org.apache.hadoop.hive.ql.metadata.Table table, Map partitionSpec) { +if (partitionSpec != null) { + Table icebergTable = IcebergTableUtil.getTable(conf, table.getTTable()); + if (hasUndergonePartitionEvolution(icebergTable)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_PARTITION_EVOLUTION); + } + if (!isIdentityPartitionTable(table)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_NON_IDENTITY_PARTITION_SPEC); + } +} +return org.apache.commons.lang3.tuple.Pair.of(true, null); + } + + @Override + public List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec, short limit) throws SemanticException { +return table.getStorageHandler().getPartitionNames(table, partitionSpec).stream() +.limit(limit > 0 ? limit : Long.MAX_VALUE) Review Comment: `Math.min(limit, Long.MAX_VALUE)` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
Re: [PR] HIVE-28077: Iceberg: Major QB Compaction on partition level [hive]
deniskuzZ commented on code in PR #5123: URL: https://github.com/apache/hive/pull/5123#discussion_r1575777876 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -1915,6 +1917,46 @@ private boolean hasUndergonePartitionEvolution(Table table) { .anyMatch(id -> id < table.spec().specId()); } + private boolean isIdentityPartitionTable(org.apache.hadoop.hive.ql.metadata.Table table) { +return getPartitionTransformSpec(table).stream().map(TransformSpec::getTransformType) +.allMatch(type -> type == TransformSpec.TransformType.IDENTITY); + } + + @Override + public org.apache.commons.lang3.tuple.Pair isEligibleForCompaction( + org.apache.hadoop.hive.ql.metadata.Table table, Map partitionSpec) { +if (partitionSpec != null) { + Table icebergTable = IcebergTableUtil.getTable(conf, table.getTTable()); + if (hasUndergonePartitionEvolution(icebergTable)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_PARTITION_EVOLUTION); + } + if (!isIdentityPartitionTable(table)) { +return org.apache.commons.lang3.tuple.Pair.of(false, ErrorMsg.COMPACTION_NON_IDENTITY_PARTITION_SPEC); + } +} +return org.apache.commons.lang3.tuple.Pair.of(true, null); + } + + @Override + public List getPartitions(org.apache.hadoop.hive.ql.metadata.Table table, + Map partitionSpec, short limit) throws SemanticException { +return table.getStorageHandler().getPartitionNames(table, partitionSpec).stream() Review Comment: why do you need `table.getStorageHandler()`, call `getPartitionNames` directly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
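Putting the getPartitions remarks from this thread together (drop the `table.getStorageHandler()` indirection and keep the "non-positive limit means no limit" handling), the override would look roughly like this; it is a fragment of `HiveIcebergStorageHandler` and assumes `getPartitionNames` is available on the handler as the quoted diff implies:

```java
// Fragment of HiveIcebergStorageHandler; DummyPartition and getPartitionNames come from
// the surrounding class/diff. A non-positive limit is treated as "no limit".
@Override
public List<Partition> getPartitions(org.apache.hadoop.hive.ql.metadata.Table table,
    Map<String, String> partitionSpec, short limit) throws SemanticException {
  return getPartitionNames(table, partitionSpec).stream()
      .limit(limit > 0 ? limit : Long.MAX_VALUE)
      .map(partName -> new DummyPartition(table, partName, partitionSpec))
      .collect(Collectors.toList());
}
```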