Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-27 Thread via GitHub


simhadri-g commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2022346437

   Hi @zhangbutao ,
   TestConflictingDataFiles is a concurrency test. 
   It seems to be failing only with format version 1. I am working on checking 
why its flaky. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-27 Thread via GitHub


zhangbutao merged PR #5063:
URL: https://github.com/apache/hive/pull/5063


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-26 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2021874314

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [14 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-26 Thread via GitHub


zhangbutao commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2021816061

   > 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/
 `org.apache.iceberg.mr.hive.TestConflictingDataFiles::testMultiFiltersUpdate` 
I think this is a flaky test. I have seen this faliure many times. Run a flaky 
check to see results. http://ci.hive.apache.org/job/hive-flaky-check/831/
   
   http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/  test report 
shows that this is a flay test. Hi @simhadri-g , do you have any suggestion 
about this flaky test, as i see you wrote this code not long ago. Should we 
disable it now?
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-26 Thread via GitHub


zhangbutao commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2021803258

   
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/
 
   
`org.apache.iceberg.mr.hive.TestConflictingDataFiles::testMultiFiltersUpdate`   
I think this is a flaky test. I have seen this faliure many times.
   Run a flaky check to see results. 
http://ci.hive.apache.org/job/hive-flaky-check/831/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-26 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2020647942

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [14 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-26 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2019956758

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [14 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-26 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2019476713

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [14 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-25 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2019258722

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [14 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-25 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2017804936

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [14 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-22 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2015736134

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [24 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-21 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2014142781

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [25 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-20 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2011222081

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [14 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-20 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1532474771


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1050,17 +1073,26 @@ private List 
getPartitionsFromPartitionIds(String catName, String dbN
 + ".\"SD_ID\" " + "  left outer join " + SERDES + " on " + SDS + 
".\"SERDE_ID\" = "
 + SERDES + ".\"SERDE_ID\" " + "where \"PART_ID\" in (" + partIds
 + ") order by \"PART_NAME\" asc";
+// Keep order by name, consistent with JDO.
+ArrayList orderedResult = new 
ArrayList(partIdList.size());
+populatePartitionsByQuery(catName, dbName, tblName, isView, queryText, 
null, projectionFields,
+isAcidTable, args, orderedResult);
+return orderedResult;
+  }
+
+  private void populatePartitionsByQuery(String catName, String dbName, String 
tblName,

Review Comment:
   Addressed. The initial thought was that the result list size can be 
determined by partition names size, so I want to init the result list before 
calling `populatePartitionsByQuery`, but actually the result size can be 
fetched from the sqlResult.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-19 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1530534014


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1050,17 +1073,26 @@ private List 
getPartitionsFromPartitionIds(String catName, String dbN
 + ".\"SD_ID\" " + "  left outer join " + SERDES + " on " + SDS + 
".\"SERDE_ID\" = "
 + SERDES + ".\"SERDE_ID\" " + "where \"PART_ID\" in (" + partIds
 + ") order by \"PART_NAME\" asc";
+// Keep order by name, consistent with JDO.
+ArrayList orderedResult = new 
ArrayList(partIdList.size());
+populatePartitionsByQuery(catName, dbName, tblName, isView, queryText, 
null, projectionFields,
+isAcidTable, args, orderedResult);
+return orderedResult;
+  }
+
+  private void populatePartitionsByQuery(String catName, String dbName, String 
tblName,

Review Comment:
   why not return `List getPartitionsByQuery(..)` instead of passing 
it as an input param?
   PS: looks like there are some unused params, remove them from the parent 
method as well
   
https://sonarcloud.io/project/issues?resolved=false=5063=apache_hive=AY19o4k2MYzl9k2s6F6V
   please check sonar report



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-19 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1530534014


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1050,17 +1073,26 @@ private List 
getPartitionsFromPartitionIds(String catName, String dbN
 + ".\"SD_ID\" " + "  left outer join " + SERDES + " on " + SDS + 
".\"SERDE_ID\" = "
 + SERDES + ".\"SERDE_ID\" " + "where \"PART_ID\" in (" + partIds
 + ") order by \"PART_NAME\" asc";
+// Keep order by name, consistent with JDO.
+ArrayList orderedResult = new 
ArrayList(partIdList.size());
+populatePartitionsByQuery(catName, dbName, tblName, isView, queryText, 
null, projectionFields,
+isAcidTable, args, orderedResult);
+return orderedResult;
+  }
+
+  private void populatePartitionsByQuery(String catName, String dbName, String 
tblName,

Review Comment:
   why not return `List getPartitionsByQuery(..)` instead of passing 
it as an input param?
   PS: looks like there are some unused params, remove them from the caller 
method as well
   
https://sonarcloud.io/project/issues?resolved=false=5063=apache_hive=AY19o4k2MYzl9k2s6F6V
   please check sonar report



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-19 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1530534014


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1050,17 +1073,26 @@ private List 
getPartitionsFromPartitionIds(String catName, String dbN
 + ".\"SD_ID\" " + "  left outer join " + SERDES + " on " + SDS + 
".\"SERDE_ID\" = "
 + SERDES + ".\"SERDE_ID\" " + "where \"PART_ID\" in (" + partIds
 + ") order by \"PART_NAME\" asc";
+// Keep order by name, consistent with JDO.
+ArrayList orderedResult = new 
ArrayList(partIdList.size());
+populatePartitionsByQuery(catName, dbName, tblName, isView, queryText, 
null, projectionFields,
+isAcidTable, args, orderedResult);
+return orderedResult;
+  }
+
+  private void populatePartitionsByQuery(String catName, String dbName, String 
tblName,

Review Comment:
   why not return `List getPartitionsByQuery(..)` instead of passing 
it as an input param?
   PS: looks like there are unused params:
   
https://sonarcloud.io/project/issues?resolved=false=5063=apache_hive=AY19o4k2MYzl9k2s6F6V
   please check sonar report



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-19 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1530534014


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1050,17 +1073,26 @@ private List 
getPartitionsFromPartitionIds(String catName, String dbN
 + ".\"SD_ID\" " + "  left outer join " + SERDES + " on " + SDS + 
".\"SERDE_ID\" = "
 + SERDES + ".\"SERDE_ID\" " + "where \"PART_ID\" in (" + partIds
 + ") order by \"PART_NAME\" asc";
+// Keep order by name, consistent with JDO.
+ArrayList orderedResult = new 
ArrayList(partIdList.size());
+populatePartitionsByQuery(catName, dbName, tblName, isView, queryText, 
null, projectionFields,
+isAcidTable, args, orderedResult);
+return orderedResult;
+  }
+
+  private void populatePartitionsByQuery(String catName, String dbName, String 
tblName,

Review Comment:
   why not return `List getPartitionsByQuery(..)` instead of passing 
it as an input param?
   PS: looks like there are some unused params:
   
https://sonarcloud.io/project/issues?resolved=false=5063=apache_hive=AY19o4k2MYzl9k2s6F6V
   please check sonar report



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-19 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1530534014


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1050,17 +1073,26 @@ private List 
getPartitionsFromPartitionIds(String catName, String dbN
 + ".\"SD_ID\" " + "  left outer join " + SERDES + " on " + SDS + 
".\"SERDE_ID\" = "
 + SERDES + ".\"SERDE_ID\" " + "where \"PART_ID\" in (" + partIds
 + ") order by \"PART_NAME\" asc";
+// Keep order by name, consistent with JDO.
+ArrayList orderedResult = new 
ArrayList(partIdList.size());
+populatePartitionsByQuery(catName, dbName, tblName, isView, queryText, 
null, projectionFields,
+isAcidTable, args, orderedResult);
+return orderedResult;
+  }
+
+  private void populatePartitionsByQuery(String catName, String dbName, String 
tblName,

Review Comment:
   why not return `List getPartitionsByQuery(..)` instead of passing 
it as an input param?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-19 Thread via GitHub


zhangbutao commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1529931142


##
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestMetaStoreLimitPartitionRequest.java:
##
@@ -73,6 +73,9 @@ public static void beforeTest() throws Exception {
 conf.setBoolVar(HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL, true);
 conf.setBoolVar(HiveConf.ConfVars.DYNAMIC_PARTITIONING, true);
 conf.setBoolVar(HiveConf.ConfVars.HIVE_CBO_ENABLED, false);
+// Disable loading dynamic partitions from partition names in this test
+// because get_partitions_by_names will also hit partition limit since 
HIVE-28062.
+
conf.setBoolVar(HiveConf.ConfVars.HIVE_LOAD_DYNAMIC_PARTITIONS_SCAN_SPECIFIC_PARTITIONS,
 false);

Review Comment:
   I know it's just a workaround to fix the failed UT and not related to the 
real cases. But IMO, `hive.load.dynamic.partitions.scan.specific.partitions` 
should not be bound together with `getPartitionsByNames `, and we may can use 
other methods like `getPartitionsBy...` to achieve this in future.
   This part maks me little confusing, but this is just a small fix about the 
UT, so i think it does't matter. :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-19 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1529894051


##
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestMetaStoreLimitPartitionRequest.java:
##
@@ -73,6 +73,9 @@ public static void beforeTest() throws Exception {
 conf.setBoolVar(HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL, true);
 conf.setBoolVar(HiveConf.ConfVars.DYNAMIC_PARTITIONING, true);
 conf.setBoolVar(HiveConf.ConfVars.HIVE_CBO_ENABLED, false);
+// Disable loading dynamic partitions from partition names in this test
+// because get_partitions_by_names will also hit partition limit since 
HIVE-28062.
+
conf.setBoolVar(HiveConf.ConfVars.HIVE_LOAD_DYNAMIC_PARTITIONS_SCAN_SPECIFIC_PARTITIONS,
 false);

Review Comment:
   It's an option to determine whether use `getPartitionsByNames` to load 
partitions, in this test we disable it because  the setup of this test would be 
failed by partition limit, which is not related to the real tests actually.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-18 Thread via GitHub


zhangbutao commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1528714188


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -3199,19 +3199,24 @@ public Map, Partition> 
loadDynamicPartitions(final LoadTable
   
partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
 }
   }
-  List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
-  for(Partition partition : partitions) {
-LOG.debug("HMS partition spec: {}", partition.getSpec());
-partitionDetailsMap.entrySet().parallelStream()
-.filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
-.findAny().ifPresent(entry -> {
-  entry.getValue().partition = partition;
-  entry.getValue().hasOldPartition = true;
-});
-  }
-  // no need to fetch partition again in tasks since we have already 
fetched partitions
-  // info in getPartitionsByNames()
-  fetchPartitionInfo = false;
+  try {
+List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
+for(Partition partition : partitions) {
+  LOG.debug("HMS partition spec: {}", partition.getSpec());
+  partitionDetailsMap.entrySet().parallelStream()
+  .filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
+  .findAny().ifPresent(entry -> {
+entry.getValue().partition = partition;
+entry.getValue().hasOldPartition = true;
+  });
+}
+// no need to fetch partition again in tasks since we have already 
fetched partitions
+// info in getPartitionsByNames()
+fetchPartitionInfo = false;
+  } catch (HiveException e) {
+// Failed to fetch partitions in some cases (e.g., partition number 
limit)
+LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
   Please see comment https://github.com/apache/hive/pull/5063/files#r1528710963



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-18 Thread via GitHub


zhangbutao commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1528710963


##
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestMetaStoreLimitPartitionRequest.java:
##
@@ -73,6 +73,9 @@ public static void beforeTest() throws Exception {
 conf.setBoolVar(HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL, true);
 conf.setBoolVar(HiveConf.ConfVars.DYNAMIC_PARTITIONING, true);
 conf.setBoolVar(HiveConf.ConfVars.HIVE_CBO_ENABLED, false);
+// Disable loading dynamic partitions from partition names in this test
+// because get_partitions_by_names will also hit partition limit since 
HIVE-28062.
+
conf.setBoolVar(HiveConf.ConfVars.HIVE_LOAD_DYNAMIC_PARTITIONS_SCAN_SPECIFIC_PARTITIONS,
 false);

Review Comment:
   `hive.load.dynamic.partitions.scan.specific.partitions` was inroduced by 
HIVE-25178 to fetch only specific relevant **external** tables partitions by 
partition names(using `getPartitionsByNames `method):
   
https://github.com/apache/hive/blob/ffb1165f59defa66b31b4fd9cb6367b71050071b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3187-L3202
   
   Here setting `hive.load.dynamic.partitions.scan.specific.partitions` to 
false will give others a false impression that this propertity can be used to 
control the partitions limit. However, It's just a coincidence as it just skip 
to  invoke `getPartitionsByNames `method when this propertity is false, and it 
has nothing to do with partitions limit.
   
   Am i right?
   If it is ok, i would suggest to think another better to fix this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-18 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-2003169974

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [11 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-18 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1527951368


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1023,33 +1023,43 @@ public  List getPartitionFieldsViaSqlFilter(
 
 
   /** Should be called with the list short enough to not trip up Oracle/etc. */
-  private List getPartitionsFromPartitionNames(String catName, 
String dbName,
+  private List getPartitionsByNames(String catName, String dbName,
   String tblName, Boolean isView, List partNameList, List 
projectionFields,
   boolean isAcidTable, GetPartitionsArgs args) throws MetaException {
 // Get most of the fields for the partNames provided.
 // Assume db and table names are the same for all partition, as provided 
in arguments.
-String partNames = partNameList.stream()
-.map(name -> "'" + name + "'")
-.collect(Collectors.joining(","));
-String queryText =
-"select " + PARTITIONS + ".\"PART_ID\", " + SDS + ".\"SD_ID\", " + SDS 
+ ".\"CD_ID\"," + " "
-+ SERDES + ".\"SERDE_ID\", " + PARTITIONS + ".\"CREATE_TIME\"," + " " 
+ PARTITIONS
-+ ".\"LAST_ACCESS_TIME\", " + SDS + ".\"INPUT_FORMAT\", " + SDS + 
".\"IS_COMPRESSED\","
-+ " " + SDS + ".\"IS_STOREDASSUBDIRECTORIES\", " + SDS + 
".\"LOCATION\", " + SDS
-+ ".\"NUM_BUCKETS\"," + " " + SDS + ".\"OUTPUT_FORMAT\", " + SERDES + 
".\"NAME\", "
-+ SERDES + ".\"SLIB\", " + PARTITIONS + ".\"WRITE_ID\"" + " from " + 
PARTITIONS + ""
-+ "  left outer join " + SDS + " on " + PARTITIONS + ".\"SD_ID\" = " + 
SDS
-+ ".\"SD_ID\" " + "  left outer join " + SERDES + " on " + SDS + 
".\"SERDE_ID\" = "
-+ SERDES + ".\"SERDE_ID\" " + " inner join " + TBLS + " on " + TBLS + 
".\"TBL_ID\" = "
-+ PARTITIONS + ".\"TBL_ID\" " + " inner join " + DBS + " on " + DBS + 
".\"DB_ID\" = "
-+ TBLS + ".\"DB_ID\" " + "where \"PART_NAME\" in (" + partNames + ") "
-+ " and " + TBLS + ".\"TBL_NAME\" = ? and " + DBS + ".\"NAME\" = ? and 
" + DBS
-+ ".\"CTLG_NAME\" = ? order by \"PART_NAME\" asc";
+List queries = new ArrayList<>();
+StringBuilder prefix = new StringBuilder();
+StringBuilder suffix = new StringBuilder();
+
+List quotedPartNames = partNameList.stream()
+.map(DirectSqlUpdatePart::quoteString)
+.collect(Collectors.toList());
+
+prefix.append(
+"select " + PARTITIONS + ".\"PART_ID\"," + SDS + ".\"SD_ID\"," + SDS + 
".\"CD_ID\","
++ SERDES + ".\"SERDE_ID\"," + PARTITIONS + ".\"CREATE_TIME\"," + 
PARTITIONS
++ ".\"LAST_ACCESS_TIME\"," + SDS + ".\"INPUT_FORMAT\"," + SDS + 
".\"IS_COMPRESSED\","
++ SDS + ".\"IS_STOREDASSUBDIRECTORIES\"," + SDS + ".\"LOCATION\"," + 
SDS
++ ".\"NUM_BUCKETS\"," + SDS + ".\"OUTPUT_FORMAT\"," + SERDES + 
".\"NAME\","
++ SERDES + ".\"SLIB\"," + PARTITIONS + ".\"WRITE_ID\"" + " from " + 
PARTITIONS
++ " left outer join " + SDS + " on " + PARTITIONS + ".\"SD_ID\" = " + 
SDS + ".\"SD_ID\" "
++ " left outer join " + SERDES + " on " + SDS + ".\"SERDE_ID\" = " + 
SERDES + ".\"SERDE_ID\" "
++ " inner join " + TBLS + " on " + TBLS + ".\"TBL_ID\" = " + 
PARTITIONS + ".\"TBL_ID\" "
++ " inner join " + DBS + " on " + DBS + ".\"DB_ID\" = " + TBLS + 
".\"DB_ID\" where ");
+suffix.append(
+" and " + TBLS + ".\"TBL_NAME\" = ? and " + DBS + ".\"NAME\" = ? and " 
+ DBS
++ ".\"CTLG_NAME\" = ? order by \"PART_NAME\" asc");
+
+TxnUtils.buildQueryWithINClauseStrings(conf, queries, prefix, suffix, 
quotedPartNames,

Review Comment:
   Addressed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-18 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1527951041


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -3199,19 +3199,24 @@ public Map, Partition> 
loadDynamicPartitions(final LoadTable
   
partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
 }
   }
-  List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
-  for(Partition partition : partitions) {
-LOG.debug("HMS partition spec: {}", partition.getSpec());
-partitionDetailsMap.entrySet().parallelStream()
-.filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
-.findAny().ifPresent(entry -> {
-  entry.getValue().partition = partition;
-  entry.getValue().hasOldPartition = true;
-});
-  }
-  // no need to fetch partition again in tasks since we have already 
fetched partitions
-  // info in getPartitionsByNames()
-  fetchPartitionInfo = false;
+  try {
+List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
+for(Partition partition : partitions) {
+  LOG.debug("HMS partition spec: {}", partition.getSpec());
+  partitionDetailsMap.entrySet().parallelStream()
+  .filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
+  .findAny().ifPresent(entry -> {
+entry.getValue().partition = partition;
+entry.getValue().hasOldPartition = true;
+  });
+}
+// no need to fetch partition again in tasks since we have already 
fetched partitions
+// info in getPartitionsByNames()
+fetchPartitionInfo = false;
+  } catch (HiveException e) {
+// Failed to fetch partitions in some cases (e.g., partition number 
limit)
+LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
   Reverted this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-15 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1525998481


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1023,33 +1023,43 @@ public  List getPartitionFieldsViaSqlFilter(
 
 
   /** Should be called with the list short enough to not trip up Oracle/etc. */
-  private List getPartitionsFromPartitionNames(String catName, 
String dbName,
+  private List getPartitionsByNames(String catName, String dbName,
   String tblName, Boolean isView, List partNameList, List 
projectionFields,
   boolean isAcidTable, GetPartitionsArgs args) throws MetaException {
 // Get most of the fields for the partNames provided.
 // Assume db and table names are the same for all partition, as provided 
in arguments.
-String partNames = partNameList.stream()
-.map(name -> "'" + name + "'")
-.collect(Collectors.joining(","));
-String queryText =
-"select " + PARTITIONS + ".\"PART_ID\", " + SDS + ".\"SD_ID\", " + SDS 
+ ".\"CD_ID\"," + " "
-+ SERDES + ".\"SERDE_ID\", " + PARTITIONS + ".\"CREATE_TIME\"," + " " 
+ PARTITIONS
-+ ".\"LAST_ACCESS_TIME\", " + SDS + ".\"INPUT_FORMAT\", " + SDS + 
".\"IS_COMPRESSED\","
-+ " " + SDS + ".\"IS_STOREDASSUBDIRECTORIES\", " + SDS + 
".\"LOCATION\", " + SDS
-+ ".\"NUM_BUCKETS\"," + " " + SDS + ".\"OUTPUT_FORMAT\", " + SERDES + 
".\"NAME\", "
-+ SERDES + ".\"SLIB\", " + PARTITIONS + ".\"WRITE_ID\"" + " from " + 
PARTITIONS + ""
-+ "  left outer join " + SDS + " on " + PARTITIONS + ".\"SD_ID\" = " + 
SDS
-+ ".\"SD_ID\" " + "  left outer join " + SERDES + " on " + SDS + 
".\"SERDE_ID\" = "
-+ SERDES + ".\"SERDE_ID\" " + " inner join " + TBLS + " on " + TBLS + 
".\"TBL_ID\" = "
-+ PARTITIONS + ".\"TBL_ID\" " + " inner join " + DBS + " on " + DBS + 
".\"DB_ID\" = "
-+ TBLS + ".\"DB_ID\" " + "where \"PART_NAME\" in (" + partNames + ") "
-+ " and " + TBLS + ".\"TBL_NAME\" = ? and " + DBS + ".\"NAME\" = ? and 
" + DBS
-+ ".\"CTLG_NAME\" = ? order by \"PART_NAME\" asc";
+List queries = new ArrayList<>();
+StringBuilder prefix = new StringBuilder();
+StringBuilder suffix = new StringBuilder();
+
+List quotedPartNames = partNameList.stream()
+.map(DirectSqlUpdatePart::quoteString)
+.collect(Collectors.toList());
+
+prefix.append(
+"select " + PARTITIONS + ".\"PART_ID\"," + SDS + ".\"SD_ID\"," + SDS + 
".\"CD_ID\","
++ SERDES + ".\"SERDE_ID\"," + PARTITIONS + ".\"CREATE_TIME\"," + 
PARTITIONS
++ ".\"LAST_ACCESS_TIME\"," + SDS + ".\"INPUT_FORMAT\"," + SDS + 
".\"IS_COMPRESSED\","
++ SDS + ".\"IS_STOREDASSUBDIRECTORIES\"," + SDS + ".\"LOCATION\"," + 
SDS
++ ".\"NUM_BUCKETS\"," + SDS + ".\"OUTPUT_FORMAT\"," + SERDES + 
".\"NAME\","
++ SERDES + ".\"SLIB\"," + PARTITIONS + ".\"WRITE_ID\"" + " from " + 
PARTITIONS
++ " left outer join " + SDS + " on " + PARTITIONS + ".\"SD_ID\" = " + 
SDS + ".\"SD_ID\" "
++ " left outer join " + SERDES + " on " + SDS + ".\"SERDE_ID\" = " + 
SERDES + ".\"SERDE_ID\" "
++ " inner join " + TBLS + " on " + TBLS + ".\"TBL_ID\" = " + 
PARTITIONS + ".\"TBL_ID\" "
++ " inner join " + DBS + " on " + DBS + ".\"DB_ID\" = " + TBLS + 
".\"DB_ID\" where ");
+suffix.append(
+" and " + TBLS + ".\"TBL_NAME\" = ? and " + DBS + ".\"NAME\" = ? and " 
+ DBS
++ ".\"CTLG_NAME\" = ? order by \"PART_NAME\" asc");
+
+TxnUtils.buildQueryWithINClauseStrings(conf, queries, prefix, suffix, 
quotedPartNames,

Review Comment:
   sorry, I didn't notice that it's already under `Batchable.runBatched`. Not 
sure we want to use `TxnUtils.buildQueryWithINClauseStrings` here as it's used 
to construct mini-batches as well. 
   Simple `IN` should be sufficient:
   
String quotedPartNames = partNameList.stream()
   .map(DirectSqlUpdatePart::quoteString)
   .collect(Collectors.joining(","));
   
   "where \"PART_NAME\" IN (" + quotedPartNames + ") "
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-15 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1525946469


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -3199,19 +3199,24 @@ public Map, Partition> 
loadDynamicPartitions(final LoadTable
   
partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
 }
   }
-  List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
-  for(Partition partition : partitions) {
-LOG.debug("HMS partition spec: {}", partition.getSpec());
-partitionDetailsMap.entrySet().parallelStream()
-.filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
-.findAny().ifPresent(entry -> {
-  entry.getValue().partition = partition;
-  entry.getValue().hasOldPartition = true;
-});
-  }
-  // no need to fetch partition again in tasks since we have already 
fetched partitions
-  // info in getPartitionsByNames()
-  fetchPartitionInfo = false;
+  try {
+List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
+for(Partition partition : partitions) {
+  LOG.debug("HMS partition spec: {}", partition.getSpec());
+  partitionDetailsMap.entrySet().parallelStream()
+  .filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
+  .findAny().ifPresent(entry -> {
+entry.getValue().partition = partition;
+entry.getValue().hasOldPartition = true;
+  });
+}
+// no need to fetch partition again in tasks since we have already 
fetched partitions
+// info in getPartitionsByNames()
+fetchPartitionInfo = false;
+  } catch (HiveException e) {
+// Failed to fetch partitions in some cases (e.g., partition number 
limit)
+LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
   The only way the client realizes that there is a problem (limit exceeded) is 
if he checks the logs. I think we should throw an exception



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-1998078108

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [12 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/accepted-16px.png
 '') [0 Accepted 
issues](https://sonarcloud.io/component_measures?id=apache_hive=5063=new_accepted_issues=list)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1525193228


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -3199,19 +3199,24 @@ public Map, Partition> 
loadDynamicPartitions(final LoadTable
   
partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
 }
   }
-  List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
-  for(Partition partition : partitions) {
-LOG.debug("HMS partition spec: {}", partition.getSpec());
-partitionDetailsMap.entrySet().parallelStream()
-.filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
-.findAny().ifPresent(entry -> {
-  entry.getValue().partition = partition;
-  entry.getValue().hasOldPartition = true;
-});
-  }
-  // no need to fetch partition again in tasks since we have already 
fetched partitions
-  // info in getPartitionsByNames()
-  fetchPartitionInfo = false;
+  try {
+List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
+for(Partition partition : partitions) {
+  LOG.debug("HMS partition spec: {}", partition.getSpec());
+  partitionDetailsMap.entrySet().parallelStream()
+  .filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
+  .findAny().ifPresent(entry -> {
+entry.getValue().partition = partition;
+entry.getValue().hasOldPartition = true;
+  });
+}
+// no need to fetch partition again in tasks since we have already 
fetched partitions
+// info in getPartitionsByNames()
+fetchPartitionInfo = false;
+  } catch (HiveException e) {
+// Failed to fetch partitions in some cases (e.g., partition number 
limit)
+LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
   I think partition limit is also necessary in `get_partitions_by_names` to 
avoid OOM issue, as for whether we need throw exception to client here, I think 
as a provided client it's reasonable to make the code as robust as possible so 
I want such fallback mechanism.
   If you still prefer throw this exception, I will agree and update the code. 
@zhangbutao @deniskuzZ 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1525085003


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1027,15 +1021,42 @@ public  List getPartitionFieldsViaSqlFilter(
 }
   }
 
+
+  /** Should be called with the list short enough to not trip up Oracle/etc. */
+  private List getPartitionsFromPartitionNames(String catName, 
String dbName,
+  String tblName, Boolean isView, List partNameList, List 
projectionFields,
+  boolean isAcidTable, GetPartitionsArgs args) throws MetaException {
+// Get most of the fields for the partNames provided.
+// Assume db and table names are the same for all partition, as provided 
in arguments.
+String partNames = partNameList.stream()
+.map(name -> "'" + name + "'")
+.collect(Collectors.joining(","));
+String queryText =
+"select " + PARTITIONS + ".\"PART_ID\", " + SDS + ".\"SD_ID\", " + SDS 
+ ".\"CD_ID\"," + " "

Review Comment:
   Sure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1525045863


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -742,14 +742,8 @@ public List getPartitionsViaPartNames(final 
String catName, final Str
 return Batchable.runBatched(batchSize, partNames, new Batchable() {
   @Override
   public List run(List input) throws MetaException {
-String filter = "" + PARTITIONS + ".\"PART_NAME\" in (" + 
makeParams(input.size()) + ")";
-List partitionIds = getPartitionIdsViaSqlFilter(catName, dbName, 
tblName,
-filter, input, Collections.emptyList(), null);
-if (partitionIds.isEmpty()) {
-  return Collections.emptyList(); // no partitions, bail early.
-}
-return getPartitionsFromPartitionIds(catName, dbName, tblName, null,
-partitionIds, Collections.emptyList(), false, args);
+return getPartitionsFromPartitionNames(catName, dbName, tblName, null,

Review Comment:
   Addressed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


zhangbutao commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524613898


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -3199,19 +3199,24 @@ public Map, Partition> 
loadDynamicPartitions(final LoadTable
   
partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
 }
   }
-  List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
-  for(Partition partition : partitions) {
-LOG.debug("HMS partition spec: {}", partition.getSpec());
-partitionDetailsMap.entrySet().parallelStream()
-.filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
-.findAny().ifPresent(entry -> {
-  entry.getValue().partition = partition;
-  entry.getValue().hasOldPartition = true;
-});
-  }
-  // no need to fetch partition again in tasks since we have already 
fetched partitions
-  // info in getPartitionsByNames()
-  fetchPartitionInfo = false;
+  try {
+List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
+for(Partition partition : partitions) {
+  LOG.debug("HMS partition spec: {}", partition.getSpec());
+  partitionDetailsMap.entrySet().parallelStream()
+  .filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
+  .findAny().ifPresent(entry -> {
+entry.getValue().partition = partition;
+entry.getValue().hasOldPartition = true;
+  });
+}
+// no need to fetch partition again in tasks since we have already 
fetched partitions
+// info in getPartitionsByNames()
+fetchPartitionInfo = false;
+  } catch (HiveException e) {
+// Failed to fetch partitions in some cases (e.g., partition number 
limit)
+LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
   I didn't see any fallback code block. BTW, i think no one will care about 
the warn log, so client side(spark & hiveserver2) maybe miss this warn 
exception, and then they maybe lose some partition data.
   
   Therefore, if this PR must change the client behavior, such as the 
Introduced `partition limit exception` , (see the previous ci 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/1/tests/
 ), we need throw this exception. But as i said above, it would better to not 
to introduce this incompatible behavior if possible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524595561


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1027,15 +1021,42 @@ public  List getPartitionFieldsViaSqlFilter(
 }
   }
 
+
+  /** Should be called with the list short enough to not trip up Oracle/etc. */
+  private List getPartitionsFromPartitionNames(String catName, 
String dbName,
+  String tblName, Boolean isView, List partNameList, List 
projectionFields,
+  boolean isAcidTable, GetPartitionsArgs args) throws MetaException {
+// Get most of the fields for the partNames provided.
+// Assume db and table names are the same for all partition, as provided 
in arguments.
+String partNames = partNameList.stream()
+.map(name -> "'" + name + "'")
+.collect(Collectors.joining(","));
+String queryText =
+"select " + PARTITIONS + ".\"PART_ID\", " + SDS + ".\"SD_ID\", " + SDS 
+ ".\"CD_ID\"," + " "

Review Comment:
   can we reformat the query for readability? put joins on new lines



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524591862


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -3199,19 +3199,24 @@ public Map, Partition> 
loadDynamicPartitions(final LoadTable
   
partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
 }
   }
-  List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
-  for(Partition partition : partitions) {
-LOG.debug("HMS partition spec: {}", partition.getSpec());
-partitionDetailsMap.entrySet().parallelStream()
-.filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
-.findAny().ifPresent(entry -> {
-  entry.getValue().partition = partition;
-  entry.getValue().hasOldPartition = true;
-});
-  }
-  // no need to fetch partition again in tasks since we have already 
fetched partitions
-  // info in getPartitionsByNames()
-  fetchPartitionInfo = false;
+  try {
+List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
+for(Partition partition : partitions) {
+  LOG.debug("HMS partition spec: {}", partition.getSpec());
+  partitionDetailsMap.entrySet().parallelStream()
+  .filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
+  .findAny().ifPresent(entry -> {
+entry.getValue().partition = partition;
+entry.getValue().hasOldPartition = true;
+  });
+}
+// no need to fetch partition again in tasks since we have already 
fetched partitions
+// info in getPartitionsByNames()
+fetchPartitionInfo = false;
+  } catch (HiveException e) {
+// Failed to fetch partitions in some cases (e.g., partition number 
limit)
+LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
   oh, is the fallback code below? i don't see we use the same `partitionNames` 
list there.
   what would happen if limit is exceeded? we'll catch the exception and try to 
load 1-by-1? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


deniskuzZ commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-1997085048

   Does this PR introduce any user-facing change?
   No
   
   We didn't have a limit before, didn't we? Right now user might get an 
exception when the limit is exceeded


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524567598


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -742,14 +742,8 @@ public List getPartitionsViaPartNames(final 
String catName, final Str
 return Batchable.runBatched(batchSize, partNames, new Batchable() {
   @Override
   public List run(List input) throws MetaException {
-String filter = "" + PARTITIONS + ".\"PART_NAME\" in (" + 
makeParams(input.size()) + ")";
-List partitionIds = getPartitionIdsViaSqlFilter(catName, dbName, 
tblName,
-filter, input, Collections.emptyList(), null);
-if (partitionIds.isEmpty()) {
-  return Collections.emptyList(); // no partitions, bail early.
-}
-return getPartitionsFromPartitionIds(catName, dbName, tblName, null,
-partitionIds, Collections.emptyList(), false, args);
+return getPartitionsFromPartitionNames(catName, dbName, tblName, null,

Review Comment:
   i think that would be more readable, btw same naming is used across the code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524565460


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:
##
@@ -7516,14 +7516,14 @@ public List get_partitions_by_names(final 
String dbName, final String
 Table table = null;
 Exception ex = null;
 boolean success = false;
-startTableFunction("get_partitions_by_names", parsedCatName, parsedDbName,
-tblName);
+startTableFunction("get_partitions_by_names", parsedCatName, parsedDbName, 
tblName);
 try {
   getMS().openTransaction();
   authorizeTableForPartitionMetadata(parsedCatName, parsedDbName, tblName);
 
   fireReadTablePreEvent(parsedCatName, parsedDbName, tblName);
 
+  checkLimitNumberOfPartitions(tblName, args.getPartNames().size(), -1);

Review Comment:
   ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524436873


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -742,14 +742,8 @@ public List getPartitionsViaPartNames(final 
String catName, final Str
 return Batchable.runBatched(batchSize, partNames, new Batchable() {
   @Override
   public List run(List input) throws MetaException {
-String filter = "" + PARTITIONS + ".\"PART_NAME\" in (" + 
makeParams(input.size()) + ")";
-List partitionIds = getPartitionIdsViaSqlFilter(catName, dbName, 
tblName,
-filter, input, Collections.emptyList(), null);
-if (partitionIds.isEmpty()) {
-  return Collections.emptyList(); // no partitions, bail early.
-}
-return getPartitionsFromPartitionIds(catName, dbName, tblName, null,
-partitionIds, Collections.emptyList(), false, args);
+return getPartitionsFromPartitionNames(catName, dbName, tblName, null,

Review Comment:
   I named this new method in the format of `getPartitionsFromPartitionIds()`, 
do we need change it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524339448


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:
##
@@ -7516,14 +7516,14 @@ public List get_partitions_by_names(final 
String dbName, final String
 Table table = null;
 Exception ex = null;
 boolean success = false;
-startTableFunction("get_partitions_by_names", parsedCatName, parsedDbName,
-tblName);
+startTableFunction("get_partitions_by_names", parsedCatName, parsedDbName, 
tblName);
 try {
   getMS().openTransaction();
   authorizeTableForPartitionMetadata(parsedCatName, parsedDbName, tblName);
 
   fireReadTablePreEvent(parsedCatName, parsedDbName, tblName);
 
+  checkLimitNumberOfPartitions(tblName, args.getPartNames().size(), -1);

Review Comment:
   Yes, it check by the total size of input names, we assume that all input 
partition names are valid and related partitions exist.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-14 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524324359


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -742,14 +742,8 @@ public List getPartitionsViaPartNames(final 
String catName, final Str
 return Batchable.runBatched(batchSize, partNames, new Batchable() {
   @Override
   public List run(List input) throws MetaException {
-String filter = "" + PARTITIONS + ".\"PART_NAME\" in (" + 
makeParams(input.size()) + ")";
-List partitionIds = getPartitionIdsViaSqlFilter(catName, dbName, 
tblName,

Review Comment:
   Does not find the clues, I guess it's because there are existing 
`getPartitionIdsViaSqlFilter()` and `getPartitionsFromPartitionIds` methods 
that can be used directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-13 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524161004


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -3199,19 +3199,24 @@ public Map, Partition> 
loadDynamicPartitions(final LoadTable
   
partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
 }
   }
-  List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
-  for(Partition partition : partitions) {
-LOG.debug("HMS partition spec: {}", partition.getSpec());
-partitionDetailsMap.entrySet().parallelStream()
-.filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
-.findAny().ifPresent(entry -> {
-  entry.getValue().partition = partition;
-  entry.getValue().hasOldPartition = true;
-});
-  }
-  // no need to fetch partition again in tasks since we have already 
fetched partitions
-  // info in getPartitionsByNames()
-  fetchPartitionInfo = false;
+  try {
+List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
+for(Partition partition : partitions) {
+  LOG.debug("HMS partition spec: {}", partition.getSpec());
+  partitionDetailsMap.entrySet().parallelStream()
+  .filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
+  .findAny().ifPresent(entry -> {
+entry.getValue().partition = partition;
+entry.getValue().hasOldPartition = true;
+  });
+}
+// no need to fetch partition again in tasks since we have already 
fetched partitions
+// info in getPartitionsByNames()
+fetchPartitionInfo = false;
+  } catch (HiveException e) {
+// Failed to fetch partitions in some cases (e.g., partition number 
limit)
+LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
   There are two ways to fetch partitions from hms in `loadDynamicPartitions()`:
   1. get partitions by names
   2. get partitions one by one
   I think it's reasonable to fallback to the second if the first one failed. 
And we have a warn log to remind client what error happened in 
getPartitionsByNames call.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-13 Thread via GitHub


wecharyu commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1524128589


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1027,15 +1021,42 @@ public  List getPartitionFieldsViaSqlFilter(
 }
   }
 
+
+  /** Should be called with the list short enough to not trip up Oracle/etc. */
+  private List getPartitionsFromPartitionNames(String catName, 
String dbName,
+  String tblName, Boolean isView, List partNameList, List 
projectionFields,
+  boolean isAcidTable, GetPartitionsArgs args) throws MetaException {
+// Get most of the fields for the partNames provided.
+// Assume db and table names are the same for all partition, as provided 
in arguments.
+String partNames = partNameList.stream()
+.map(name -> "'" + name + "'")
+.collect(Collectors.joining(","));
+String queryText =

Review Comment:
   Yes, this query is familiar with following: 
   
https://github.com/wecharyu/hive/blob/a348e5d86adc611e4f72d386815a21a96b667ab4/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlUpdatePart.java#L448-L455



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-13 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1523142031


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1027,15 +1021,42 @@ public  List getPartitionFieldsViaSqlFilter(
 }
   }
 
+
+  /** Should be called with the list short enough to not trip up Oracle/etc. */
+  private List getPartitionsFromPartitionNames(String catName, 
String dbName,
+  String tblName, Boolean isView, List partNameList, List 
projectionFields,
+  boolean isAcidTable, GetPartitionsArgs args) throws MetaException {
+// Get most of the fields for the partNames provided.
+// Assume db and table names are the same for all partition, as provided 
in arguments.
+String partNames = partNameList.stream()

Review Comment:
   check `TxnUtils.buildQueryWithINClause`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-13 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1523123151


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -742,14 +742,8 @@ public List getPartitionsViaPartNames(final 
String catName, final Str
 return Batchable.runBatched(batchSize, partNames, new Batchable() {
   @Override
   public List run(List input) throws MetaException {
-String filter = "" + PARTITIONS + ".\"PART_NAME\" in (" + 
makeParams(input.size()) + ")";
-List partitionIds = getPartitionIdsViaSqlFilter(catName, dbName, 
tblName,
-filter, input, Collections.emptyList(), null);
-if (partitionIds.isEmpty()) {
-  return Collections.emptyList(); // no partitions, bail early.
-}
-return getPartitionsFromPartitionIds(catName, dbName, tblName, null,
-partitionIds, Collections.emptyList(), false, args);
+return getPartitionsFromPartitionNames(catName, dbName, tblName, null,

Review Comment:
   maybe `getPartitionsByNames`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-13 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1523113059


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:
##
@@ -7516,14 +7516,14 @@ public List get_partitions_by_names(final 
String dbName, final String
 Table table = null;
 Exception ex = null;
 boolean success = false;
-startTableFunction("get_partitions_by_names", parsedCatName, parsedDbName,
-tblName);
+startTableFunction("get_partitions_by_names", parsedCatName, parsedDbName, 
tblName);
 try {
   getMS().openTransaction();
   authorizeTableForPartitionMetadata(parsedCatName, parsedDbName, tblName);
 
   fireReadTablePreEvent(parsedCatName, parsedDbName, tblName);
 
+  checkLimitNumberOfPartitions(tblName, args.getPartNames().size(), -1);

Review Comment:
   this checks limit regardless of name filter, isn't it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-13 Thread via GitHub


deniskuzZ commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1523105974


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -3199,19 +3199,24 @@ public Map, Partition> 
loadDynamicPartitions(final LoadTable
   
partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
 }
   }
-  List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
-  for(Partition partition : partitions) {
-LOG.debug("HMS partition spec: {}", partition.getSpec());
-partitionDetailsMap.entrySet().parallelStream()
-.filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
-.findAny().ifPresent(entry -> {
-  entry.getValue().partition = partition;
-  entry.getValue().hasOldPartition = true;
-});
-  }
-  // no need to fetch partition again in tasks since we have already 
fetched partitions
-  // info in getPartitionsByNames()
-  fetchPartitionInfo = false;
+  try {
+List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
+for(Partition partition : partitions) {
+  LOG.debug("HMS partition spec: {}", partition.getSpec());
+  partitionDetailsMap.entrySet().parallelStream()
+  .filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
+  .findAny().ifPresent(entry -> {
+entry.getValue().partition = partition;
+entry.getValue().hasOldPartition = true;
+  });
+}
+// no need to fetch partition again in tasks since we have already 
fetched partitions
+// info in getPartitionsByNames()
+fetchPartitionInfo = false;
+  } catch (HiveException e) {
+// Failed to fetch partitions in some cases (e.g., partition number 
limit)
+LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
   i don't see how this change is relevant to the PR, also we should propagate 
the exception back to the client



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-11 Thread via GitHub


zhangbutao commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-1990334911

   + @dengzhhu653 @deniskuzZ @ayushtkn @saihemanth-cloudera may be interested 
in this improvement. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-11 Thread via GitHub


zhangbutao commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1520802015


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -742,14 +742,8 @@ public List getPartitionsViaPartNames(final 
String catName, final Str
 return Batchable.runBatched(batchSize, partNames, new Batchable() {
   @Override
   public List run(List input) throws MetaException {
-String filter = "" + PARTITIONS + ".\"PART_NAME\" in (" + 
makeParams(input.size()) + ")";
-List partitionIds = getPartitionIdsViaSqlFilter(catName, dbName, 
tblName,

Review Comment:
   > Skipping get partition ids in MetaStoreDirectSql#getPartitionsViaPartNames 
to reduce the Database calls.
   
   Seems that you merge the two sql(getPartitionIdsViaSqlFilter and 
`getPartitionsFromPartitionIds`) into one sql, so that reduce the db calls.
   I think your change maybe reasonable, but i also think there have other 
reasons which can prove why we use two sql to do this before. If you give any 
clues that can explain the history code, that's will be good. :)



##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -3199,19 +3199,24 @@ public Map, Partition> 
loadDynamicPartitions(final LoadTable
   
partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
 }
   }
-  List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
-  for(Partition partition : partitions) {
-LOG.debug("HMS partition spec: {}", partition.getSpec());
-partitionDetailsMap.entrySet().parallelStream()
-.filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
-.findAny().ifPresent(entry -> {
-  entry.getValue().partition = partition;
-  entry.getValue().hasOldPartition = true;
-});
-  }
-  // no need to fetch partition again in tasks since we have already 
fetched partitions
-  // info in getPartitionsByNames()
-  fetchPartitionInfo = false;
+  try {
+List partitions = Hive.get().getPartitionsByNames(tbl, 
partitionNames);
+for(Partition partition : partitions) {
+  LOG.debug("HMS partition spec: {}", partition.getSpec());
+  partitionDetailsMap.entrySet().parallelStream()
+  .filter(entry -> 
entry.getValue().fullSpec.equals(partition.getSpec()))
+  .findAny().ifPresent(entry -> {
+entry.getValue().partition = partition;
+entry.getValue().hasOldPartition = true;
+  });
+}
+// no need to fetch partition again in tasks since we have already 
fetched partitions
+// info in getPartitionsByNames()
+fetchPartitionInfo = false;
+  } catch (HiveException e) {
+// Failed to fetch partitions in some cases (e.g., partition number 
limit)
+LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
   IMO, this exception should be thrown directly, as we need the engines(HS2 & 
Spark) to know that the `load_partition` is failed so that they can do some 
rollback based the exception, or the users know that they should control the 
limit size by setting conf.
   
   BTW, i think we had better not to change this client code (i know you are 
trying to fix the failed tests). The PR should not have any effect on client. 
Maybe we can try to fix this in sever side to minimize the potential 
incompatibilities?
   



##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1027,15 +1021,42 @@ public  List getPartitionFieldsViaSqlFilter(
 }
   }
 
+
+  /** Should be called with the list short enough to not trip up Oracle/etc. */
+  private List getPartitionsFromPartitionNames(String catName, 
String dbName,
+  String tblName, Boolean isView, List partNameList, List 
projectionFields,
+  boolean isAcidTable, GetPartitionsArgs args) throws MetaException {
+// Get most of the fields for the partNames provided.
+// Assume db and table names are the same for all partition, as provided 
in arguments.
+String partNames = partNameList.stream()
+.map(name -> "'" + name + "'")
+.collect(Collectors.joining(","));
+String queryText =

Review Comment:
   Can we ensure that other backend db like oracle & pg can use the sql without 
incompatibility?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional 

Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-03-11 Thread via GitHub


wecharyu commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-1988071530

   @zhangbutao could you take a look again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-02-21 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-1956334432

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [27 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-02-20 Thread via GitHub


wecharyu commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-1955783839

   > The failed tests seem to be related.
   
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/1/tests
   
   @zhangbutao The issue arises due to the introduction of a partition number 
check in the `getPartitionsByNames()` API, fixed it in the client side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-02-20 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-1954669668

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   Issues  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [27 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
   
   Measures  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png
 '') [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Coverage  
   
![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png
 '') No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-02-19 Thread via GitHub


zhangbutao commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-1952762647

   The failed tests seem to be related.
   
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/1/tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



Re: [PR] HIVE-28062: Optimize get_partitions_by_names in direct sql [hive]

2024-02-06 Thread via GitHub


sonarcloud[bot] commented on PR #5063:
URL: https://github.com/apache/hive/pull/5063#issuecomment-1929054561

   ## [![Quality Gate 
Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png
 'Quality Gate 
Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5063) 
**Quality Gate passed**  
   The SonarCloud Quality Gate passed, but some issues were introduced.
   
   [19 New 
issues](https://sonarcloud.io/project/issues?id=apache_hive=5063=false=true)
  
   [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5063=false=true)
  
   No data about Coverage  
   No data about Duplication  
 
   [See analysis details on 
SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5063)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org