roczei commented on PR #39595:
URL: https://github.com/apache/spark/pull/39595#issuecomment-1406970882
Hi @czxm,

Thanks for this fix! I tested it with Cloudera Spark 3.3.0, where the Hive version is 3.1.3000 and includes this Hive improvement: HIVE-23556 (Support hive.metastore.limit.partition.request for get_partitions_ps).
Test case 1) default setting: `spark.sql.hasCustomPartitionLocations=true`

Created a hive-site.xml file with the following content:
```
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hive.metastore.limit.partition.request</name>
    <value>3</value>
  </property>
</configuration>
```
and started the spark-shell with:
```
$ ./spark-shell --conf spark.sql.catalogImplementation=hive --jars ./hive-site.xml
```
```
spark.sql("create table student53 (student_name string) using parquet partitioned by (father_name string, percentage float, section string)")
spark.sql("insert into student53 values('dikshant','ashok', 95, 'A')")
spark.sql("insert into student53 values('raj','ramesh', 90, 'B')")
spark.sql("insert into student53 values('alok','ajit', 85, 'C')")
spark.sql("insert into student53 values('rajendra','naveen', 70, 'D')")

spark.sql("create table student63 (student_name string) using parquet partitioned by (father_name string, percentage float, section string)")
spark.sql("insert into student63 values('dikshant','ashok', 95, 'A')")
spark.sql("insert into student63 values('raj','ramesh', 90, 'B')")
spark.sql("insert into student63 values('alok','ajit', 85, 'C')")
spark.sql("insert into student63 values('rajendra','naveen', 70, 'D')")
```
Query 1)
```
spark.sql("insert overwrite table student63 partition (section, father_name, percentage) select * from student53 where section='A'")
```
failed with this error (this is expected):
```
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Number of
partitions scanned (=4) on table 'student31' exceeds limit (=3). This is
controlled on the metastore server by metastore.limit.partition.request.
```
Query 2)
```
spark.sql("insert overwrite table student63 select * from student53 where section='C'")
```
failed with this error (this is expected):
```
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Number of
partitions scanned (=4) on table 'student31' exceeds limit (=3). This is
controlled on the metastore server by metastore.limit.partition.request.
```
The number of partitions is 4, and the statement triggers a full partition scan, so we reach the hive.metastore.limit.partition.request limit, which is currently 3.
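To double-check the partition count, the source table can be listed from the same spark-shell session (a quick sketch against the tables created above; note that listing partitions also goes through the metastore, so depending on the metastore version it may be subject to the same limit):
```
// Sketch: count the partitions of the repro table created above.
val parts = spark.sql("show partitions student53").collect()
println(s"student53 has ${parts.length} partitions")
// With the four inserts above this should report 4 partitions,
// one more than hive.metastore.limit.partition.request=3.
```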
Test case 2)

Added the extra parameter that you introduced in this pull request (`spark.sql.hasCustomPartitionLocations=false`) to the spark-shell command:
```
$ ./spark-shell --conf spark.sql.catalogImplementation=hive --jars ./hive-site.xml --conf spark.sql.hasCustomPartitionLocations=false
```
Results:
```
scala> spark.sql("insert overwrite table student63 partition (section, father_name, percentage) select * from student53 where section='A'")
res10: org.apache.spark.sql.DataFrame = []

scala> spark.sql("insert overwrite table student63 select * from student53 where section='C'")
res11: org.apache.spark.sql.DataFrame = []
```
As you can see, it works for me and resolves the above Hive limit problem.

Before we involve the Spark committers, please create a unit test that covers this new configuration parameter (`spark.sql.hasCustomPartitionLocations=false`). For example, I would add it to this test file:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
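To illustrate, here is a rough sketch of what such a test could look like in that suite, following its existing `runSparkSubmit` pattern (the test name, driver object, and conf values are placeholders only, not a working test):
```
// Illustrative sketch only -- the driver object and conf values are placeholders.
test("insert overwrite with spark.sql.hasCustomPartitionLocations=false") {
  val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
  val args = Seq(
    "--class", "org.apache.spark.sql.hive.PartitionLocationsDriver", // hypothetical driver object
    "--name", "HasCustomPartitionLocationsTest",
    "--master", "local-cluster[2,1,1024]",
    "--conf", "spark.ui.enabled=false",
    "--conf", "spark.sql.hasCustomPartitionLocations=false",
    unusedJar.toString)
  runSparkSubmit(args)
}
```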
Related document:
https://spark.apache.org/developer-tools.html#running-individual-tests
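For reference, the linked page describes running a single suite with sbt; for this file the invocation would look roughly like this (assuming the `hive` sbt project name):
```
./build/sbt "hive/testOnly *HiveSparkSubmitSuite"
```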
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]