huaxingao commented on a change in pull request #34291:
URL: https://github.com/apache/spark/pull/34291#discussion_r730355437
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
##########
@@ -298,17 +299,22 @@ private[sql] case class JDBCRelation(
requiredColumns: Array[String],
finalSchema: StructType,
filters: Array[Filter],
- groupByColumns: Option[Array[String]]): RDD[Row] = {
+ groupByColumns: Option[Array[String]],
+ limit: Option[Limit]): RDD[Row] = {
+ // If limit is pushed down, only a limited number of rows will be returned. PartitionInfo will
+ // be ignored and the query will be done in one task.
Review comment:
The reason we only want one task is the JDBC partition implementation: JDBC
doesn't have physical partitions. For a JDBC partition setup such as
```
val df = spark.read
.option("partitionColumn", "dept")
.option("lowerBound", "0")
.option("upperBound", "6")
.option("numPartitions", "3")
.table("h2.test.employee")
.limit(6)
```
JDBC will run three parallel queries:
```
SELECT * FROM h2.test.employee WHERE dept < 2
SELECT * FROM h2.test.employee WHERE dept >= 2 AND dept < 4
SELECT * FROM h2.test.employee WHERE dept >= 4
```
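
As a minimal sketch (not the actual `JDBCRelation` code), those three predicates fall out of the options above via the stride computation `(upperBound - lowerBound) / numPartitions = (6 - 0) / 3 = 2`; `partitionPredicates` below is a hypothetical helper:
```scala
// Hypothetical helper (not Spark's actual code) showing how
// lowerBound/upperBound/numPartitions yield the WHERE clauses above.
def partitionPredicates(
    column: String,
    lowerBound: Long,
    upperBound: Long,
    numPartitions: Int): Seq[String] = {
  val stride = (upperBound - lowerBound) / numPartitions
  (0 until numPartitions).map { i =>
    val lower = lowerBound + i * stride
    val upper = lower + stride
    if (i == 0) s"$column < $upper"                        // first partition: open lower bound
    else if (i == numPartitions - 1) s"$column >= $lower"  // last partition: open upper bound
    else s"$column >= $lower AND $column < $upper"
  }
}

// partitionPredicates("dept", 0, 6, 3) returns:
//   Seq("dept < 2", "dept >= 2 AND dept < 4", "dept >= 4")
```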
I was initially thinking of evenly dividing the limit N among the queries,
e.g. for LIMIT(6) I would do
```
SELECT * FROM h2.test.employee WHERE dept < 2 LIMIT 2
SELECT * FROM h2.test.employee WHERE dept >= 2 AND dept < 4 LIMIT 2
SELECT * FROM h2.test.employee WHERE dept >= 4 LIMIT 2
```
but it doesn't work. If the table has 8 rows, but the first partition has
only 1 row, the 2nd partition has 1 row, and the 3rd partition has 6 rows,
the above queries return only 4 rows, whereas LIMIT(6) should return 6 rows.
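
A quick way to see the undercount, as a hedged simulation of the hypothetical 1/1/6 row distribution above:
```scala
// Simulated row counts per partition from the example above: 1, 1, 6.
val rowsPerPartition = Seq(1, 1, 6)
// Evenly splitting LIMIT 6 across 3 partitions gives LIMIT 2 each.
val perPartitionLimit = 2

// Each per-partition query returns at most `perPartitionLimit` rows.
val returned = rowsPerPartition.map(math.min(_, perPartitionLimit)).sum
assert(returned == 4) // 1 + 1 + 2 = 4 rows, but LIMIT(6) should return 6
```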