pan3793 commented on code in PR #56092:
URL: https://github.com/apache/spark/pull/56092#discussion_r3299615400
##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##########
@@ -789,6 +802,27 @@ private[v2] trait V2JDBCTest
checkSamplePushed(df8, false)
checkFilterPushed(df8)
assert(df8.collect().length < 10)
+
+ // SYSTEM sampling pushdown
+ if (supportsTableSampleSystem) {
+ val df9 = sql(s"SELECT * FROM $catalogName.new_table $tableOptions " +
+ "TABLESAMPLE SYSTEM (50 PERCENT)")
+ checkSamplePushed(df9)
+ if (partitioningEnabled) {
+ multiplePartitionAdditionalCheck(df1, partitionInfo)
+ }
+ assert(df6.collect().length <= 10)
Review Comment:
Fixed the copy-paste issue (wrong variable reference).
> With PG TABLESAMPLE SYSTEM on a 10-row, single-block table at 50%, the
> result will commonly be all 10 rows or 0 rows, so consider asserting something
> stronger than `<= 10`
That's true, but I think it reflects a PG implementation detail rather than
anything in the contract: there is no guarantee that a small number of rows
will be stored in a single physical block, so I'm keeping the `<= 10` assertion.
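
To illustrate the point, here is a minimal sketch of block-level versus row-level sampling. It is a toy model, not PostgreSQL's or Spark's actual implementation; `SystemSampleSketch`, `systemSample`, `bernoulliSample`, and the block layouts are hypothetical names and data made up for this example.

```scala
import scala.util.Random

object SystemSampleSketch {
  // Toy block-level sampling: each block is kept or dropped as a whole.
  def systemSample(blocks: Seq[Seq[Int]], fraction: Double, rng: Random): Seq[Int] =
    blocks.filter(_ => rng.nextDouble() < fraction).flatten

  // Toy row-level sampling, for contrast: each row is kept independently.
  def bernoulliSample(rows: Seq[Int], fraction: Double, rng: Random): Seq[Int] =
    rows.filter(_ => rng.nextDouble() < fraction)

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val rows: Seq[Int] = 1 to 10

    // Hypothetical physical layouts of the same 10 rows.
    val singleBlock: Seq[Seq[Int]] = Seq(rows)
    val twoBlocks: Seq[Seq[Int]] = rows.grouped(5).toSeq

    val sizesSingle = (1 to 1000).map(_ => systemSample(singleBlock, 0.5, rng).size).toSet
    val sizesTwo = (1 to 1000).map(_ => systemSample(twoBlocks, 0.5, rng).size).toSet
    val sizesRows = (1 to 1000).map(_ => bernoulliSample(rows, 0.5, rng).size).toSet

    // With one block the result is all-or-nothing; with two blocks it is not.
    assert(sizesSingle.subsetOf(Set(0, 10)))
    assert(sizesTwo.subsetOf(Set(0, 5, 10)))

    // The only bound that holds for every layout is the total row count.
    assert((sizesSingle ++ sizesTwo ++ sizesRows).forall(_ <= 10))

    println(s"single block: $sizesSingle, two blocks: $sizesTwo, row-level: $sizesRows")
  }
}
```

The set of possible result sizes depends on how rows happen to be laid out in blocks, which the test cannot control, so `<= 10` is the only bound that is safe across layouts.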
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]