cloud-fan commented on code in PR #56363:
URL: https://github.com/apache/spark/pull/56363#discussion_r3384696230


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala:
##########
@@ -125,6 +125,22 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
     }
   }
 
+  test("SPARK-57313: Sample numOutputRows metric") {
+    Seq("false", "true").foreach { enableWholeStage =>
+      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> enableWholeStage) {
+        val df = spark.range(0, 1000, 1, 1)
+          .sample(withReplacement = false, fraction = 0.5, seed = 1)
+        val expectedRows = df.collect().length
+        sparkContext.listenerBus.waitUntilEmpty()
+        val sample = df.queryExecution.executedPlan.collect {
+          case s: SampleExec => s
+        }
+        assert(sample.size == 1)
+        assert(sample.head.metrics("numOutputRows").value == expectedRows)
+      }
+    }
+  }
+

Review Comment:
   Extra blank line:
   ```suggestion
     }
   ```



##########
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala:
##########
@@ -125,6 +125,22 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
     }
   }
 
+  test("SPARK-57313: Sample numOutputRows metric") {
+    Seq("false", "true").foreach { enableWholeStage =>
+      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> enableWholeStage) {
+        val df = spark.range(0, 1000, 1, 1)
+          .sample(withReplacement = false, fraction = 0.5, seed = 1)

Review Comment:
   This only exercises the Bernoulli branch. Consider also covering
   `withReplacement = true` (the PoissonSampler path at
   basicPhysicalOperators.scala:499-503), where a row emitted more than once
   should be counted once per emission. The same pattern works: `expectedRows`
   stays deterministic with the fixed seed.
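
   A minimal sketch of that variant, assuming it sits in the same suite and
   reuses the imports and helpers of the test above. The test name is only a
   placeholder, and the snippet has not been run against this branch:

   ```scala
   // Hypothetical companion test; the name and placement are assumptions.
   test("SPARK-57313: Sample numOutputRows metric with replacement") {
     Seq("false", "true").foreach { enableWholeStage =>
       withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> enableWholeStage) {
         // withReplacement = true is expected to take the PoissonSampler path,
         // which can emit the same input row more than once.
         val df = spark.range(0, 1000, 1, 1)
           .sample(withReplacement = true, fraction = 0.5, seed = 1)
         // Use the collected result as the reference count, not a literal.
         val expectedRows = df.collect().length
         sparkContext.listenerBus.waitUntilEmpty()
         val sample = df.queryExecution.executedPlan.collect {
           case s: SampleExec => s
         }
         assert(sample.size == 1)
         assert(sample.head.metrics("numOutputRows").value == expectedRows)
       }
     }
   }
   ```

   Comparing against `df.collect().length` keeps the assertion independent of
   how many duplicates the sampler happens to produce for this seed.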



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

