siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r444585857



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
##########
@@ -182,6 +182,47 @@ object ExternalAppendOnlyUnsafeRowArrayBenchmark extends BenchmarkBase {
     }
   }
 
+  def testAgainstUnsafeSorterSpillReader(
+      numSpillThreshold: Int,
+      numRows: Int,
+      numIterators: Int,
+      iterations: Int): Unit = {
+    val rows = testRows(numRows)
+    val benchmark = new Benchmark(s"Spilling  SpillReader with $numRows rows", iterations * numRows,
+      output = output)
+
+    benchmark.addCase("UnsafeSorterSpillReader_bufferSize1024") { _: Int =>
+      val array = UnsafeExternalSorter.create(
+        TaskContext.get().taskMemoryManager(),
+        SparkEnv.get.blockManager,
+        SparkEnv.get.serializerManager,
+        TaskContext.get(),
+        null,
+        null,
+        1024,
+        SparkEnv.get.memoryManager.pageSizeBytes,
+        numSpillThreshold,
+        false)
+
+      rows.foreach(x =>
+        array.insertRecord(
+          x.getBaseObject,
+          x.getBaseOffset,
+          x.getSizeInBytes,
+          0,
+          false))
+
+      for (_ <- 0L until numIterators) {

Review comment:
       No, they are not the same. The existing benchmarks read the number of
rows from the spilled files and iterate over the spilled records. My benchmark
simulates a left semi join. In that scenario there is no need to read the
number of records or to iterate over the spilled records. That is why we have
the "lazy" constructor for UnsafeSorterSpillReader: it avoids unnecessary
reading of data from the file. That is the main point of this PR.
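
To make the distinction concrete, here is a minimal sketch of the lazy-open
idea. It is not the code from this PR and does not use Spark's classes:
`LazySpillReader`, `writeSpill`, and the file layout (an Int record count
followed by Long records) are hypothetical and chosen only for illustration.
The point it tries to show is that constructing the reader does no I/O, so a
caller that never iterates never touches the file.

```scala
import java.io.{DataInputStream, DataOutputStream, File, FileInputStream, FileOutputStream}

// Hypothetical stand-in for a spill reader; NOT Spark's UnsafeSorterSpillReader.
final class LazySpillReader(file: File) {
  // Counts how often the file was opened; exists only for this illustration.
  var opens: Int = 0

  // The stream is created on first use, so the constructor performs no I/O.
  private lazy val stream: DataInputStream = {
    opens += 1
    new DataInputStream(new FileInputStream(file))
  }

  // Assumed layout: the record count is the first Int in the file.
  // It is read only when someone first asks for it.
  lazy val numRecords: Int = stream.readInt()

  private var consumed = 0

  def hasNext: Boolean = consumed < numRecords

  def next(): Long = {
    require(hasNext, "no more records")
    consumed += 1
    stream.readLong()
  }

  // Nothing to close if the file was never opened.
  def close(): Unit = if (opens > 0) stream.close()
}

object LazySpillReaderDemo {
  // Writes the assumed layout: record count first, then the records.
  def writeSpill(values: Seq[Long]): File = {
    val f = File.createTempFile("spill", ".bin")
    f.deleteOnExit()
    val out = new DataOutputStream(new FileOutputStream(f))
    try {
      out.writeInt(values.length)
      values.foreach(v => out.writeLong(v))
    } finally out.close()
    f
  }

  def main(args: Array[String]): Unit = {
    val spill = writeSpill(Seq(10L, 20L, 30L))

    // Semi-join-like use: the reader is created but never consumed.
    val unused = new LazySpillReader(spill)
    unused.close()
    println(s"opens without iteration: ${unused.opens}")

    // Full scan: the file is opened once, on first access.
    val used = new LazySpillReader(spill)
    var sum = 0L
    while (used.hasNext) sum += used.next()
    used.close()
    println(s"opens with iteration: ${used.opens}, sum: $sum")
  }
}
```

In this sketch the reader that is never iterated reports zero opens, while the
one that is scanned opens the file exactly once. An eager constructor would
pay for the open and the header read in both cases.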






