siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r444585857
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
##########
@@ -182,6 +182,47 @@ object ExternalAppendOnlyUnsafeRowArrayBenchmark extends BenchmarkBase {
}
}
+ def testAgainstUnsafeSorterSpillReader(
+ numSpillThreshold: Int,
+ numRows: Int,
+ numIterators: Int,
+ iterations: Int): Unit = {
+ val rows = testRows(numRows)
+ val benchmark = new Benchmark(s"Spilling SpillReader with $numRows rows", iterations * numRows,
+ output = output)
+
+ benchmark.addCase("UnsafeSorterSpillReader_bufferSize1024") { _: Int =>
+ val array = UnsafeExternalSorter.create(
+ TaskContext.get().taskMemoryManager(),
+ SparkEnv.get.blockManager,
+ SparkEnv.get.serializerManager,
+ TaskContext.get(),
+ null,
+ null,
+ 1024,
+ SparkEnv.get.memoryManager.pageSizeBytes,
+ numSpillThreshold,
+ false)
+
+ rows.foreach(x =>
+ array.insertRecord(
+ x.getBaseObject,
+ x.getBaseOffset,
+ x.getSizeInBytes,
+ 0,
+ false))
+
+ for (_ <- 0L until numIterators) {
Review comment:
No, they are not the same. The existing benchmarks read the number of rows
from the spilled files and iterate over the spilled records. My benchmark
simulates a left semi join. In that scenario there is no need to read the
number of records or to iterate over the spilled records. That is why we have
the "lazy" constructor for UnsafeSorterSpillReader: it avoids unnecessary
reading of data from the file. That is the main point of this PR.
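
To illustrate the "lazy" idea, here is a rough, self-contained sketch. It is
not the actual UnsafeSorterSpillReader code: `LazySpillReader`, `writeSpill`,
the `opens` counter and the file layout (an Int record count followed by Long
records) are all made up for this example. The point is only that constructing
the reader does not touch the file, and the record count is read on first use.

```scala
import java.io.{DataInputStream, DataOutputStream, File, FileInputStream, FileOutputStream}

// Hypothetical stand-in for a spill reader; NOT Spark's UnsafeSorterSpillReader.
final class LazySpillReader(file: File) {
  // Counts how often the file was opened; exists only to make laziness visible.
  var opens: Int = 0

  // The stream is opened on first access, so construction does no I/O.
  private lazy val stream: DataInputStream = {
    opens += 1
    new DataInputStream(new FileInputStream(file))
  }

  // Assumed layout for this sketch: the first Int in the file is the record count.
  private lazy val numRecords: Int = stream.readInt()
  private var consumed = 0

  // Forces numRecords (and therefore the stream) the first time it is called.
  def hasNext: Boolean = consumed < numRecords

  def next(): Long = {
    if (!hasNext) throw new NoSuchElementException("no more records")
    consumed += 1
    stream.readLong()
  }

  // Only closes the stream if it was ever opened.
  def close(): Unit = if (opens > 0) stream.close()
}

object LazySpillReaderDemo {
  // Writes a tiny "spill file" in the assumed layout and returns it.
  def writeSpill(values: Seq[Long]): File = {
    val f = File.createTempFile("spill", ".bin")
    f.deleteOnExit()
    val out = new DataOutputStream(new FileOutputStream(f))
    try {
      out.writeInt(values.size)
      values.foreach(v => out.writeLong(v))
    } finally out.close()
    f
  }

  def main(args: Array[String]): Unit = {
    val file = writeSpill(Seq(10L, 20L, 30L))

    // Semi-join-like case: the reader is created but never consumed.
    val unused = new LazySpillReader(file)
    println(s"opens after construction: ${unused.opens}")

    // Full-iteration case: the file is opened on the first read.
    val used = new LazySpillReader(file)
    println(s"first record: ${used.next()}")
    println(s"opens after first read: ${used.opens}")
    used.close()
  }
}
```

In the semi-join-like case the reader is never consumed, so the sketch never
opens the file at all. An eager constructor would pay that cost once per
iterator.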