Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/20726#discussion_r172322719
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala
---
@@ -46,34 +48,46 @@ case class DataSourceV2ScanExec(
       new DataSourcePartitioning(
         s.outputPartitioning(), AttributeMap(output.map(a => a -> a.name)))

+    case _ if readerFactories.size == 1 => SinglePartition
+
     case _ => super.outputPartitioning
   }

-  private lazy val readerFactories: java.util.List[DataReaderFactory[UnsafeRow]] = reader match {
-    case r: SupportsScanUnsafeRow => r.createUnsafeRowReaderFactories()
+  private lazy val readerFactories: Seq[DataReaderFactory[_]] = reader match {
--- End diff ---
We could, but then both usages of `readerFactories` (along with any future ones) would have to add a `case r: SupportsScanColumnarBatch if r.enableBatchRead()` check to figure out which val they're supposed to use. I'm not a huge fan of lazy vals that will throw errors if anyone evaluates them.

Both ways seem equally unclean to me, so I'm fine with switching if you think separating them is better.
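
For illustration, a minimal sketch of the "separate lazy vals" alternative being discussed. This is not the PR's actual code: the trait and factory-method names (`SupportsScanColumnarBatch`, `createBatchDataReaderFactories`, `SupportsScanUnsafeRow`, `createUnsafeRowReaderFactories`) follow the DataSourceV2 API as it stood in this era of Spark, but the helper `partitionCount` is a hypothetical call site added only to show the repeated guard:

```scala
import scala.collection.JavaConverters._

// Hypothetical sketch, not the PR's code: two separate lazy vals, each of
// which throws if forced for the wrong reader type.
private lazy val batchReaderFactories: Seq[DataReaderFactory[ColumnarBatch]] =
  reader match {
    case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
      r.createBatchDataReaderFactories().asScala
    case _ =>
      throw new IllegalStateException("reader does not support columnar batch scan")
  }

private lazy val rowReaderFactories: Seq[DataReaderFactory[UnsafeRow]] =
  reader match {
    case r: SupportsScanUnsafeRow =>
      r.createUnsafeRowReaderFactories().asScala
    case _ =>
      throw new IllegalStateException("reader does not support unsafe row scan")
  }

// Every call site must then repeat the same dispatch to pick the safe val
// (partitionCount is a made-up example usage):
private def partitionCount: Int = reader match {
  case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
    batchReaderFactories.size
  case _ =>
    rowReaderFactories.size
}
```

The single wildcard-typed `readerFactories: Seq[DataReaderFactory[_]]` in the diff above collapses that dispatch into one place, at the cost of losing the element type.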
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]