GitHub user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/20726#discussion_r172324207
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
@@ -46,34 +48,46 @@ case class DataSourceV2ScanExec(
       new DataSourcePartitioning(
         s.outputPartitioning(), AttributeMap(output.map(a => a -> a.name)))

+    case _ if readerFactories.size == 1 => SinglePartition
+
     case _ => super.outputPartitioning
   }

-  private lazy val readerFactories: java.util.List[DataReaderFactory[UnsafeRow]] = reader match {
-    case r: SupportsScanUnsafeRow => r.createUnsafeRowReaderFactories()
+  private lazy val readerFactories: Seq[DataReaderFactory[_]] = reader match {
--- End diff ---
I think it is better to have a lazy val (as long as it is private) throw an exception when it is called at the wrong time than to add the cast: the exception at least validates that the assumptions are met and can carry a reasonable error message, whereas the cast might hide the problem, particularly as this code evolves over time. It would also be reasonable to add another path that returns `Seq[DataReaderFactory[_]]`, because that is the method's contract, even though callers currently assume how it will behave. As much of the contract between methods as possible should be expressed in types.
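
To make that concrete, here is a minimal sketch of the two-val approach against the 2.3-era DataSourceV2 reader API. The name `unsafeRowFactories` is hypothetical, and a `reader: DataSourceReader` field is assumed from the class above:

```scala
import scala.collection.JavaConverters._

// General path: matches the method contract, no casts needed.
private lazy val readerFactories: Seq[DataReaderFactory[_]] = reader match {
  case r: SupportsScanUnsafeRow => r.createUnsafeRowReaderFactories().asScala
  case _ => reader.createDataReaderFactories().asScala
}

// Unsafe-row path (hypothetical helper): callers that require
// DataReaderFactory[UnsafeRow] go through a private lazy val that checks
// the assumption up front and fails with a clear message, instead of
// hiding the problem behind a cast that blows up later.
private lazy val unsafeRowFactories: Seq[DataReaderFactory[UnsafeRow]] =
  reader match {
    case r: SupportsScanUnsafeRow => r.createUnsafeRowReaderFactories().asScala
    case _ => throw new IllegalStateException(
      s"$reader does not implement SupportsScanUnsafeRow")
  }
```

That way the caller's assumption is stated (and validated) at the single point where it can actually be checked.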
---