Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20726#discussion_r172322719
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
    @@ -46,34 +48,46 @@ case class DataSourceV2ScanExec(
           new DataSourcePartitioning(
             s.outputPartitioning(), AttributeMap(output.map(a => a -> a.name)))
     
    +    case _ if readerFactories.size == 1 => SinglePartition
    +
         case _ => super.outputPartitioning
       }
     
     -  private lazy val readerFactories: java.util.List[DataReaderFactory[UnsafeRow]] = reader match {
     -    case r: SupportsScanUnsafeRow => r.createUnsafeRowReaderFactories()
     +    private lazy val readerFactories: Seq[DataReaderFactory[_]] = reader match {
    --- End diff ---
    
    We could, but then both usages of `readerFactories` (along with any future ones) would have to add a `case r: SupportsScanColumnarBatch if r.enableBatchRead()` check to figure out which val they're supposed to use. I'm not a huge fan of lazy vals that will throw errors if anyone tries to instantiate them.
    
    Both ways seem equally unclean to me, so I'm fine with switching if you think separating them is better.
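    For illustration, a minimal sketch of the split-val alternative under discussion. Types are simplified (strings stand in for `DataReaderFactory` lists, and the traits are stubs, not the real Spark interfaces); the point is that every call site repeats the `SupportsScanColumnarBatch` dispatch, and forcing the wrong lazy val throws:

```scala
// Stubs standing in for the Spark DataSourceV2 reader interfaces.
trait Reader
trait SupportsScanColumnarBatch extends Reader {
  def enableBatchRead(): Boolean = true
}

// Hypothetical scan node showing the two-lazy-val variant.
class Scan(reader: Reader) {
  // One lazy val per read path; each throws if forced on the wrong reader.
  private lazy val rowFactories: Seq[String] = reader match {
    case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
      throw new IllegalStateException("batch reader: use batchFactories")
    case _ => Seq("row-factory")
  }

  private lazy val batchFactories: Seq[String] = reader match {
    case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
      Seq("batch-factory")
    case _ =>
      throw new IllegalStateException("row reader: use rowFactories")
  }

  // Every usage has to repeat the dispatch to pick the right val.
  def factories: Seq[String] = reader match {
    case r: SupportsScanColumnarBatch if r.enableBatchRead() => batchFactories
    case _ => rowFactories
  }
}
```

    A batch-capable reader routes to `batchFactories`, a plain reader to `rowFactories`; calling the wrong val directly fails at runtime, which is the objection raised above.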


---
