GitHub user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/20726#discussion_r172324207
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
@@ -46,34 +48,46 @@ case class DataSourceV2ScanExec(
       new DataSourcePartitioning(
         s.outputPartitioning(), AttributeMap(output.map(a => a -> a.name)))

+    case _ if readerFactories.size == 1 => SinglePartition
+
     case _ => super.outputPartitioning
   }

-  private lazy val readerFactories: java.util.List[DataReaderFactory[UnsafeRow]] = reader match {
-    case r: SupportsScanUnsafeRow => r.createUnsafeRowReaderFactories()
+  private lazy val readerFactories: Seq[DataReaderFactory[_]] = reader match {
--- End diff ---
I think it is better to have a lazy val (as long as it is private) throw an exception when it is called at the wrong time than to add the cast: the exception at least validates that the assumptions are met and can carry a reasonable error message, whereas the cast might hide the problem, particularly as this code evolves over time. It would also be reasonable to add another path that returns `Seq[DataReaderFactory[_]]`, because that is the method's contract, even though callers currently assume how it will behave. As much of the contract between methods as possible should be expressed in types.
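
To make that concrete, here is a minimal sketch of the two-val approach against the 2.3-era DataSourceV2 reader API. The name `unsafeRowFactories` is hypothetical, and a `reader: DataSourceReader` field is assumed from the class above:

```scala
import scala.collection.JavaConverters._

// General path: matches the method contract, no casts needed.
private lazy val readerFactories: Seq[DataReaderFactory[_]] = reader match {
  case r: SupportsScanUnsafeRow => r.createUnsafeRowReaderFactories().asScala
  case _ => reader.createDataReaderFactories().asScala
}

// Unsafe-row path (hypothetical helper): callers that require
// DataReaderFactory[UnsafeRow] go through a private lazy val that checks
// the assumption up front and fails with a clear message, instead of
// hiding the problem behind a cast that blows up later.
private lazy val unsafeRowFactories: Seq[DataReaderFactory[UnsafeRow]] =
  reader match {
    case r: SupportsScanUnsafeRow => r.createUnsafeRowReaderFactories().asScala
    case _ => throw new IllegalStateException(
      s"$reader does not implement SupportsScanUnsafeRow")
  }
```

That way the caller's assumption is stated (and validated) at the single point where it can actually be checked.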
---