Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22009#discussion_r208638600
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
@@ -93,21 +81,17 @@ case class DataSourceV2ScanExec(
         sparkContext,
         sqlContext.conf.continuousStreamingExecutorQueueSize,
         sqlContext.conf.continuousStreamingExecutorPollIntervalMs,
-        partitions).asInstanceOf[RDD[InternalRow]]
-
-    case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
-      new DataSourceRDD(sparkContext, batchPartitions).asInstanceOf[RDD[InternalRow]]
+        partitions,
+        schema,
+        partitionReaderFactory.asInstanceOf[ContinuousPartitionReaderFactory])
--- End diff ---
However you want to do it is fine with me, but I've seen excessive casting in the SQL back-end, so I'm against adding casts where they aren't necessary, as in this case.
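As a minimal sketch of the alternative, using hypothetical simplified traits rather than Spark's actual interfaces: if the continuous-scan path is declared with the specific factory type it actually produces, the call site needs no `asInstanceOf` at all.

```scala
// Hypothetical stand-ins for the real Spark interfaces, for illustration only.
trait PartitionReaderFactory
trait ContinuousPartitionReaderFactory extends PartitionReaderFactory

// Cast required: the field is typed too generally for the continuous path.
class ScanWithCast(val readerFactory: PartitionReaderFactory) {
  def continuousFactory: ContinuousPartitionReaderFactory =
    readerFactory.asInstanceOf[ContinuousPartitionReaderFactory]
}

// No cast: the continuous scan exposes the specific type it creates.
class ScanWithoutCast(val readerFactory: ContinuousPartitionReaderFactory) {
  def continuousFactory: ContinuousPartitionReaderFactory = readerFactory
}
```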