Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21560#discussion_r196607009
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDD.scala
 ---
    @@ -51,7 +51,7 @@ class ContinuousDataSourceRDD(
         sc: SparkContext,
         dataQueueSize: Int,
         epochPollIntervalMs: Long,
    -    @transient private val readerFactories: Seq[InputPartition[UnsafeRow]])
    +    private val readerFactories: Seq[InputPartition[UnsafeRow]])
    --- End diff --
    
    since all the partitions do no need all the factories, the right thing to 
do is to put partition's factory in the partition object. This is so that the 
all factories are not serialized for all tasks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to