Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22207#discussion_r212409340
  
    --- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala
 ---
    @@ -77,44 +77,6 @@ private[kafka010] class KafkaSourceRDD(
         offsetRanges.zipWithIndex.map { case (o, i) => new 
KafkaSourceRDDPartition(i, o) }.toArray
       }
     
    -  override def count(): Long = offsetRanges.map(_.size).sum
    --- End diff ---
    
    These methods are never called: a Dataset always wraps this RDD, as seen here: 
https://github.com/apache/spark/blob/2a0a8f753bbdc8c251f8e699c0808f35b94cfd20/sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala#L113
 and `MapPartitionsRDD` falls back to the default RDD implementation, so the 
overrides are bypassed. In addition, they may return wrong answers when 
`failOnDataLoss=false`. Hence, I just removed them.
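    To illustrate why the override is dead code, here is a minimal, self-contained sketch (not Spark code; `FakeRDD`, `SourceRDD`, and `MapPartitionsLikeRDD` are hypothetical stand-ins). A fast-path `count()` on a source class is never reached once the planner wraps it in another class that inherits the default, scan-based implementation:

    ```scala
    // Hypothetical model of the RDD hierarchy described above.
    abstract class FakeRDD {
      def iterator: Iterator[Int]
      // Default implementation: count by scanning every element,
      // analogous to RDD.count() running a job over all partitions.
      def count(): Long = iterator.size.toLong
    }

    class SourceRDD(data: Seq[Int]) extends FakeRDD {
      def iterator: Iterator[Int] = data.iterator
      // Fast-path override, analogous to the removed KafkaSourceRDD.count()
      // that summed offset-range sizes without scanning.
      override def count(): Long = data.length.toLong
    }

    class MapPartitionsLikeRDD(parent: FakeRDD, f: Int => Int) extends FakeRDD {
      def iterator: Iterator[Int] = parent.iterator.map(f)
      // No count() override: the default scan-based version runs here,
      // so SourceRDD.count() is never invoked through this wrapper.
    }

    object Demo extends App {
      val source  = new SourceRDD(Seq(1, 2, 3))
      val wrapped = new MapPartitionsLikeRDD(source, _ * 2)
      // Goes through the wrapper's inherited default count(), which scans
      // the iterator; the source's fast-path override is bypassed.
      println(wrapped.count())
    }
    ```

    Since the query planner always interposes such a wrapper, the fast-path override can only ever be reached by calling `count()` on the source RDD directly, which the Dataset code path never does.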



---
