Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/22207#discussion_r212409340
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala
---
@@ -77,44 +77,6 @@ private[kafka010] class KafkaSourceRDD(
offsetRanges.zipWithIndex.map { case (o, i) => new
KafkaSourceRDDPartition(i, o) }.toArray
}
- override def count(): Long = offsetRanges.map(_.size).sum
--- End diff ---
These methods are never used, because Dataset always wraps this RDD:
https://github.com/apache/spark/blob/2a0a8f753bbdc8c251f8e699c0808f35b94cfd20/sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala#L113
and `MapPartitionsRDD` just falls back to the default RDD implementation, so the
overrides are never invoked. In addition, they may return wrong answers when
`failOnDataLoss=false`. Hence, I just removed them.
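
To illustrate the `failOnDataLoss=false` problem, here is a minimal sketch (with hypothetical names such as `actuallyRead` and `logStartOffset`, not Spark APIs): the removed `count()` summed the *planned* offset-range sizes, but when Kafka retention has already deleted offsets inside a planned range, a scan with `failOnDataLoss=false` silently skips them, so the precomputed sum overcounts:

```scala
// Hypothetical sketch of why a precomputed count can be wrong under
// failOnDataLoss=false. An offset range is planned as [fromOffset, untilOffset).
case class OffsetRange(fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset // planned size, assumes no data loss
}

// Offsets actually readable from the broker once retention has advanced the
// log start offset (illustrative helper, not a real Kafka/Spark API).
def actuallyRead(range: OffsetRange, logStartOffset: Long): Long =
  (range.untilOffset - math.max(range.fromOffset, logStartOffset)).max(0L)

val ranges = Seq(OffsetRange(0L, 100L), OffsetRange(100L, 200L))

// The removed override computed the count from the planned ranges:
val plannedCount = ranges.map(_.size).sum // 200

// If retention deleted offsets below 50, the scan skips them silently,
// so the true number of returned rows is smaller:
val realCount = ranges.map(actuallyRead(_, logStartOffset = 50L)).sum // 150
```

Falling back to the default `RDD.count()`, which actually runs the scan, always reflects the rows that were really read.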
---