Ryan Moore created SPARK-27164:
----------------------------------

             Summary: RDD.countApprox on empty RDDs schedules jobs which never complete
                 Key: SPARK-27164
                 URL: https://issues.apache.org/jira/browse/SPARK-27164
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.0, 2.2.3
         Environment: macOS, Spark-2.4.0 with Hadoop 2.7 running on Java 11.0.1
Also observed on: macOS, Spark-2.2.3 with Hadoop 2.7 running on Java 1.8.0_151
            Reporter: Ryan Moore


When calling `countApprox` on an RDD that has no partitions (such as those created by `sparkContext.emptyRDD`), a job is scheduled with 0 stages and 0 tasks. That job remains under "Active Jobs" in the Spark UI until it is either killed or the Spark context is shut down.

{code:java}
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val ints = sc.makeRDD(Seq(1))
ints: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:24

scala> ints.countApprox(1000)
res0: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [1.000, 1.000])
// PartialResult is returned, scheduled job completed

scala> ints.filter(_ => false).countApprox(1000)
res1: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [0.000, 0.000])
// PartialResult is returned, scheduled job completed

scala> sc.emptyRDD[Int].countApprox(1000)
res5: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [0.000, 0.000])
// PartialResult is returned, scheduled job is ACTIVE but never completes

scala> sc.union(Nil : Seq[org.apache.spark.rdd.RDD[Int]]).countApprox(1000)
res16: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [0.000, 0.000])
// PartialResult is returned, scheduled job is ACTIVE but never completes
{code}
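A minimal workaround sketch until this is fixed, assuming the caller is fine receiving an immediate exact result for partition-less RDDs: since a 0-partition RDD runs no tasks, the exact count is trivially 0 and the approximate job can be skipped entirely. The helper name `safeCountApprox` is hypothetical; it uses Spark's public `PartialResult` and `BoundedDouble` constructors.

{code:java}
import org.apache.spark.rdd.RDD
import org.apache.spark.partial.{BoundedDouble, PartialResult}

// Hypothetical guard: avoid submitting an approximate-count job for RDDs
// with no partitions, which is the case that leaves a job stuck as ACTIVE.
def safeCountApprox[T](rdd: RDD[T], timeout: Long): PartialResult[BoundedDouble] =
  if (rdd.partitions.isEmpty) {
    // No partitions means no tasks would run; return a final, exact result
    // of 0 (mean = 0.0, confidence = 1.0, bounds [0.0, 0.0]) without
    // involving the scheduler at all.
    new PartialResult(new BoundedDouble(0.0, 1.0, 0.0, 0.0), true)
  } else {
    rdd.countApprox(timeout)
  }

// Usage: behaves like countApprox, but completes cleanly on empty RDDs.
safeCountApprox(sc.emptyRDD[Int], 1000)
{code}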