Thanks for the reply~~ I have solved the problem and found the cause. I had been using the Master node to upload files to HDFS, and this upload traffic may have taken up a lot of the Master's network resources. When I switched to uploading these files from another computer that is not part of the cluster, I got the correct result.
QingFeng
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/streaming-on-hdfs-can-detected-all-new-file-but-the-sum-of-all-the-rdd-count-not-equals-which-had-ded-tp5572p5635.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.