I launch it as follows:

$SPARK_HOME/bin/spark-submit --master <master url> --class org.apache.spark.examples.streaming.HdfsWordCount target/scala-2.10/spark_stream_examples-assembly-1.0.jar <hdfsdir>
After starting the job, I copy a new test file into hdfsdir. It is a large text file which I cannot paste here, but it contains at least 100 distinct words. The streaming output, however, shows only about 5-6 words along with their counts, and the following batches are empty. I then stop the job after some time.

-------------------------------------------
Time: ...
-------------------------------------------
(word1, cnt1)
(word2, cnt2)
(word3, cnt3)
(word4, cnt4)
(word5, cnt5)

-------------------------------------------
Time: ...
-------------------------------------------

-------------------------------------------
Time: ...
-------------------------------------------

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HdfsWordCount-only-counts-some-of-the-words-tp14929p14967.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
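For context, the per-batch transformation in the HdfsWordCount example is, as far as I understand it, flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _) applied to each batch of new lines. Here is a minimal sketch of that same word-count logic on plain Scala collections (no Spark required; WordCountSketch and countWords are names of my own, not part of the example):

```scala
object WordCountSketch {
  // Equivalent of the example's per-batch word count, on an in-memory batch
  // of lines instead of a DStream.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                 // split each line into words
      .filter(_.nonEmpty)                    // ignore empty tokens
      .groupBy(identity)                     // group equal words together
      .map { case (w, ws) => (w, ws.size) }  // count occurrences per word

  def main(args: Array[String]): Unit = {
    val counts = countWords(Seq("a b b", "c a"))
    println(counts)
  }
}
```

The point being: each streaming batch only counts words from files that arrived in that batch interval, so every batch's output is an independent count like the one above, not a running total.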