I'm new to PySpark and spark streaming. According to the documentation about countByValue <https://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams > and countByValueAndWindow <https://spark.apache.org/docs/latest/streaming-programming-guide.html#window-operations> :
*countByValue*: When called on a DStream of elements of type K, return a new DStream of (K, Long) pairs where the value of each key is its frequency in each RDD of the source DStream. *countByValueAndWindow*: When called on a DStream of (K, V) pairs, returns a new DStream of (K, Long) pairs where the value of each key is its frequency within a sliding window. Like in reduceByKeyAndWindow, the number of reduce tasks is configurable through an optional argument. So basically the return value for these two functions should be *a list of (K, Long) pairs*, right? However, when I am doing some experiments, the return value is a list of integers, instead of pairs. What's more, in the official test codes on Github for pyspark, which are here <https://github.com/apache/spark/blob/master/python/pyspark/streaming/tests.py#L282> and here <https://github.com/apache/spark/blob/master/python/pyspark/streaming/tests.py#L634> : You can see the "expected results" are *a list of integers*! I thought I misunderstand the documentation somehow until I saw the same test codes on Github for scala, which are here <https://github.com/apache/spark/blob/d83c2f9f0b08d6d5d369d9fae04cdb15448e7f0d/streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala#L159 > and here <https://github.com/apache/spark/blob/d83c2f9f0b08d6d5d369d9fae04cdb15448e7f0d/streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala#L260> . Similar test cases, but the results are *a list of pairs* this time! So in summary, the documentation and scala test cases told us the result are pairs. But the python test cases and my own experiments showed that the result are integers. Could someone help me explain this a little bit? Any helps will be much appreciated! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-in-python-questions-about-countByValue-and-countByValueAndWindow-tp25584.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org