zero323 commented on a change in pull request #34273:
URL: https://github.com/apache/spark/pull/34273#discussion_r727948460



##########
File path: examples/src/main/python/streaming/network_wordjoinsentiments.py
##########
@@ -50,10 +51,17 @@ def print_happiest_words(rdd):
     sc = SparkContext(appName="PythonStreamingNetworkWordJoinSentiments")
     ssc = StreamingContext(sc, 5)
 
+    def line_to_tuple(line: str) -> Tuple[str, str]:
+        try:
+            k, v = line.split("\t")
+            return k, v
+        except ValueError:
+            return "", ""
+
     # Read in the word-sentiment list and create a static RDD from it
     word_sentiments_file_path = "data/streaming/AFINN-111.txt"
     word_sentiments = ssc.sparkContext.textFile(word_sentiments_file_path) \
-        .map(lambda line: tuple(line.split("\t")))
+        .map(line_to_tuple)

Review comment:
       I am really not sure how to handle this.
   
   ```python
   .map(lambda line: tuple(line.split("\t")))
   ```
    returns `RDD[Tuple[str, ...]]`, which will cause a cascade of errors later.
   
    We could exclude this file from type checking, add casts, or rewrite the code to:
   
   ```python
   .map(lambda line: line.split("\t"))
   .map(lambda xs: (xs[0], xs[1]))
    ```
   
    but the helper function felt like the least intrusive and cleanest approach.
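
    For reference, a standalone sketch of the unpacking approach (assuming the tab-separated AFINN-111 format used by the original lambda): unpacking into exactly two names lets the type checker infer `Tuple[str, str]`, whereas `tuple(line.split("\t"))` is typed as `Tuple[str, ...]`.

    ```python
    from typing import Tuple

    def line_to_tuple(line: str) -> Tuple[str, str]:
        # Unpacking into exactly two names narrows the inferred type to
        # Tuple[str, str]; tuple(line.split("\t")) would be Tuple[str, ...].
        try:
            k, v = line.split("\t")
            return k, v
        except ValueError:
            # Lines without exactly two tab-separated fields fall back here
            return "", ""

    print(line_to_tuple("abandon\t-2"))  # ('abandon', '-2')
    print(line_to_tuple("malformed line"))  # ('', '')
    ```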




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
