[GitHub] [spark] zero323 commented on a change in pull request #34273: [SPARK-36997][PYTHON][TESTS] Run mypy tests against ml, sql, streaming and core examples

GitBox Mon, 08 Nov 2021 18:16:44 -0800


zero323 commented on a change in pull request #34273:
URL: https://github.com/apache/spark/pull/34273#discussion_r745236503




##########
File path: examples/src/main/python/avro_inputformat.py
##########
@@ -75,7 +75,7 @@
         schema_rdd = sc.textFile(sys.argv[2], 1).collect()
         conf = {"avro.schema.input.key": reduce(lambda x, y: x + y, schema_rdd)}
 
-    avro_rdd = sc.newAPIHadoopFile(
+    avro_rdd = sc.newAPIHadoopFile(  # type: ignore[var-annotated]

Review comment:
       The code is generic, in the sense that we don't know what `avro_rdd` 
actually is, other than `RDD[Tuple[Any, Any]]`. Since we depend on 
passing strings and reading the schema from a file, there is really no way to 
fill in `Any` here, which makes the annotation useless in practice. That said, 
   
   
   ```python
   from typing import Any, Tuple
   ...
   from pyspark.rdd import RDD
   ...
   avro_rdd: RDD[Tuple[Any, Any]] = ...
   ```
   should silence this one without an ignore.
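
To illustrate the point without requiring a Spark installation, here is a minimal sketch of the same situation. `FakeRDD` and `load_pairs` are hypothetical stand-ins invented for this example (they are not PySpark APIs); the only assumption is that, like `newAPIHadoopFile`, the loader returns a generic container whose key and value types cannot be inferred from its arguments.

```python
from typing import Any, Generic, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
T = TypeVar("T")


class FakeRDD(Generic[T]):
    """Hypothetical stand-in for pyspark.rdd.RDD (not a PySpark class)."""

    def __init__(self, items: List[Any]) -> None:
        self._items = list(items)

    def collect(self) -> List[Any]:
        return list(self._items)


def load_pairs(path: str) -> "FakeRDD[Tuple[K, V]]":
    """Hypothetical loader: K and V depend on runtime data, not on `path`."""
    return FakeRDD([(path, None)])


# Without an annotation, mypy has nothing to bind K and V to, so it is
# expected to ask for one (the var-annotated error from the diff above):
#     pairs = load_pairs("users.avro")

# With an explicit annotation no ignore comment should be needed,
# although Any carries no real information about the contents.
pairs: FakeRDD[Tuple[Any, Any]] = load_pairs("users.avro")
print(pairs.collect())
```

At runtime both forms behave identically; the annotation only affects what the type checker reports.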
   
   The real question is probably what we mean by 
   
   > verbose and close to the standard.
   
   and what style of coding we want to promote. 
   
   There is a lot of value in annotating library code. However, when it comes 
to simple scripts like our examples, I wouldn't really expect users to add 
annotations, with the possible exception of function annotations.
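
As a rough sketch of that style (the function name and data are made up for illustration and do not come from the Spark examples), a script might annotate only its function signatures and leave script-level variables to inference:

```python
from typing import List


def word_counts(lines: List[str]) -> List[int]:
    """Return the number of whitespace-separated words in each line."""
    return [len(line.split()) for line in lines]


# Script-level code stays unannotated; a type checker can infer these
# types from the literal and from the function's return annotation.
lines = ["to be or not to be", "that is the question"]
counts = word_counts(lines)
total = sum(counts)
print(total)
```

Here the only typing import is the one the function signature needs, so the script body stays as short as an unannotated one.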
   
   Also, there is another consideration here: at least some of these are 
included in the examples. If we decide to prefer annotated code over 
ignores (or to annotate things in general), all required type imports will have 
to go into the rendered examples to keep them copy-pasteable. That is quite 
verbose, adds a lot of noise to otherwise simple snippets, and is also 
related to the problems that we discussed offline.
   
   Finally, some ignores are probably unavoidable because of the API design, 
unless we want to ban certain methods (talking about you, `DataFrame.head`) 
from the examples altogether.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



