zero323 commented on a change in pull request #34273:
URL: https://github.com/apache/spark/pull/34273#discussion_r745236503
##########
File path: examples/src/main/python/avro_inputformat.py
##########
@@ -75,7 +75,7 @@
schema_rdd = sc.textFile(sys.argv[2], 1).collect()
conf = {"avro.schema.input.key": reduce(lambda x, y: x + y,
schema_rdd)}
- avro_rdd = sc.newAPIHadoopFile(
+ avro_rdd = sc.newAPIHadoopFile( # type: ignore[var-annotated]
Review comment:
The code is generic, in the sense that we don't know what `avro_rdd`
actually is, other than that it is an `RDD[Tuple[Any, Any]]`. Since we depend on
passing strings and reading the schema from a file, there is really no way to fill in
`Any` here, which makes the whole annotation useless in practice. But
```python
from typing import Any, Tuple
...
from pyspark.rdd import RDD
...
avro_rdd: RDD[Tuple[Any, Any]] = ...
```
should silence this one without an ignore.
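For illustration, the same mypy behaviour can be reproduced without Spark. The sketch below uses a hypothetical `load_pairs` function (not a PySpark API) whose return type contains type variables that no argument can bind; this is assumed to mirror how `newAPIHadoopFile` is annotated. In that situation mypy is expected to report `Need type annotation` (`var-annotated`) for an unannotated assignment, and an explicit `Any`-based annotation should make the error go away.

```python
from typing import Any, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def load_pairs(path: str) -> List[Tuple[K, V]]:
    """Hypothetical stand-in for a loader whose element types depend on runtime input."""
    # The real element types would only be known from data read at runtime,
    # so this sketch just returns a fixed sample.
    sample: List[Tuple[Any, Any]] = [("key", {"field": 1})]
    return sample


# Unannotated: mypy cannot solve K and V, so it asks for an annotation
# unless the line carries an ignore comment.
pairs_ignored = load_pairs("users.avro")  # type: ignore[var-annotated]

# Annotated: the explicit `Any`s satisfy mypy without an ignore comment,
# although they carry no more information than the ignore did.
pairs_annotated: List[Tuple[Any, Any]] = load_pairs("users.avro")
```

Both assignments behave identically at runtime; the only difference is which kind of noise ends up in the example.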
The real question is probably what we mean by
> verbose and close to the standard.
and what style of coding we want to promote.
There is a lot of value in annotating library code. However, when it comes
to simple scripts like our examples, I wouldn't really expect users to add
annotations, except perhaps when annotating functions.
Also, there is another consideration here ‒ at least some of these are
included in the examples. If we decide to prefer annotated code instead of
ignores (or annotate things in general), all required type imports will have to
go into the rendered examples, to make things copy-pasteable. That is not only quite
verbose, adding a lot of noise to otherwise simple snippets, but is also
related to the problems that we discussed offline.
Finally, some ignores are probably unavoidable because of the API design,
unless we want to ban certain methods (talking about you, `DataFrame.head`)
from the examples altogether.
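To make the `DataFrame.head` point concrete, here is a minimal sketch using a hypothetical stand-in function rather than PySpark itself. It assumes that `head` is typed with overloads, returning an optional single row when called without `n` and a list of rows otherwise; with such a signature, using the result of `head()` directly would need either an ignore or an explicit `None` check.

```python
from typing import List, Optional, Sequence, overload


@overload
def head(rows: Sequence[str]) -> Optional[str]: ...
@overload
def head(rows: Sequence[str], n: int) -> List[str]: ...
def head(rows, n=None):
    """Hypothetical stand-in mimicking an overloaded head(); not the PySpark API."""
    if n is None:
        # No `n`: return the first row, or None when there are no rows.
        return rows[0] if rows else None
    # With `n`: always return a list of at most `n` rows.
    return list(rows[:n])


first = head(["alice", "bob"])
# `first` is Optional[str], so `first.upper()` alone would be flagged by mypy;
# the example has to narrow it (or carry an ignore).
upper = first.upper() if first is not None else None

# Passing `n` selects the other overload and yields a plain list.
top_two = head(["alice", "bob", "carol"], 2)
```

In a short example the `None` check adds little beyond satisfying the type checker, which is why an ignore may be the smaller evil there.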