zero323 commented on a change in pull request #29122:
URL: https://github.com/apache/spark/pull/29122#discussion_r456756157
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2392,7 +2393,7 @@ def json_tuple(col, *fields):
@since(2.1)
-def from_json(col, schema, options={}):
+def from_json(col, schema, options: Dict = None):
Review comment:
Pre-3.6 wasn't an issue anyway. In practice, given the size of the
project and the overall complexity of the annotations, stubs seem to be a
somewhat better choice (avoiding cyclic imports and certain `Generic` pitfalls,
for starters, and not being stuck supporting legacy Python). This also means
that a potential migration should be relatively easy.
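One way to see the cyclic-import point: inline annotations often need a `typing.TYPE_CHECKING` workaround like the sketch below (module and class names here are hypothetical, not pyspark's), whereas a separate `.pyi` stub can import freely because it is never executed at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers, so this never triggers a runtime
    # circular import. `heavy_module` and `Schema` are hypothetical names.
    from heavy_module import Schema

def describe(schema: "Schema") -> str:
    # String annotation, so `Schema` need not exist at runtime.
    return str(schema)
```

A stub file sidesteps the guard entirely, which is part of why stubs scale better for a large codebase.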
My biggest concern is that over the past four years I haven't seen any active
Spark contributors willing to help with maintenance (and that's despite gentle
pings). Additionally, the annotation ecosystem is still evolving ‒ so having a
project that can iterate faster than the annotated codebase is not a bad thing.
The biggest advantage of maintaining annotations alongside the actual codebase
is that it forces conscious choices about signatures ‒ not every signature can
be annotated in a meaningful way, and some result in an exploding number of
overloads. But it doesn't really remove the maintenance overhead ‒ for example,
changes in the ML API, mostly invisible to the end user, wreaked havoc, as that
is the most vulnerable part of the code (a lot of generics there).
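The overload explosion is easy to see on a toy sketch (hypothetical names, not the actual pyspark signature): a function that accepts two independent union-typed parameters already needs one `@overload` per combination, and the count multiplies with each additional flexible parameter.

```python
from typing import Union, overload

# Two parameters, two accepted types each => 2 x 2 = 4 overloads.
@overload
def from_json_like(col: str, schema: str) -> str: ...
@overload
def from_json_like(col: str, schema: dict) -> str: ...
@overload
def from_json_like(col: int, schema: str) -> str: ...
@overload
def from_json_like(col: int, schema: dict) -> str: ...

def from_json_like(col: Union[str, int], schema: Union[str, dict]) -> str:
    # The runtime body is trivial; only the annotations multiply.
    return f"{col}:{schema}"
```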
Anyway... that's just my two cents. If you revive the discussion, please
ping me ‒ I'd like to see how it goes. TIA.
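For context on the quoted diff: replacing `options={}` with `options=None` avoids Python's shared-mutable-default pitfall, since a `{}` default is created once at definition time and reused across calls. A minimal sketch (function names hypothetical):

```python
def parse_bad(value, options={}):
    # The SAME dict object is reused on every call without an argument.
    options.setdefault("seen", []).append(value)
    return options

def parse_good(value, options=None):
    # A fresh dict per call when none is supplied.
    if options is None:
        options = {}
    options.setdefault("seen", []).append(value)
    return options

parse_bad(1)
assert parse_bad(2)["seen"] == [1, 2]   # state leaked between calls
parse_good(1)
assert parse_good(2)["seen"] == [2]     # no leakage
```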
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]