zero323 commented on a change in pull request #34892:
URL: https://github.com/apache/spark/pull/34892#discussion_r770905121
##########
File path: python/pyspark/context.py
##########
@@ -288,9 +287,11 @@ def _do_init(
# Create a single Accumulator in Java that we'll send all our updates through;
# they will be passed back to us through a TCP server
+ assert self._gateway is not None
Review comment:
> Why are the asserts needed, out of curiosity
Long story short ‒ `_gateway`, `_jvm`, etc. are class variables that may or may not
have been initialized
https://github.com/apache/spark/blob/6e45b04db48008fa033b09df983d3bd1c4f790ea/python/pyspark/context.py#L155-L158
so all of them are declared as `Optional[_]`.
At runtime we blindly assume that we're dealing with an active context and that these
are set, but that's not clear to the type checker ‒ these assertions serve as
`mypy` hints: "hey, I know this stuff is not `None` here".
There are other ways of handling this (`cast`s, `ignore`s; we even discussed
helper methods), but this seems to be the least intrusive approach and the
closest to our actual expectations.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]