squito commented on a change in pull request #23337: [SPARK-26019][PYSPARK]
Allow insecure py4j gateways
URL: https://github.com/apache/spark/pull/23337#discussion_r242335813
##########
File path: python/pyspark/tests.py
##########
@@ -2381,6 +2382,34 @@ def test_startTime(self):
         with SparkContext() as sc:
             self.assertGreater(sc.startTime, 0)
+    def test_forbid_insecure_gateway(self):
+        # By default, we fail immediately if you try to create a SparkContext
+        # with an insecure gateway
+        gateway = _launch_gateway(insecure=True)
+        with self.assertRaises(Exception) as context:
+            SparkContext(gateway=gateway)
+        self.assertIn("insecure py4j gateway", context.exception.message)
+        self.assertIn("spark.python.allowInsecurePy4j", context.exception.message)
+        self.assertIn("removed in Spark 3.0", context.exception.message)
+
+    def test_allow_insecure_gateway_with_conf(self):
+        with SparkContext._lock:
+            SparkContext._gateway = None
+            SparkContext._jvm = None
Review comment:
This part of the test really bothers me, so I'd like to explain it to
reviewers. Without these lines, the test passes, but it passes even without the
changes to the main code! Or rather, it only passes when it's run as part of
the entire suite; it would fail when run individually.
What's happening is that `SparkContext._gateway` and `SparkContext._jvm`
don't get reset by most tests (e.g., they are not reset in `sc.stop()`), so a
test running before this one will set those variables, and then this test will
end up holding on to a gateway which *does* have the `auth_token` set, and so
the accumulator server would still work.
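
To make the ordering problem concrete, here is a minimal toy model of the leak. `ToyContext`, `reset_cached_gateway`, and the string gateways are hypothetical stand-ins invented for illustration; this is not the pyspark implementation, only a sketch of the class-level caching pattern described above.

```python
import threading


class ToyContext(object):
    """Hypothetical stand-in for SparkContext; not the real pyspark code."""

    _lock = threading.RLock()
    _gateway = None  # class-level cache shared by every instance

    def __init__(self, gateway=None):
        with ToyContext._lock:
            # Only the first caller's gateway is stored; later ones are ignored.
            if ToyContext._gateway is None:
                ToyContext._gateway = gateway or "secure-gateway"
        self.gateway = ToyContext._gateway

    def stop(self):
        # Mirrors the behavior described above: the cache is NOT cleared here.
        pass


def reset_cached_gateway():
    """Clear the class-level cache under the lock, as the test does."""
    with ToyContext._lock:
        ToyContext._gateway = None


# An earlier "test" creates a context with the default (secure) gateway.
first = ToyContext()
first.stop()

# Without a reset, the insecure gateway passed here is silently ignored.
leaked = ToyContext(gateway="insecure-gateway")
leaked.stop()

# With a reset, the context really uses the gateway it was given.
reset_cached_gateway()
fresh = ToyContext(gateway="insecure-gateway")
```

In this toy, `leaked.gateway` is still the secure gateway left behind by the earlier context, which is the same shape of false pass described here; only after the reset does `fresh.gateway` reflect the insecure one.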
Now that in itself sounds crazy to me, and seems like a problem for things
like Zeppelin. I tried just adding these two lines into `sc.stop()`, but then
when I ran all the tests, I got a lot of `java.io.IOException: error=23, Too
many open files in system`. So maybe something else is not getting
cleaned up properly in the pyspark tests?
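
One way to start narrowing down a descriptor leak might be to count open file descriptors around a suspect step. The helper below is a hypothetical debugging sketch, not part of the PR. It assumes a Unix-like system that exposes `/dev/fd` (Linux, macOS), and it only sees the Python process, so descriptors held by the JVM would need a separate check.

```python
import os


def count_open_fds(fd_dir="/dev/fd"):
    """Return the number of file descriptors open in this process.

    Assumes a Unix-like system that exposes /dev/fd (Linux, macOS).
    """
    return len(os.listdir(fd_dir))


before = count_open_fds()
fd = os.open(os.devnull, os.O_RDONLY)  # simulate a descriptor that is not closed
during = count_open_fds()
os.close(fd)  # proper cleanup brings the count back down
after = count_open_fds()
```

Calling something like `count_open_fds()` in a test's `setUp` and `tearDown` and comparing the two values could show which tests leave descriptors behind on the Python side.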
I was hoping somebody else might have some ideas about what is going on or
if there is a better way to do this.