[GitHub] squito commented on a change in pull request #23337: [SPARK-26019][PYSPARK] Allow insecure py4j gateways

GitBox Mon, 17 Dec 2018 14:12:41 -0800

squito commented on a change in pull request #23337: [SPARK-26019][PYSPARK] 
Allow insecure py4j gateways
URL: https://github.com/apache/spark/pull/23337#discussion_r242335813


 ##########
 File path: python/pyspark/tests.py
 ##########
 @@ -2381,6 +2382,34 @@ def test_startTime(self):
         with SparkContext() as sc:
             self.assertGreater(sc.startTime, 0)
 
+    def test_forbid_insecure_gateway(self):
+        # By default, we fail immediately if you try to create a SparkContext
+        # with an insecure gateway
+        gateway = _launch_gateway(insecure=True)
+        with self.assertRaises(Exception) as context:
+            SparkContext(gateway=gateway)
+        self.assertIn("insecure py4j gateway", context.exception.message)
+        self.assertIn("spark.python.allowInsecurePy4j", context.exception.message)
+        self.assertIn("removed in Spark 3.0", context.exception.message)
+
+    def test_allow_insecure_gateway_with_conf(self):
+        with SparkContext._lock:
+            SparkContext._gateway = None
+            SparkContext._jvm = None
 
 Review comment:
   This part of the test really bothers me, so I'd like to explain it to 
reviewers.  Without these lines, the test passes -- but it passes even without 
the changes to the main code!  Or rather, it only passes when it's run as part 
of the entire suite; it would fail when run individually.
   
   What's happening is that `SparkContext._gateway` and `SparkContext._jvm` 
don't get reset by most tests (e.g., they are not reset in `sc.stop()`), so a 
test running before this one will set those variables, and then this test will 
end up holding on to a gateway which *does* have the `auth_token` set, and so 
the accumulator server would still work.
   
   Now that in itself sounds crazy to me, and seems like a problem for things 
like Zeppelin.  I tried just adding these two lines into `sc.stop()`, but then 
when I ran all the tests, I got a lot of `java.io.IOException: error=23, Too 
many open files in system`.  So maybe something else is not getting cleaned up 
properly in the pyspark tests?
   
   I was hoping somebody else might have some ideas about what is going on or 
if there is a better way to do this.
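
A minimal sketch of the stale-state problem described above, runnable without Spark. `FakeGateway`, `FakeContext`, and `reset_cached_gateway` are hypothetical stand-ins, not pyspark classes; the sketch assumes only what the comment states, namely that the gateway is cached on the class and that `stop()` does not clear it.

```python
import threading


class FakeGateway:
    """Hypothetical stand-in for a py4j gateway; not a real pyspark class."""

    def __init__(self, auth_token=None):
        self.auth_token = auth_token


class FakeContext:
    """Hypothetical stand-in for a context that caches its gateway on the class."""

    _lock = threading.RLock()
    _gateway = None

    def __init__(self, gateway=None):
        with FakeContext._lock:
            # Assumption: a cached gateway wins over the one passed in.
            if FakeContext._gateway is None:
                FakeContext._gateway = gateway or FakeGateway(auth_token="secret")
            self.gateway = FakeContext._gateway

    def stop(self):
        # Deliberately leaves FakeContext._gateway in place, as described above.
        pass


def reset_cached_gateway():
    """Clear the class-level cache, as the two lines under discussion do."""
    with FakeContext._lock:
        FakeContext._gateway = None


# An earlier "test" creates a context with a secure gateway, then stops it.
FakeContext().stop()

# Without a reset, the insecure gateway passed here is silently ignored.
leaked = FakeContext(gateway=FakeGateway(auth_token=None))
leaked_token = leaked.gateway.auth_token
leaked.stop()

# With the reset, the context really uses the insecure gateway.
reset_cached_gateway()
isolated = FakeContext(gateway=FakeGateway(auth_token=None))
isolated_token = isolated.gateway.auth_token
isolated.stop()
```

In this model the first "insecure" context still carries the token from the earlier run, which is why the test could pass inside the full suite for the wrong reason.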
