dchvn commented on a change in pull request #34238:
URL: https://github.com/apache/spark/pull/34238#discussion_r725847097
##########
File path: python/pyspark/conf.py
##########
@@ -195,21 +205,21 @@ def get(self, key, defaultValue=None):
else:
return self._conf.get(key, defaultValue)
- def getAll(self):
+ def getAll(self) -> List[Tuple[str, str]]:
"""Get all values as a list of key-value pairs."""
if self._jconf is not None:
return [(elem._1(), elem._2()) for elem in self._jconf.getAll()]
else:
- return self._conf.items()
+ return [(k, v) for k, v in self._conf.items()]
Review comment:
convert to ```List[Tuple[str, str]]```
##########
File path: python/pyspark/context.py
##########
@@ -285,24 +340,27 @@ def _do_init(self, master, appName, sparkHome, pyFiles,
environment, batchSize,
dump_path = self._conf.get("spark.python.profile.dump", None)
self.profiler_collector = ProfilerCollector(profiler_cls,
dump_path)
else:
- self.profiler_collector = None
+ self.profiler_collector = None # type: ignore[assignment]
# create a signal handler which would be invoked on receiving SIGINT
- def signal_handler(signal, frame):
+ def signal_handler(signal: Any, frame: Any) -> None:
Review comment:
I am not sure about this method!
##########
File path: python/pyspark/context.py
##########
@@ -118,20 +134,36 @@ class SparkContext(object):
ValueError: ...
"""
- _gateway = None
- _jvm = None
+ _gateway = None # type: JavaGateway
+ _jvm = None # type: JavaObject
_next_accum_id = 0
- _active_spark_context = None
+ _active_spark_context = None # type: SparkContext
_lock = RLock()
- _python_includes = None # zip and egg files that need to be added to
PYTHONPATH
-
- PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')
-
- def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
- environment=None, batchSize=0, serializer=PickleSerializer(),
conf=None,
- gateway=None, jsc=None, profiler_cls=BasicProfiler):
- if (conf is None or
- conf.get("spark.executor.allowSparkContext", "false").lower()
!= "true"):
+ # zip and egg files that need to be added to PYTHONPATH
+ _python_includes: List[str] = None # type: ignore[assignment]
+
+ PACKAGE_EXTENSIONS: Iterable[str] = ('.zip', '.egg', '.jar')
+
+ def __init__(
+ self,
+ master: Optional[str] = None,
+ appName: Optional[str] = None,
+ sparkHome: Optional[str] = None,
+ pyFiles: Optional[List[str]] = None,
+ environment: Optional[Dict[str, str]] = None,
+ batchSize: int = 0,
+ serializer: Serializer = PickleSerializer(),
+ conf: Optional[SparkConf] = None,
+ gateway: Optional[JavaGateway] = None,
+ jsc: Optional[JavaObject] = None,
+ profiler_cls: type = BasicProfiler,
+ ) -> None:
+ if (
+ conf is None
+ or conf.get(
+ "spark.executor.allowSparkContext", "false"
+ ).lower() != "true" # type: ignore[union-attr]
Review comment:
Should we check ```conf.get(key, ...) != None```?
##########
File path: python/pyspark/context.py
##########
@@ -540,13 +614,13 @@ def parallelize(self, c, numSlices=None):
size = len(c)
if size == 0:
return self.parallelize([], numSlices)
- step = c[1] - c[0] if size > 1 else 1
- start0 = c[0]
+ step = c[1] - c[0] if size > 1 else 1 # type: ignore[index]
Review comment:
```Iterable``` is not indexable
##########
File path: python/pyspark/context.py
##########
@@ -390,17 +458,17 @@ def getOrCreate(cls, conf=None):
with SparkContext._lock:
if SparkContext._active_spark_context is None:
SparkContext(conf=conf or SparkConf())
- return SparkContext._active_spark_context
+ return SparkContext._active_spark_context # type:
ignore[return-value]
Review comment:
We do not need the ```ignore``` here.
##########
File path: python/pyspark/context.py
##########
@@ -562,19 +636,28 @@ def f(split, iterator):
# Make sure we distribute data evenly if it's smaller than
self.batchSize
if "__len__" not in dir(c):
c = list(c) # Make it a list so we can compute its length
- batchSize = max(1, min(len(c) // numSlices, self._batchSize or 1024))
+ batchSize = max(
+ 1,
+ min(len(c) // numSlices, self._batchSize or 1024) # type:
ignore[arg-type]
Review comment:
```Iterable``` does not have ```len```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]