Re: [PR] [SPARK-46687][PYTHON][CONNECT] Basic support of SparkSession-based memory profiler [spark]

via GitHub Fri, 26 Jan 2024 16:22:06 -0800


ueshin commented on code in PR #44775:
URL: https://github.com/apache/spark/pull/44775#discussion_r1468255186



##########
python/pyspark/profiler.py:
##########
@@ -196,16 +197,41 @@ def add(
             for subcode in filter(inspect.iscode, code.co_consts):
                 self.add(subcode, toplevel_code=toplevel_code)
 
+    class CodeMapForUDFV2(CodeMap):
+        def add(
+            self,
+            code: Any,
+            toplevel_code: Optional[Any] = None,
+        ) -> None:
+            if code in self:
+                return
+
+            if toplevel_code is None:
+                toplevel_code = code
+                filename = code.co_filename
+                self._toplevel.append((filename, code))
+                self[code] = {}
+            else:
+                self[code] = self[toplevel_code]
+            for subcode in filter(inspect.iscode, code.co_consts):
+                self.add(subcode, toplevel_code=toplevel_code)
+
+        def items(self) -> Iterator[Tuple[str, Iterator[Tuple[int, Any]]]]:
+            """Iterate on the toplevel code blocks."""
+            for filename, code in self._toplevel:
+                measures = self[code]
+                if not measures:
+                    continue  # skip if no measurement
+                linenos = range(min(measures), max(measures) + 1)

Review Comment:
   We may want to delay generating the full `linenos` range until the 
results are displayed, to reduce the amount of intermediate data?
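
   A minimal sketch of what deferring could look like, not the PR's actual
code: the helper name `expand_for_display` is hypothetical, and the sketch
assumes `measures` is a dict keyed by line number, as the
`min(measures)` / `max(measures)` calls in the diff suggest. The idea is to
carry only the measured lines through collection and fill the gaps at
render time.

```python
from typing import Any, Dict, Iterator, Tuple


def expand_for_display(measures: Dict[int, Any]) -> Iterator[Tuple[int, Any]]:
    """Lazily yield (lineno, measurement) for every line in the measured span.

    Hypothetical helper: lines without a measurement are yielded with None,
    so the caller can still print a contiguous listing.
    """
    if not measures:
        return  # nothing was measured, so there is nothing to render
    # range() is lazy, so the full span is never materialized as a list.
    for lineno in range(min(measures), max(measures) + 1):
        yield lineno, measures.get(lineno)


# Intermediate data holds only the lines that were actually measured.
collected = {10: "1.0 MiB", 13: "2.5 MiB"}

# Gap filling happens only here, when the results are shown.
rendered = list(expand_for_display(collected))
```

   With this shape, the intermediate result stays proportional to the number
of measured lines rather than to the length of the enclosing code block.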



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


