susheel-aroskar commented on code in PR #53076:
URL: https://github.com/apache/spark/pull/53076#discussion_r2667058431
##########
python/pyspark/sql/connect/client/core.py:
##########
@@ -607,6 +614,55 @@ def fromProto(cls, pb: pb2.ConfigResponse) -> "ConfigResult":
)
+def _is_pyspark_source(filename: str) -> bool:
+    """Check if the given filename is from the pyspark package."""
+    return filename.startswith(PYSPARK_ROOT)
+
+
+def _retrieve_stack_frames() -> List[CallSite]:
+    """
+    Return a list of CallSites representing the relevant stack frames in the callstack.
+    """
+    frames = traceback.extract_stack()
+
+    filtered_stack_frames = []
+    for i, frame in enumerate(frames):
+        filename, lineno, func, _ = frame
+        if _is_pyspark_source(filename):
+            # Do not include PySpark internal frames as they are not user application code
+            break
+        if i + 1 < len(frames):
+            _, _, func, _ = frames[i + 1]
+        filtered_stack_frames.append(CallSite(function=func, file=filename, linenum=lineno))
Review Comment:
This is the stack trace recorded for
`test_call_stack_trace_captures_correct_calling_context`:
```Python
method_name: "level1"
file_name: "/Users/saroskar/Github/debug-improvement/python/pyspark/sql/tests/connect/client/test_client_call_stack_trace.py"
line_number: 274
method_name: "level2"
file_name: "/Users/saroskar/Github/debug-improvement/python/pyspark/sql/tests/connect/client/test_client_call_stack_trace.py"
line_number: 272
method_name: "level3"
file_name: "/Users/saroskar/Github/debug-improvement/python/pyspark/sql/tests/connect/client/test_client_call_stack_trace.py"
line_number: 268
```
So it records the name of the function being invoked together with the line
number where it is invoked. For example, function `level1()` is invoked on
line 274 (the line `req = level1()`).
I believe this format (the name of the called function plus its call location
in the caller) is more useful for a developer who is trying to debug an error
from the server-side logs. In most cases the function of interest would be a
DataFrame action like `collect()` or `count()`, and a single caller function
may contain several of these action calls. Showing which DataFrame action was
invoked (the action's name) along with the exact location in the code where it
was invoked makes things unambiguous, IMO. I suspect that's why
`first_spark_call` follows similar logic.
wdyt?
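To illustrate the pairing described above, here is a minimal standalone sketch, not the PR's code. The `CallSite` tuple and `capture_call_sites` helper are hypothetical stand-ins, and the PySpark-frame filtering is left out. Each entry takes its file and line number from the caller frame and its function name from the next (callee) frame:

```python
import traceback
from typing import List, NamedTuple


class CallSite(NamedTuple):
    # Hypothetical stand-in for the CallSite type referenced in the diff.
    function: str
    file: str
    linenum: int


def capture_call_sites() -> List[CallSite]:
    # extract_stack() returns frames oldest first; the last one is this function.
    frames = traceback.extract_stack()
    sites = []
    for caller, callee in zip(frames, frames[1:]):
        # Location comes from the caller, function name from the callee it invokes.
        sites.append(
            CallSite(function=callee.name, file=caller.filename, linenum=caller.lineno)
        )
    return sites


def level3() -> List[CallSite]:
    return capture_call_sites()


def level2() -> List[CallSite]:
    return level3()


def level1() -> List[CallSite]:
    return level2()


sites = level1()
for site in sites[-4:]:
    print(f"{site.function} called at {site.file}:{site.linenum}")
```

With this pairing, the entry for `level1` carries the line of the `sites = level1()` statement rather than a line inside `level1` itself, which matches the shape of the trace shown above.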
##########
python/pyspark/sql/connect/client/core.py:
##########
@@ -1273,6 +1329,10 @@ def _execute_plan_request_with_metadata(
)
req.operation_id = operation_id
self._update_request_with_user_context_extensions(req)
+
+ call_stack_trace = _build_call_stack_trace()
+ if call_stack_trace:
Review Comment:
Changed, thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]