HyukjinKwon commented on pull request #33874:
URL: https://github.com/apache/spark/pull/33874#issuecomment-914962534
Just adding these fields to `status.py` should be enough, e.g.:
```diff
diff --git a/python/pyspark/status.py b/python/pyspark/status.py
index a6fa7dd3144..f342ee38a2d 100644
--- a/python/pyspark/status.py
+++ b/python/pyspark/status.py
@@ -28,7 +28,7 @@ class SparkJobInfo(namedtuple("SparkJobInfo", "jobId stageIds status")):
 class SparkStageInfo(namedtuple("SparkStageInfo",
                                 "stageId currentAttemptId name numTasks numActiveTasks "
-                                "numCompletedTasks numFailedTasks")):
+                                "numCompletedTasks numFailedTasks inputBytes inputRecords")):
     """
     Exposes information about Spark Stages.
     """
diff --git a/python/pyspark/status.pyi b/python/pyspark/status.pyi
index 0558e245f49..8ea885693bb 100644
--- a/python/pyspark/status.pyi
+++ b/python/pyspark/status.pyi
@@ -32,6 +32,8 @@ class SparkStageInfo(NamedTuple):
     numActiveTasks: int
     numCompletedTasks: int
     numFailedTasks: int
+    inputBytes: int
+    inputRecords: int
 class StatusTracker:
     def __init__(self, jtracker: JavaObject) -> None: ...
diff --git a/python/pyspark/tests/test_context.py b/python/pyspark/tests/test_context.py
index 4611d038f96..2c28fbabcc8 100644
--- a/python/pyspark/tests/test_context.py
+++ b/python/pyspark/tests/test_context.py
@@ -239,6 +239,8 @@ class ContextTests(unittest.TestCase):
             self.assertEqual(1, len(job.stageIds))
             stage = tracker.getStageInfo(job.stageIds[0])
             self.assertEqual(rdd.getNumPartitions(), stage.numTasks)
+            self.assertGreater(stage.inputBytes, 0)
+            self.assertEqual(10, stage.inputRecords)
             sc.cancelAllJobs()
             t.join()
```
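For context on why touching only the namedtuple can be sufficient: as far as I understand it, `StatusTracker.getStageInfo` fills the tuple by calling a getter of the same name for every entry in `SparkStageInfo._fields` after `stageId`. Below is a minimal sketch of that pattern without a running Spark. `FakeJavaStageInfo` and `get_stage_info` are hypothetical stand-ins (not PySpark APIs), and the values are made up for illustration.

```python
from collections import namedtuple


class SparkStageInfo(namedtuple("SparkStageInfo",
                                "stageId currentAttemptId name numTasks numActiveTasks "
                                "numCompletedTasks numFailedTasks inputBytes inputRecords")):
    """Stand-in for pyspark.status.SparkStageInfo with the two proposed fields."""


class FakeJavaStageInfo:
    """Hypothetical stand-in for the JVM-side object: each field is a zero-arg getter."""

    def __init__(self, **values):
        self._values = values

    def __getattr__(self, name):
        # Only called for names not found normally; return a callable getter.
        try:
            value = self._values[name]
        except KeyError:
            raise AttributeError(name)
        return lambda: value


def get_stage_info(jstage, stage_id):
    # Assumption: mirrors how StatusTracker.getStageInfo builds the tuple,
    # reading every field after stageId via the getter of the same name.
    attrs = [getattr(jstage, f)() for f in SparkStageInfo._fields[1:]]
    return SparkStageInfo(stage_id, *attrs)


# Illustrative values only.
jstage = FakeJavaStageInfo(currentAttemptId=0, name="collect", numTasks=4,
                           numActiveTasks=4, numCompletedTasks=0, numFailedTasks=0,
                           inputBytes=80, inputRecords=10)
stage = get_stage_info(jstage, 3)
print(stage.inputBytes, stage.inputRecords)
```

If that reading is right, the new field names only work when the Java-side stage info exposes `inputBytes()` and `inputRecords()` getters with exactly those names.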
BTW, please keep the GitHub PR template as is
(https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE),
and, under "Does this PR introduce any user-facing change?", describe which
interface it adds, preferably with an example.
Also, please add a test to `StatusTrackerSuite`.