HyukjinKwon commented on pull request #33874:
URL: https://github.com/apache/spark/pull/33874#issuecomment-914962534


   Just adding these fields into `status.py` should be enough. e.g.)
   
   ```diff
   diff --git a/python/pyspark/status.py b/python/pyspark/status.py
   index a6fa7dd3144..f342ee38a2d 100644
   --- a/python/pyspark/status.py
   +++ b/python/pyspark/status.py
   @@ -28,7 +28,7 @@ class SparkJobInfo(namedtuple("SparkJobInfo", "jobId stageIds status")):
   
    class SparkStageInfo(namedtuple("SparkStageInfo",
                                    "stageId currentAttemptId name numTasks numActiveTasks "
   -                                "numCompletedTasks numFailedTasks")):
   +                                "numCompletedTasks numFailedTasks inputBytes inputRecords")):
        """
        Exposes information about Spark Stages.
        """
   diff --git a/python/pyspark/status.pyi b/python/pyspark/status.pyi
   index 0558e245f49..8ea885693bb 100644
   --- a/python/pyspark/status.pyi
   +++ b/python/pyspark/status.pyi
   @@ -32,6 +32,8 @@ class SparkStageInfo(NamedTuple):
        numActiveTasks: int
        numCompletedTasks: int
        numFailedTasks: int
   +    inputBytes: int
   +    inputRecords: int
   
    class StatusTracker:
        def __init__(self, jtracker: JavaObject) -> None: ...
   diff --git a/python/pyspark/tests/test_context.py b/python/pyspark/tests/test_context.py
   index 4611d038f96..2c28fbabcc8 100644
   --- a/python/pyspark/tests/test_context.py
   +++ b/python/pyspark/tests/test_context.py
   @@ -239,6 +239,8 @@ class ContextTests(unittest.TestCase):
                self.assertEqual(1, len(job.stageIds))
                stage = tracker.getStageInfo(job.stageIds[0])
                self.assertEqual(rdd.getNumPartitions(), stage.numTasks)
   +            self.assertGreater(stage.inputBytes, 0)
   +            self.assertEqual(10, stage.inputRecords)
   
                sc.cancelAllJobs()
                t.join()
   ```
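
A minimal standalone sketch of why extending the field list may be all the Python side needs. It assumes `getStageInfo` fills the tuple by calling a same-named zero-argument accessor on the JVM object for each field in `_fields`; `FakeJavaStageInfo` and `to_python` are hypothetical stand-ins for illustration, not part of PySpark.

```python
from collections import namedtuple


class SparkStageInfo(namedtuple("SparkStageInfo",
                                "stageId currentAttemptId name numTasks numActiveTasks "
                                "numCompletedTasks numFailedTasks inputBytes inputRecords")):
    """Local copy of the tuple from the diff above, with the two new fields."""


class FakeJavaStageInfo:
    """Hypothetical stand-in for the JVM-side object: each field is a zero-arg method."""

    def __init__(self, **values):
        self._values = values

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. the accessor names.
        if name.startswith("_") or name not in self._values:
            raise AttributeError(name)
        return lambda: self._values[name]


def to_python(jstage):
    # Assumed conversion pattern: one accessor call per namedtuple field.
    return SparkStageInfo(*[getattr(jstage, f)() for f in SparkStageInfo._fields])


jstage = FakeJavaStageInfo(stageId=0, currentAttemptId=0, name="collect", numTasks=4,
                           numActiveTasks=0, numCompletedTasks=4, numFailedTasks=0,
                           inputBytes=1024, inputRecords=10)
info = to_python(jstage)
print(info.inputBytes, info.inputRecords)
```

Under that assumption, the new values are picked up purely from the field names, so no per-field conversion code is required.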
   
   BTW, please keep the GitHub PR template as is (https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE), and under "Does this PR introduce any user-facing change?" describe which interface it adds, preferably with an example.
   
   Also, please add a test at `StatusTrackerSuite`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
