GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/1259
[SPARK-2317] Improve task logging.
We use TID to indicate task logging. However, TID itself does not capture
stage or retries, making it harder to correlate with the application itself.
This pull request changes all logging messages for tasks to include both the
TID and the stage id, stage attempt, task id, and task attempt. I've consulted
various people but unfortunately this is a really hard task.
Driver log looks like:
```
14/06/28 18:53:29 INFO DAGScheduler: Submitting 10 missing tasks from Stage
0 (MappedRDD[1] at map at <console>:13)
14/06/28 18:53:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
14/06/28 18:53:29 INFO TaskSetManager: Re-computing pending task lists.
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:0.0 (TID 0,
localhost, PROCESS_LOCAL, 1203 bytes, 5 ms)
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:1.0 (TID 1,
localhost, PROCESS_LOCAL, 1203 bytes, 3 ms)
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:2.0 (TID 2,
localhost, PROCESS_LOCAL, 1203 bytes, 3 ms)
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:3.0 (TID 3,
localhost, PROCESS_LOCAL, 1203 bytes, 3 ms)
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:4.0 (TID 4,
localhost, PROCESS_LOCAL, 1203 bytes, 3 ms)
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:5.0 (TID 5,
localhost, PROCESS_LOCAL, 1203 bytes, 3 ms)
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:6.0 (TID 6,
localhost, PROCESS_LOCAL, 1203 bytes, 3 ms)
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:7.0 (TID 7,
localhost, PROCESS_LOCAL, 1203 bytes, 3 ms)
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:8.0 (TID 8,
localhost, PROCESS_LOCAL, 1203 bytes, 5 ms)
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:7.0 (TID 7) in 50 ms on
localhost (1/10)
14/06/28 18:53:29 INFO TaskSetManager: Starting task 0.0:9.0 (TID 9,
localhost, PROCESS_LOCAL, 1203 bytes, 4 ms)
14/06/28 18:53:29 INFO Executor: Running task 0.0:9.0 (TID 9)
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:5.0 (TID 5) in 68 ms on
localhost (2/10)
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:6.0 (TID 6) in 70 ms on
localhost (3/10)
14/06/28 18:53:29 INFO Executor: Finished task 0.0:9.0 (TID 9). 517 bytes
result sent to driver
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:4.0 (TID 4) in 80 ms on
localhost (4/10)
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:3.0 (TID 3) in 85 ms on
localhost (5/10)
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:2.0 (TID 2) in 91 ms on
localhost (6/10)
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:1.0 (TID 1) in 98 ms on
localhost (7/10)
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:0.0 (TID 0) in 109 ms
on localhost (8/10)
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:8.0 (TID 8) in 39 ms on
localhost (9/10)
14/06/28 18:53:29 INFO TaskSetManager: Finished 0.0:9.0 (TID 9) in 30 ms on
localhost (10/10)
```
Executor log looks like
```
14/06/28 18:53:29 INFO Executor: Running task 0.0:0.0 (TID 0)
14/06/28 18:53:29 INFO Executor: Running task 0.0:1.0 (TID 1)
14/06/28 18:53:29 INFO Executor: Running task 0.0:5.0 (TID 5)
14/06/28 18:53:29 INFO Executor: Running task 0.0:3.0 (TID 3)
14/06/28 18:53:29 INFO Executor: Running task 0.0:2.0 (TID 2)
14/06/28 18:53:29 INFO Executor: Running task 0.0:6.0 (TID 6)
14/06/28 18:53:29 INFO Executor: Running task 0.0:4.0 (TID 4)
14/06/28 18:53:29 INFO Executor: Running task 0.0:7.0 (TID 7)
14/06/28 18:53:29 INFO Executor: Finished task 0.0:0.0 (TID 0). 517 bytes
result sent to driver
14/06/28 18:53:29 INFO Executor: Finished task 0.0:3.0 (TID 3). 517 bytes
result sent to driver
14/06/28 18:53:29 INFO Executor: Finished task 0.0:5.0 (TID 5). 517 bytes
result sent to driver
14/06/28 18:53:29 INFO Executor: Finished task 0.0:4.0 (TID 4). 517 bytes
result sent to driver
14/06/28 18:53:29 INFO Executor: Finished task 0.0:1.0 (TID 1). 517 bytes
result sent to driver
14/06/28 18:53:29 INFO Executor: Finished task 0.0:7.0 (TID 7). 517 bytes
result sent to driver
14/06/28 18:53:29 INFO Executor: Finished task 0.0:2.0 (TID 2). 517 bytes
result sent to driver
14/06/28 18:53:29 INFO Executor: Finished task 0.0:6.0 (TID 6). 517 bytes
result sent to driver
14/06/28 18:53:29 INFO Executor: Running task 0.0:8.0 (TID 8)
14/06/28 18:53:29 INFO Executor: Finished task 0.0:8.0 (TID 8). 517 bytes
result sent to driver
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rxin/spark betterTaskLogging
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1259.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1259
----
commit 53888e3e97747c8a43a86fb3b219e672992619f6
Author: Reynold Xin <[email protected]>
Date: 2014-06-29T01:54:33Z
[SPARK-2317] Improve task logging.
commit 4659b2008ab2e0e710a19d8cbd93004168a44188
Author: Reynold Xin <[email protected]>
Date: 2014-06-29T01:57:09Z
Remove unused variable.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---