GitHub user ericl commented on the pull request:
https://github.com/apache/spark/pull/12248#issuecomment-207581308
@srowen, suppose you have an existing service running Spark jobs that read
from a custom datasource. You want to add log4j trace annotations in order to
attribute datasource logs back to the original caller of the service, but you
want to avoid invasive changes to the existing code. This is a two-line
change with the proposed API.
```
// in the RPC server running as the driver
def receive(request: RPC) {
  sc.setLocalProperty("traceId", request.traceId) // add this line
  ...
}

// in the datasource library running on executors
def handleRead(...) {
  log4j.MDC.put("traceId", TaskContext.getLocalProperty("traceId")) // add this line
  ...
}
```
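Once the MDC is populated, attributing executor logs back to the caller is
just a pattern-layout change. A minimal sketch, assuming log4j 1.x where
`%X{traceId}` reads the value out of the MDC (the appender name `console` is
illustrative):
```
# log4j.properties on the executors -- %X{traceId} pulls the id from the MDC
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %p [%X{traceId}] %c: %m%n
```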
The alternative is to explicitly reference `traceId` in each of the tasks
(sketched below), but this would clutter application code with many
references to diagnostic info, discouraging the use of diagnostic tools.
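For contrast, a rough sketch of that explicit-plumbing alternative (the
closure shape here is illustrative): every job the service submits has to
capture the trace id itself and push it into the MDC on the executors.
```
// without the proposed API: each closure must capture and forward the id
def receive(request: RPC) {
  val traceId = request.traceId
  rdd.foreachPartition { partition =>
    log4j.MDC.put("traceId", traceId) // repeated in every task closure
    ...
  }
}
```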