Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178793683
Quoting from the JIRA ticket:
> In our particular case, this behavior manifests since the
> currentTaskAttemptId() method is returning -1 for each Spark receiver task.
> This in and of itself could be a bug and is something I'm going to look into.
I think this is definitely a bug. I believe that the intent here was that
we'd return a dummy `taskAttemptId` on the driver but that any code running in
a task should have a valid `TaskContext` thread local and thus a valid task
attempt id. `TaskContext` isn't an inheritable thread-local, though, so we'll
have to explicitly propagate it from the top-level task thread to the receiver
threads in order to address this.
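To make the propagation issue concrete, here's a minimal sketch (not the actual fix) of what explicit propagation would look like; it assumes access to the internal `TaskContext.setTaskContext` helper, which is package-private to `org.apache.spark`:

```scala
import org.apache.spark.TaskContext

// TaskContext lives in a plain ThreadLocal, not an InheritableThreadLocal,
// so a thread spawned inside a task does not inherit the parent's context.
val parentContext = TaskContext.get() // valid on the top-level task thread

val receiverThread = new Thread(new Runnable {
  override def run(): Unit = {
    // Without this line TaskContext.get() returns null on this thread, which
    // is why currentTaskAttemptId() falls back to the dummy value (-1).
    TaskContext.setTaskContext(parentContext)
    // ... receiver work that relies on the task attempt id ...
  }
})
receiverThread.start()
```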
Even if we did fix the `TaskContext` propagation issue, the fix in this
patch would still be necessary because we'd still have to be properly
thread-safe in case a multi-threaded receiver was storing blocks.
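As a rough illustration of that second point (this is just a sketch, not the patch itself): any check-then-update bookkeeping on the store path has to happen under a single monitor, because a multi-threaded receiver can call `store()` concurrently:

```scala
// Hypothetical bookkeeping structure shared by all threads of one receiver.
class BlockTracker {
  private val blockSizes = scala.collection.mutable.HashMap.empty[Long, Long]

  def record(blockId: Long, size: Long): Unit = blockSizes.synchronized {
    // The read and the write must sit under the same lock; if two receiver
    // threads interleave here, an entry can be lost or double-counted.
    val current = blockSizes.getOrElse(blockId, 0L)
    blockSizes.put(blockId, current + size)
  }
}
```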
Intuitively, the idea of adding extra synchronization here seems right to
me, although I'd like to take a closer look at the changes to see whether
they will introduce performance problems: my guess is that the
under-synchronization stems from a desire to avoid holding monitors/locks
during expensive operations.
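If that turns out to be the concern, the usual way to keep both properties is to do the expensive work outside the monitor and only synchronize the cheap state update, roughly like this (a generic sketch; nothing here is taken from the patch):

```scala
object StorePathSketch {
  private val lock = new Object
  private var bytesStored = 0L

  // Stand-in for whatever expensive step (serialization, copying, I/O) the
  // real store path performs; nothing Spark-specific is assumed here.
  private def expensiveSerialize(data: Array[Byte]): Array[Byte] = data.clone()

  def storeBlock(data: Array[Byte]): Unit = {
    val serialized = expensiveSerialize(data) // no lock held for the slow part
    lock.synchronized {
      bytesStored += serialized.length // short critical section for bookkeeping
    }
  }
}
```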
@zsxwing, do you know why the streaming longevity / memory leak tests
didn't catch this leak?