Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2618
@agresch
Yeah, the problem still exists, but I have no time to fix it now. I will
try to take it on.
---
Github user agresch commented on the issue:
https://github.com/apache/storm/pull/2618
A couple of comments back to @revans2 from Apr 5.
1) We don't delete the blobs on the nimbus side for a while after we kill
the topology. - Would we also prevent the user from doing so
Github user agresch commented on the issue:
https://github.com/apache/storm/pull/2618
Just curious what the plan is for this PR?
---
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2618
@danny0405 Sorry about the long delay. I also got rather busy with other
things.
My personal choice would be a combination of 1 and 2. We have run into an
issue internally where very
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2618
@revans2
Sorry for being away for a long time; I was on a training course. Do you have
any good ideas on how we can fix or move this forward?
---
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2618
I am inclined to choose option 3 because:
1. We have already made an RPC request for killing/starting a worker from
the master to the supervisors as soon as the event happens on the master. So the
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2618
OK, good, I do understand the problem.
There really are a few ways that I see we can make the stack trace much
less likely to come out in the common case. The following are in my preferred
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2618
@danny0405
I agree with you that the race condition is between nimbus deleting the
blobs and the supervisor fully processing the topology being killed.
But I still think we should
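To make the race concrete, here is a minimal sketch of the supervisor-side behavior under discussion: when a blob download fails because nimbus already deleted the blobs of a killed topology, the failure is treated as expected instead of producing a stack trace. All class and method names below are illustrative assumptions, not the actual Storm APIs.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; real Storm class and method names differ.
class BlobDownloadSketch {
    /** Thrown by the (hypothetical) blob client when a key no longer exists. */
    static class KeyNotFoundException extends Exception {}

    interface BlobClient {
        byte[] download(String key) throws KeyNotFoundException;
    }

    // Topology ids the supervisor has already seen a kill/remove event for.
    private final Set<String> topologiesBeingRemoved = ConcurrentHashMap.newKeySet();

    void markRemoving(String topologyId) {
        topologiesBeingRemoved.add(topologyId);
    }

    byte[] fetchBlob(BlobClient client, String topologyId, String key) {
        try {
            return client.download(key);
        } catch (KeyNotFoundException e) {
            if (topologiesBeingRemoved.contains(topologyId)) {
                // Nimbus already deleted the blobs of a killed topology; this is
                // expected, so log a short message instead of a full stack trace.
                System.out.println("Blob " + key + " is gone; topology " + topologyId + " is being removed");
                return null;
            }
            // Genuinely unexpected: surface the failure.
            throw new RuntimeException(e);
        }
    }
}
```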
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2618
@danny0405
I just created #2622 to fix the race condition in AsyncLocalizer. It does
conflict a lot with this patch, so I wanted to make sure you saw it and had a
chance to give feedback
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2618
@revans2
To me, the race condition has nothing to do with
AsyncLocalizer#`requestDownloadBaseTopologyBlobs`; it is the race condition on
AsyncLocalizer#`topologyBlobs` in the timer task
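For context, the kind of race being described, a periodically scheduled task iterating a shared map while a slot-release path mutates it, can be sketched as follows. The field and method names mirror the discussion but are illustrative only, not the actual AsyncLocalizer code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: a timer task iterating a shared map while another thread
// removes entries from it is a classic source of ConcurrentModificationException.
class TimerTaskRaceSketch {
    private final Map<String, Long> topologyBlobs = new HashMap<>();

    void start() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Periodic "updateBlobs"-style pass over every tracked blob.
        timer.scheduleAtFixedRate(() -> {
            for (Map.Entry<String, Long> entry : topologyBlobs.entrySet()) {
                entry.setValue(System.currentTimeMillis()); // refresh a timestamp
            }
        }, 0, 30, TimeUnit.SECONDS);
    }

    // "releaseSlotFor"-style path, called from a different thread.
    void release(String blobKey) {
        topologyBlobs.remove(blobKey); // races with the iteration above
    }
}
```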
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2618
Just FYI, I filed STORM-3020 to address the race that I just found.
---
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2618
@danny0405
{{updateBlobs}} does not need to be guarded by a lock. This is what I was
talking about with the code being complex.
{{requestDownloadBaseTopologyBlobs}} is protected
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2618
@revans2
I did this patch for the concurrent race condition on
`AsyncLocalizer#topologyBlobs` between `updateBlobs` and `releaseSlotFor`;
`AsyncLocalizer#topologyBlobs` overdue keys will be
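One common way to make that kind of cleanup safe, sketched below with illustrative names (this is not the actual patch), is to hold the blob entries in a `ConcurrentHashMap` and remove overdue keys through the concurrent entry-set view, so the timer task and the release path need no extra locking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch: overdue entries are removed from the shared map through
// the concurrent entry-set view, which tolerates concurrent readers and writers.
class OverdueKeyCleanupSketch {
    private final ConcurrentMap<String, Long> topologyBlobs = new ConcurrentHashMap<>();

    void track(String blobKey, long expiryMillis) {
        topologyBlobs.put(blobKey, expiryMillis);
    }

    // Called from the timer task; release-style callers can keep using
    // put/remove concurrently without additional synchronization.
    void removeOverdueKeys() {
        long now = System.currentTimeMillis();
        topologyBlobs.entrySet().removeIf(entry -> entry.getValue() < now);
    }
}
```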
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2618
I am trying to understand the reasons behind this change. Is this jira
just to remove an exception that shows up in the logs? Or is that exception
actually causing a problem?
The reason I