[GitHub] storm issue #2669: [STORM-3055] remove conext connection cache

2018-05-31 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2669
  
Actually we just hit this in production so thanks again @pczb for finding 
and fixing this.


---


[GitHub] storm issue #2669: [STORM-3055] remove conext connection cache

2018-05-17 Thread pczb
Github user pczb commented on the issue:

https://github.com/apache/storm/pull/2669
  
@srdo @revans2 thanks for your review, i just squash all commit in one


---


[GitHub] storm issue #2669: [STORM-3055] remove conext connection cache

2018-05-17 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2669
  
Once you make the corresponding changes on #2661 I would be happy to merge 
them both in.


---


[GitHub] storm issue #2669: [STORM-3055] remove conext connection cache

2018-05-16 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2669
  
LGTM, thanks for addressing comments


---


[GitHub] storm issue #2669: [STORM-3055] remove conext connection cache

2018-05-15 Thread pczb
Github user pczb commented on the issue:

https://github.com/apache/storm/pull/2669
  
@srdo add doc specification, i was wonder if we need remove synchronized 
when bind port
@revans2  just as srdo says the connection will be closed by the worker 
before worker shutdown.
i will squash all commit after review


---


[GitHub] storm issue #2669: [STORM-3055] remove conext connection cache

2018-05-15 Thread srdo
Github user srdo commented on the issue:

https://github.com/apache/storm/pull/2669
  
As far as I can tell all connections are still being shut down with this 
change. The connections cached by WorkerState are closed during worker shutdown 
in 
https://github.com/apache/storm/blob/14b0b4fc5e0945456769fd58a3595188e3dea234/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L449,
 and the context is shut down a few lines further down. 

The WorkerState.refreshConnections method also makes sure to never create a 
connection for a NodeInfo that is already present, so I don't think we're 
leaking connections there that would need to be picked up by context.term. 

Would like to see the cleanup @revans2 mentioned, as well as a change in 
the IContext docs so it's specified that the created connections are new.


---


[GitHub] storm issue #2669: [STORM-3055] remove conext connection cache

2018-05-14 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2669
  
The original code added this so when shutting down the context we could be 
sure that all ongoing connections were also terminated as cleanly as possible.  
After this change that is only true for the server side and not the client side.

What are the ramifications of that?  I think it was just defensive 
programming so it is probably fine to make this change from that perspective, 
but I would to make sure that my understanding of the problem matches yours.

From the comments in STORM-3055 the error appears to be triggered when a 
supervisor is shut down, wiped clean (so it gets a new supervisor id), and then 
brought back up.  At that point nimbus schedules the worker on the same 
node/slot as before, but with a new supervisor ID.  This confuses the 
connection caching because when updating the connections it gets a list of 
connections to shut down and a separate list of connections to create.  The new 
connections are created, but in this case a new one is not created because we 
already have it open.  Then the old unneeded connections are torn down, but in 
this case the connection is needed.

Looking at the javadocs for IContext. It looks like the caching really does 
violate the spec, but it is a bit of a gray area.


https://github.com/apache/storm/blob/53f38bc31f2fd315a520ba6b86c0a60be08381cc/storm-client/src/jvm/org/apache/storm/messaging/IContext.java#L48-L57

I am fine with removing the caching like this JIRA does, but I do want to 
see the code cleaned up, because without the caching there is a lot of extra 
code that can be removed.


---


[GitHub] storm issue #2669: [STORM-3055] remove conext connection cache

2018-05-11 Thread pczb
Github user pczb commented on the issue:

https://github.com/apache/storm/pull/2669
  
@revans2 could you help review this, i see your pull request #2436 is 
related to connection cache . and the travis fail is not related to this pr.


---