On 07.10.2013 at 14:16, Henrik Johansen <[email protected]> wrote:

> 
> On Oct 6, 2013, at 12:51 , Norbert Hartl <[email protected]> wrote:
> 
>> I took some time to analyze my current problem with external semaphores. I 
>> was just reluctant to raise the limit in my image because I want the problem 
>> solved. I logged the management of external semaphores and discovered that 
>> the table fills if a connection times out. 
>> 
>> The problem turns out to be in 
>> 
>> Socket>>#connectTo: hostAddress port: port waitForConnectionFor: timeout 
>>      "Initiate a connection to the given port at the given host 
>>      address. Waits until the connection is established or times out."
>>      self connectNonBlockingTo: hostAddress port: port.
>>      self
>>              waitForConnectionFor: timeout
>>              ifTimedOut: [ConnectionTimedOut signal: 'Cannot connect to '
>>                      , (NetNameResolver stringFromAddress: hostAddress)
>>                      , ':' , port asString]
>> 
>> When a socket is created, three external semaphores are registered in the 
>> ExternalSemaphoreTable. If a connection times out, the exception is signaled 
>> but the Socket still has its resources attached. 
>> 
>> So e.g. in 
>> 
>> SocketStream class>>#openConnectionToHost: hostIP port: portNumber timeout: timeout
>>      | socket |
>>      socket := Socket new.
>>      socket connectTo: hostIP port: portNumber waitForConnectionFor: timeout.
>>      ^self on: socket
>> 
>> the socket (with its semaphores registered) is held only in a local 
>> variable. When the exception is signaled, the reference to the socket is 
>> lost and the semaphores stay registered. The only point at which they are 
>> unregistered is finalization, but I think we can do better. So I would add 
>> a destroy before the exception is signaled.
>> 
>> Socket>>#connectTo: hostAddress port: port waitForConnectionFor: timeout 
>>      "Initiate a connection to the given port at the given host 
>>      address. Waits until the connection is established or times out."
>>      self connectNonBlockingTo: hostAddress port: port.
>>      self
>>              waitForConnectionFor: timeout
>>              ifTimedOut: [
>>                      self destroy.
>>                      ConnectionTimedOut signal: 'Cannot connect to '
>>                              , (NetNameResolver stringFromAddress: hostAddress)
>>                              , ':' , port asString]
>> 
>> I opened a ticket at https://pharo.fogbugz.com/f/cases/11797 but I'm not 
>> sure how I am supposed to provide fixes made against a Pharo 2.0 image. 
>> Probably I should fix this against 3.0, but then I'm still a 2.0 user :) 
>> 
>> Norbert
> Destroying the socket manually there would only be an optimization, right? 

Yes. Even if it is not necessary I think freeing resources at the earliest 
possible time is a good thing. The problem with my change is that I have this 
gut feeling it is not a good one. At least I didn't see my image hang with this 
change applied. 
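
For comparison, here is a rough sketch of doing the same cleanup on the 
caller side instead. It is not the fix attached to the ticket, only an 
illustration: it assumes BlockClosure>>#ifCurtailed: runs its argument when 
the connect attempt is unwound by an exception such as ConnectionTimedOut, 
and that sending #destroy to a socket that never connected is harmless.

```smalltalk
SocketStream class>>#openConnectionToHost: hostIP port: portNumber timeout: timeout
	"Sketch only: if connecting is curtailed (for example by
	ConnectionTimedOut), destroy the socket right away so its external
	semaphores are unregistered without waiting for finalization."
	| socket |
	socket := Socket new.
	[socket connectTo: hostIP port: portNumber waitForConnectionFor: timeout]
		ifCurtailed: [socket destroy].
	^self on: socket
```

This variant only covers this one caller; any other sender of 
#connectTo:port:waitForConnectionFor: would still rely on finalization, 
which is why I put the destroy into Socket itself.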

> It's still cleaned eventually by a finalization action?

Yes, that is what makes me wonder.  Why do you say "eventually"? What would be 
the conditions for finalize not being called on an object at garbage collection 
time?

I log some details and print the external objects size into a log file. At 
the time of the last hangs, the last printed lines showed an external objects 
size between 230 and 240. I had one test where I interacted with the image and 
did a manual garbage collect, but the external objects size was much higher 
than expected. None of this is very detailed, but it is easy to get confused 
when hunting problems that are hard to reproduce. 

So I don't have a clue what it is. I need more time to look into it, but I'm 
running out of ideas about what it could be. I'm not even sure it is the 
external semaphores directly.

Norbert

> Finalization suddenly ceasing to work is an unlikely cause of running out 
> of external object space; load above what the table size can handle, or 
> actual leak(s), are much more probable. 
> 
> Thankfully, there are a limited number of places leaks could occur, for 
> Socket you have improper uses of acceptFrom:/initialize: (multiple sends to 
> the same socket).
> 
> In a stock image, there's an obscure way that might occur afaict: if the 
> class methods acceptFrom:/newXXX are called, and the primitive calls in the 
> new instance's acceptFrom:/initialize: return a handle with status 
> InvalidSocket instead of nil.
> Then, the external objects will not be unregistered before repeatWithGCIf: 
> clears them by calling them again.
> (No idea if this can ever really happen though...)
> 
> To flush out if such leaks are the cause, it might be useful to adjust 
> initialize:/acceptFrom: with helpful logging/errors, as per
> https://pharo.fogbugz.com/f/cases/5465/Prevent-accidental-semaphore-leaks-when-using-Socket-acceptFrom
> 
> There are a few other users of registerExternalObject:,
> - InputEventFetcher >> #startUp (Used to leak when saving but not quitting 
> due to missing symmetric unregister in #shutDown, should be fixed in 2.0)
> - AsynchFile (no one uses this anyway, right? So I didn't bother checking)
> - NetNameResolver >> initializeNetwork - (This potentially leaked if network 
> could ever be in an uninitialized state 2x in a session, but should be fixed 
> in 2.0)
> 
> As long as both the above are fixed in your image, checking if there are 
> non-stock users of registerExternalObject:/initialize:/acceptFrom: is 
> probably the best bet to find trouble spots without waiting for another crash.
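
A workspace sketch of that check, for what it's worth. It assumes 
SystemNavigation>>#allCallsOn: is available in the image and answers one 
reference per sending method; #initialize: is left out here only because it 
has a great many unrelated senders. The output still has to be compared by 
eye against the stock senders listed above.

```smalltalk
"Sketch only: print every method that sends one of the selectors
discussed above, so that non-stock senders can be spotted."
#(#registerExternalObject: #acceptFrom:) do: [:selector |
	Transcript show: selector printString; cr.
	(SystemNavigation default allCallsOn: selector) do: [:ref |
		Transcript tab; show: ref printString; cr]]
```
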
> 
> Cheers,
> Henry
