So, it finally turned out that the culprit was in my own code. I was logging 
exception objects whose signaler context pointed to the socket. This way, on 
every connection timeout I added the exception to a collection, which 
prevented the external resources from being unregistered.

Norbert
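[Editorial note: the retention pattern described above can be sketched outside Pharo as well. The following is a minimal Python analogue, not the actual code from the thread; `FakeSocket`, `error_log`, and `do_request` are hypothetical names, and `weakref.finalize` stands in for the VM's external-resource unregistration. Keeping a strong reference to an exception keeps its stack context alive, and with it the socket, so finalization never runs.]

```python
import gc
import weakref

class FakeSocket:
    """Hypothetical stand-in for a socket wrapping an external resource."""
    pass

error_log = []  # exceptions retained "for logging", as in the bug

def do_request(finalized):
    sock = FakeSocket()
    # Mimic the external-resource registry: run a callback once the
    # socket becomes collectable (its "unregistration").
    weakref.finalize(sock, finalized.append, "unregistered")
    try:
        raise TimeoutError("connection timeout")
    except TimeoutError as exc:
        # exc.__traceback__ -> frame -> sock: retaining the exception
        # keeps the socket reachable, much like the signaler context did.
        error_log.append(exc)

finalized = []
do_request(finalized)
gc.collect()
print(finalized)        # [] -- socket never unregistered while exc is logged

error_log.clear()       # drop the logged exceptions...
gc.collect()
print(finalized)        # ['unregistered'] -- ...and finalization runs
```

Clearing the log releases the exception, its traceback, the frame, and finally the socket, which is exactly why dropping the logged exceptions fixed the leak.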

Am 11.10.2013 um 15:02 schrieb Norbert Hartl <[email protected]>:

> 
> 
> Am 11.10.2013 um 10:53 schrieb Sven Van Caekenberghe <[email protected]>:
> 
>> 
>> On 11 Oct 2013, at 10:24, Norbert Hartl <[email protected]> wrote:
>> 
>>> I can report that the behavior is different now. There were two new VM 
>>> releases this week in the PPA. The first one didn't work, but the second 
>>> changed something. My application has never run that long before. It has 
>>> been up for more than a day now with a current external objects table size 
>>> of 623, which was never reached before. So I would say there is a chance 
>>> that this particular problem is gone. I will monitor this further, and I 
>>> think this wasn't the only problem. But then that is another problem.
>> 
>> Yeah, but not knowing your application load, 623, which would be about 200 
>> sockets (3 semaphores per socket), is still a lot to be active at the same 
>> time. Can you in some way invoke a full GC externally, like using 
>> ZnReadEvalPrintDelegate, and see if it eventually drops due to finalization? 
>> It should, at least that is what I see.
>> 
> Yes, that's what I meant. There is always only one outgoing connection at a 
> time. Every 15 seconds one request is issued. So you see why I expect to 
> find more.
> I'm travelling right now and will have a deeper look after I'm back.
> 
> Norbert
>>> Thanks to all of you who've helped solve this. When the VM turns out to be 
>>> the source of problems, it is always extra annoying because it is much 
>>> harder to change something there.
>>> 
>>> Norbert
>>> 
>>> 
>>> Am 08.10.2013 um 11:27 schrieb Igor Stasenko <[email protected]>:
>>> 
>>>> 
>>>> 
>>>> 
>>>> On 7 October 2013 18:36, Norbert Hartl <[email protected]> wrote:
>>>> 
>>>> Am 07.10.2013 um 16:36 schrieb Igor Stasenko <[email protected]>:
>>>> 
>>>>> 1 thing.
>>>>> 
>>>>> can you tell me what given expression yields for your VM/image:
>>>>> 
>>>>> Smalltalk vm maxExternalSemaphores
>>>>> 
>>>>> (if it gives you a number less than 10000000 then I think I know what 
>>>>> your problem is :)
>>>> It is 10000000
>>>> 
>>>> What would be the problem if it would be smaller?
>>>> 
>>>> 
>>>> That just means your VM doesn't have an external object size cap.
>>>> I changed the implementation to not have a hard limit (the arbitrarily 
>>>> large number is there just to be "compatible" with the previous 
>>>> implementation).
>>>> 
>>>> This means that you can actually change the check in your image, 
>>>> completely ignore the limits, and just keep growing if necessary. 
>>>> 
>>>> Now, since you are using a VM which doesn't have a limit, but the problem 
>>>> still persists, it seems like it is somewhere else.. :/ 
>>>>> I just found that after one merge my changes got lost.
>>>>> We just plugged them back in, and they should be back again with newer 
>>>>> VMs.. but the problem could be more than just semaphores.. if the merge 
>>>>> broke this, it may break many other things, so we need time to check.
>>>> I'll try to look at it some more. I'm using the pharo-vm from the 
>>>> Launchpad build. Are the changes supposed to be in this one?
>>>> 
>>>> Norbert
>>>> 
>>>> Launchpad? You mean the PPA? I can't say I remember all the details of 
>>>> how changes to the VM source get into the PPA distro, and how fast they 
>>>> get there. @Damien, can you enlighten us?
>>>> 
>>>> 
>>>> Well, the VM which I downloaded recently using the zero-conf script has 
>>>> the limit back at 256. Just some merge mistake, which is now fixed.. it 
>>>> means that a couple of builds will use the limit-based implementation.. 
>>>> but then it will be back to my implementation.
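[Editorial note: Igor's no-hard-limit change, which grows the table on demand instead of failing at a cap, can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual VM code; `ExternalObjectTable` and its methods are made-up names.]

```python
class ExternalObjectTable:
    """Toy sketch of an external object table that grows on demand
    instead of enforcing a hard size cap (an analogue, not VM code)."""

    def __init__(self, initial_size=4):
        self.slots = [None] * initial_size

    def register(self, obj):
        # Reuse a freed slot if one exists.
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = obj
                return i + 1  # hand out a 1-based index
        # No free slot: double the table instead of raising a cap error.
        self.slots.extend([None] * len(self.slots))
        return self.register(obj)

    def unregister(self, index):
        self.slots[index - 1] = None

table = ExternalObjectTable(initial_size=4)
indices = [table.register(object()) for _ in range(5)]
print(len(table.slots))  # 8: the fifth registration doubled the table
```

As Henrik notes later in the thread, the real table growth is not free: signals arriving while the table is being replaced can be lost, which is why sizing it generously at startup is still preferable.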
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 7 October 2013 12:31, Norbert Hartl <[email protected]> wrote:
>>>>> 
>>>>> Am 07.10.2013 um 11:28 schrieb Henrik Johansen 
>>>>> <[email protected]>:
>>>>> 
>>>>>> 
>>>>>> On Oct 7, 2013, at 11:16 , Norbert Hartl <[email protected]> wrote:
>>>>>> 
>>>>>>> As I need an image that runs longer than 24 hours, I'm looking at some 
>>>>>>> stuff and wondering. Can anybody explain to me the rationale for code 
>>>>>>> like this:
>>>>>>> 
>>>>>>> maxExternalSemaphores: aSize 
>>>>>>>   "This method should never be called as result of normal program
>>>>>>>   execution. If it is however, handle it differently:
>>>>>>>   - In development, signal an error to prompt the user to set a bigger size
>>>>>>>   at startup immediately.
>>>>>>>   - In production, accept the cost of potentially unhandled interrupts,
>>>>>>>   but log the action for later review.
>>>>>>> 
>>>>>>>   See comment in maxExternalObjectsSilently: why this behaviour is
>>>>>>>   desirable."
>>>>>>>   "Can't find a place where development/production is decided.
>>>>>>>   Suggest Smalltalk image inProduction, but use an overridable temp
>>>>>>>   meanwhile. "
>>>>>>>   | inProduction |
>>>>>>>   self maxExternalSemaphores
>>>>>>>       ifNil: [^ 0].
>>>>>>>   inProduction := false.
>>>>>>>   ^ inProduction
>>>>>>>       ifTrue: [self maxExternalSemaphoresSilently: aSize.
>>>>>>>           self crTrace: 'WARNING: Had to increase size of semaphore 
>>>>>>> signal handling table due to many external objects concurrently in use';
>>>>>>>                crTrace: 'You should increase this size at startup using 
>>>>>>> #maxExternalObjectsSilently:';
>>>>>>>                crTrace: 'Current table size: ' , self 
>>>>>>> maxExternalSemaphores printString]
>>>>>>>       ifFalse: ["Smalltalk image"
>>>>>>>           self error: 'Not enough space for external objects, set a 
>>>>>>> larger size at startup!'
>>>>>>>           "Smalltalk image"]
>>>>>>> 
>>>>>>> I have reported this once but got no feedback, so I would like to hear 
>>>>>>> a few opinions.
>>>>>>> 
>>>>>>> The report is here: https://pharo.fogbugz.com/f/cases/10839/
>>>>>>> 
>>>>>>> Norbert
>>>>>> 
>>>>>> The rationale is that inProduction would be some global setting, not yet 
>>>>>> in place when the code was written…
>>>>>> Excessive simultaneous Semaphore usage is something that should be 
>>>>>> caught during development, in which case it's better to get an active 
>>>>>> notification than to have it logged somewhere.
>>>>> 
>>>>> Agreed. But that didn't work in my case because it took roughly 20 hours 
>>>>> and an unstable remote backend to trigger the problem. And somehow I 
>>>>> forgot to install my logger as Transcript, so there was no warning 
>>>>> message. I only saw dead images in the morning. 
>>>>> This is not satisfactory, but on the other hand this type of problem is 
>>>>> hard to solve anyway. My feeling tells me there is more to discover. 
>>>>> Socket resources get unregistered at finalization time, but this didn't 
>>>>> work either. I would have said that the unlikely situation of no garbage 
>>>>> collection running could be the case. But it can't be, because there is 
>>>>> an explicit garbage collection in 
>>>>> ExternalSemaphoreTable>>#freedSlotsIn:ratherThanIncreaseSizeTo:. 
>>>>> 
>>>>>> If I've understood correctly, it's moot on newer Pharo VMs, where 
>>>>>> there's no limit on the semtable size, but for legacy code a startup 
>>>>>> item setting the size using maxExternalObjectsSilently: (as suggested 
>>>>>> in the Warning text) is still a more proper fix than setting 
>>>>>> inProduction to true and crossing your fingers hoping no signals will 
>>>>>> be lost during table growth.
>>>>> 
>>>>> Ah, I didn't know about the risk of losing signals while resizing the 
>>>>> table. Thanks for that. Don't get me wrong, I wasn't proposing to 
>>>>> actually set inProduction. I don't think that automatically growing 
>>>>> resource management is a proper way to design a system. There is always 
>>>>> a range of resources you need for your use case. Not setting an upper 
>>>>> bound for this just covers up leaking behavior.
>>>>> 
>>>>> Norbert
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Best regards,
>>>>> Igor Stasenko.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Best regards,
>>>> Igor Stasenko.
>> 
>> 
> 

