On 11 Oct 2013, at 10:53, Sven Van Caekenberghe <[email protected]> wrote:

> On 11 Oct 2013, at 10:24, Norbert Hartl <[email protected]> wrote:
>
>> I can report that the behavior is different now. There were two new vm
>> releases this week in ppa. The first one didn't work but the second changed
>> something. My application was never running that long. It has been up for
>> more than a day now, with an actual external objects table size of 623,
>> which was never reached before. So I would say that there is a chance that
>> this particular problem is gone. I will monitor this further, and I think
>> that this wasn't the only problem. But then it is another problem.
>
> Yeah, but not knowing your application load, 623, which would be about 200
> sockets (3 semaphores per socket), is still a lot to be active at the same
> time. Can you in some way invoke a full GC externally, like using
> ZnReadEvalPrintDelegate, and see if it eventually drops due to finalization?
> It should, at least that is what I see.
>

Yes, that's what I meant. There is always only one outgoing connection at a
time. Every 15 seconds one request is issued. So you see why I expect to find
more. I'm travelling right now and will have a deeper look once I'm back.

Norbert

>> Thanks to all of you who've helped solve this. If it comes to the VM being
>> the source of problems it is always extra annoying because it is way harder
>> to change something there.
>>
>> Norbert
>>
>>
>> On 8 Oct 2013, at 11:27, Igor Stasenko <[email protected]> wrote:
>>
>>>
>>> On 7 October 2013 18:36, Norbert Hartl <[email protected]> wrote:
>>>
>>> On 7 Oct 2013, at 16:36, Igor Stasenko <[email protected]> wrote:
>>>
>>>> 1 thing.
>>>>
>>>> can you tell me what the given expression yields for your VM/image:
>>>>
>>>> Smalltalk vm maxExternalSemaphores
>>>>
>>>> (if it gives you a number less than 10000000 then i think i know what
>>>> your problem is :)
>>> It is 10000000
>>>
>>> What would be the problem if it were smaller?
>>>
>>>
>>> that just means your VM doesn't have an external object size cap.
>>> I changed the implementation to not have a hard limit (the arbitrarily
>>> large number is there just to be "compatible" with the previous
>>> implementation).
>>>
>>> This means that you can actually change the check in your image and
>>> completely ignore limits and just keep growing if necessary.
>>>
>>> Now, since you are using a VM which doesn't have a limit, but the problem
>>> still persists, it seems like it is somewhere else.. :/
>>>> i just found that after one merge, my changes got lost.
>>>> we've just plugged them back in, and it should be back again with newer
>>>> VMs..
>>>> but the problem could be more than just semaphores.. if the merge broke
>>>> this, it may have broken many other things, so we need time to check
>>> I'll try to look at it some more. I'm using the pharo-vm from the
>>> launchpad build. Are the changes supposed to be in this one?
>>>
>>> Norbert
>>>
>>> Launchpad? You mean ppa? I can't say i remember all the details of how
>>> changes to the VM source get into the ppa distro, and how fast they get
>>> there. @Damien, can you enlighten us?
>>>
>>>
>>> Well, the VM which i downloaded recently using the zero-conf script has
>>> the limit back at 256. Just some merge mistake, which is now fixed.. that
>>> means a couple of builds will use the limit-based implementation.. but
>>> then it will be back to my implementation.
>>>>
>>>>
>>>> On 7 October 2013 12:31, Norbert Hartl <[email protected]> wrote:
>>>>
>>>> On 7 Oct 2013, at 11:28, Henrik Johansen
>>>> <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Oct 7, 2013, at 11:16 , Norbert Hartl <[email protected]> wrote:
>>>>>
>>>>>> As I need an image that runs longer than 24 hours I'm looking at some
>>>>>> stuff and wonder. Can anybody explain to me the rationale for code
>>>>>> like this
>>>>>>
>>>>>> maxExternalSemaphores: aSize
>>>>>>     "This method should never be called as result of normal program execution.
>>>>>>     If it is however, handle it differently:
>>>>>>     - In development, signal an error to prompt user to set a bigger size at startup immediately.
>>>>>>     - In production, accept the cost of potentially unhandled interrupts, but log the action for later review.
>>>>>>
>>>>>>     See comment in maxExternalObjectsSilently: why this behaviour is desirable, "
>>>>>>     "Can't find a place where development/production is decided.
>>>>>>     Suggest Smalltalk image inProduction, but use an overridable temp meanwhile. "
>>>>>>     | inProduction |
>>>>>>     self maxExternalSemaphores
>>>>>>         ifNil: [^ 0].
>>>>>>     inProduction := false.
>>>>>>     ^ inProduction
>>>>>>         ifTrue: [self maxExternalSemaphoresSilently: aSize.
>>>>>>             self crTrace: 'WARNING: Had to increase size of semaphore signal handling table due to many external objects concurrently in use';
>>>>>>                 crTrace: 'You should increase this size at startup using #maxExternalObjectsSilently:';
>>>>>>                 crTrace: 'Current table size: ' , self maxExternalSemaphores printString]
>>>>>>         ifFalse: ["Smalltalk image"
>>>>>>             self error: 'Not enough space for external objects, set a larger size at startup!'
>>>>>>             "Smalltalk image"]
>>>>>>
>>>>>> I have reported this once but got no feedback, so I'd like to have a
>>>>>> few opinions.
>>>>>>
>>>>>> The report is here: https://pharo.fogbugz.com/f/cases/10839/
>>>>>>
>>>>>> Norbert
>>>>>
>>>>> The rationale is that inProduction would be some global setting, not yet
>>>>> in place when the code was written…
>>>>> Excessive simultaneous Semaphore usage is something that should be caught
>>>>> during development, in which case it's better to get an active
>>>>> notification than to have it logged somewhere.
>>>>
>>>> Agreed. But that didn't work in my case, because it needed roughly 20
>>>> hours and an unstable remote backend to trigger the problem. And somehow
>>>> I forgot to install my logger as Transcript, so there is no warning
>>>> message. I saw only dead images in the morning.
>>>> This is not satisfactory, but on the other hand this type of problem is
>>>> hard to solve anyway. My feeling tells me there is more to discover.
>>>> Socket resources get unregistered at finalization time, but this didn't
>>>> work either. I would have guessed the unlikely situation that no garbage
>>>> collection ran. But that can't be it, because in
>>>> ExternalSemaphoreTable>>#freedSlotsIn:ratherThanIncreaseSizeTo: there is
>>>> an explicit garbage collection.
>>>>
>>>>> If I've understood correctly, it's moot on newer Pharo VMs, where
>>>>> there's no limit on the semtable size, but for legacy code a startup item
>>>>> setting the size using maxExternalObjectsSilently: (as suggested in the
>>>>> Warning text) is still a more proper fix than setting inProduction to
>>>>> true and crossing your fingers hoping no signals will be lost during
>>>>> table growth.
>>>>
>>>> Ah, I didn't know about the risk of losing signals while resizing the
>>>> table. Thanks for that. Don't get me wrong, I wasn't proposing to set
>>>> inProduction in effect. I don't think that automatically growing resource
>>>> management is a proper way to design a system. There is always a range of
>>>> resources you need for your use case. Not setting an upper bound for this
>>>> just covers up leaking behavior.
>>>>
>>>> Norbert
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Igor Stasenko.
>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko.
