<quote who="Toby Blake">
> Hi there,
>
> Firstly, many thanks for the replies...
np.

>
>> Hi Toby.
>>
>>> For largely historical reasons we run slapd servers on most clients
>>> (this will probably change in the future - I'm just giving this
>>> information as background).
>>
>> Why?
>
> Why will this change or why did we do it in the first place? I wasn't
> party to these decisions at the time, so I can't really comment on the
> reasons for them. I could speculate wildly, but I'd prefer not to.

Understood.

>
>>> We're seeing problems when some of these
>>> machines are busy, particularly, it seems, with memory intensive
>>> activity, although it's hard to substantiate as I generally only see
>>> the machines after they've broken. It's annoying as I can't reproduce
>>> these problems.
>>
>> It's going to be hard to pin point then ;-) How much memory/CPU
>> etc. do these clients have and what other services do they provide?
>
> They're typically desktop or lab machines for academics, students,
> etc. Hardware-wise they're Dell desktop boxes of a few years old - a
> 2.4GHz processor with 512MB of memory is typical. Something I should
> have mentioned is that they're running Fedora Core 5, with a few
> running FC6.

OK.

>
> As for what services they provide, general desktop services, but also
> could be running long-running or intensive jobs. Many of the machines
> are also in a condor pool and this does seem to cause more problems.
>
> Do you know if slapd gets unhappy if other processes use up lots of
> memory? This is my current line of investigation - I'll try to make
> it unhappy by using increasing amounts of memory.

Yes.

>
> I suppose what I'm trying to determine is - is it the client activity
> that's causing problems (i.e. a misbehaving client or similar) or is
> it slapd itself getting unhappy for other reasons (possibly due to
> resources being used by other programs)? Or a combination of both?

Probably both. If a client keeps sending lots of bind/search requests at
once, slapd will queue/defer them.
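
On the memory-pressure test you mention above, something along these lines might do as a starting point. It is only a rough sketch in Python: the name `hog_memory`, the 4096-byte page size and the step sizes are my own assumptions for illustration, not anything taken from slapd or Condor.

```python
import time

PAGE = 4096  # assumed page size in bytes; adjust if the platform differs


def hog_memory(step_mib, steps, pause_s=0.0):
    """Allocate `steps` chunks of `step_mib` MiB each, pausing between steps.

    Every page is written to once so the kernel has to back the allocation
    with real memory rather than leaving it lazily mapped. The caller must
    keep the returned list alive, otherwise the memory is released again.
    """
    chunks = []
    for i in range(steps):
        buf = bytearray(step_mib * 1024 * 1024)
        for offset in range(0, len(buf), PAGE):
            buf[offset] = 1  # touch one byte per page
        chunks.append(buf)
        print(f"holding {(i + 1) * step_mib} MiB")
        time.sleep(pause_s)
    return chunks
```

Calling, say, `held = hog_memory(32, 12, pause_s=30)` on one of the 512MB boxes while watching the slapd log would step up to 384 MiB over a few minutes. Those numbers are just a guess at something reasonable.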
>
>>> We see quite a few problems with slapd getting into a state where it's
>>> deferring operations, for whatever reason - I think I understand these
>>> - these are when slapd basically says sorry, I'm too busy doing X, so
>>> I'll defer Y until I have time. Is this accurate?
>>
>> Yes. What kind of clients are searching/binding to them? Local?
>
> All local. As for what kind of clients - typical linux desktop
> activity I suppose. Hard to be specific about this really, as it will
> change from host to host.

OK. Is this happening on all desktops then?

>
>>> The second case I'm also seeing is bdb complaining about locks being
>>> no longer valid, e.g.
>>>
>>> slapd[3780]: bdb(dc=inf,dc=ed,dc=ac,dc=uk): DB_LOCK->lock_put: Lock is no longer valid
>>>
>>> slapd seems to keep going for the time being until getting into a
>>> state where it defers all binding operations and goes into some kind
>>> of spin where it sits at 99% cpu and has to be killed with a -9.
>>
>> Is everything local? Nothing mounted locally, like NFS for the directory
>> data.
>
> Machines will have both NFS and AFS for home directory data.

Not the data directory then, ok.

>
>>> I suppose I have a couple of questions about the "Lock is no longer
>>> valid" error....
>>>
>>> - What causes it?
>>> - Is it something I can prevent by configuration changes (for
>>> instance, would increasing the numbers of locks, lockers and objects
>>> help?)
>>
>> One for the dev team. I do know this is an error message from
>> Berkeley DB by grepping the source.
>
> Yes, I saw it in the source, but don't know it well enough to be sure
> of what's causing it.

Likewise.

>
>>> We're running openldap 2.3.35 with ITS#4924 and ITS#4925 patches with
>>> a bdb backend running 4.2.52 with all 6 recommended patches.
>>
>> I hope you mean 5, as there are only 5 listed on the Oracle site.
>
> As Quanah said, there are 6.
>
>>> The only DBCONFIG settings we currently have are:
>>>
>>> dbconfig set_cachesize 0 67108864 1
>>> dbconfig set_lg_regionmax 262144
>>> dbconfig set_lg_bsize 2097152
>>
>> I take it dbconfig is a keyword you've added for this example, as
>> it's not valid.
>
> Sorry, I should have been more specific - this is in slapd.conf - look
> in the man page for slapd-bdb - this is just a way of getting
> directives into DB_CONFIG.

Yeah, my mistake. I forgot about that way.

>
> Cheers
> Toby
>
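
For reference, the three dbconfig values quoted above are plain byte counts, which are easier to reason about once converted. The sketch below does the arithmetic in Python. `set_cachesize` takes gigabytes, then bytes, then the number of cache segments, while the other two take a single byte count. The helper names `cachesize_bytes` and `human` are made up for this example.

```python
def cachesize_bytes(gbytes, nbytes):
    """Total cache size for `set_cachesize <gbytes> <bytes> <ncache>`."""
    return gbytes * 1024**3 + nbytes


def human(nbytes):
    """Format a byte count in binary units (illustrative helper)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if nbytes < 1024 or unit == "GiB":
            return f"{nbytes:g} {unit}"
        nbytes /= 1024


settings = {
    "set_cachesize": cachesize_bytes(0, 67108864),  # 0 GB + 67108864 bytes
    "set_lg_regionmax": 262144,
    "set_lg_bsize": 2097152,
}

for name, size in settings.items():
    print(f"{name}: {human(size)}")
```

That works out to a 64 MiB cache in one segment, a 256 KiB log region and a 2 MiB log buffer, on machines with a typical 512MB of memory.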
