On Tue, May 22, 2012 at 6:24 PM, Simon Matthews
<[email protected]> wrote:
> Some googling on Berkeley DB shows that it is safe for concurrent access by
> different users, so it should be safe to run db_checkpoint without shutting
> down the qmaster.

I just quickly scanned the Berkeley DB spooling code. If you are not
using the BDB RPC server, then Grid Engine (at least all versions of
Grid Engine distributed by the Open Grid Scheduler project) should be
able to handle checkpointing and archiving within qmaster, i.e. without
using external commands.

It's not an issue to use external BDB commands to clear the
transaction logs (BDB is designed to have external transaction log
cleanup commands), but I was wondering whether you were using BDB RPC
spooling at some point?
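
For reference, a minimal sketch of what that external cleanup could look like, assuming the standard Berkeley DB utilities db_checkpoint and db_archive are on the PATH (some distributions install them under versioned names). The helper names and the spool directory used below are hypothetical; substitute the BDB spooling directory of your own installation. This only illustrates the order of the two steps, not what qmaster does internally.

```python
import shutil
import subprocess


def bdb_cleanup_commands(spool_dir):
    """Return the two BDB utility invocations in the order they should run."""
    home = str(spool_dir)
    return [
        # -1 writes a single checkpoint and exits; -h names the BDB environment.
        ["db_checkpoint", "-1", "-h", home],
        # -d removes transaction log files that are no longer needed.
        ["db_archive", "-d", "-h", home],
    ]


def run_bdb_cleanup(spool_dir, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in bdb_cleanup_commands(spool_dir):
        if dry_run:
            print(" ".join(cmd))
            continue
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(cmd[0] + " not found on PATH")
        # check=True raises CalledProcessError if the utility exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling run_bdb_cleanup("/path/to/spooldb") with the default dry_run=True only prints the two command lines, so they can be reviewed before anything touches the spool directory.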

Rayson



>
> Simon
>
>
>>
>>
>> Rayson
>>
>>
>>
>> On Fri, May 18, 2012 at 2:39 PM, Simon Matthews
>> <[email protected]> wrote:
>> > Thanks for pointing this out to me
>> >
>> > The documentation says that it should be run every minute if the
>> > configuration uses a BDB server. I don't use a BDB server, but the
>> > storage method I use is BDB (not flat files). If I should use this
>> > checkpoint script, how often should I run it, and should I shut down
>> > the qmaster to run it?
>> >
>> >>
>> >>
>> >> Rayson
>> >>
>> >>
>> >>
>> >> On Fri, May 18, 2012 at 1:17 PM, Simon Matthews
>> >> <[email protected]> wrote:
>> >> > After SGE was killed by the OOM killer, the file (a Berkeley DB file)
>> >> > in my cluster was 1.4GB. I did a db_dump and db_load on this file,
>> >> > resulting in a much smaller file.
>> >> >
>> >> > However, this then raised the question -- how is this file
>> >> > maintained? Presumably, it holds the information on jobs in all
>> >> > states (queued, running and finished). How do the finished jobs get
>> >> > removed from this file? Obviously, I don't want the file to grow
>> >> > without limit.
>> >> >
>> >> > We are now putting about 50k jobs into our small cluster every day
>> >> > (many finish running in a fraction of a second).
>> >> >
>> >> > Simon
>> >> >
>> >> > _______________________________________________
>> >> > users mailing list
>> >> > [email protected]
>> >> > https://gridengine.org/mailman/listinfo/users
>> >> >
>> >
>> >
>
>

