The impressive shoot-the-messenger wasn't corrected while I was off-air, and the original bad advice and extra confusion being sown could end up with people's spools being clobbered.
Rayson Ho <[email protected]> writes:

> Dave Love has
> started countless bashing on the Open Grid Scheduler Project, and
> spreading FUDs against Open Grid Scheduler/Grid Engine,

People can assess the accuracy of that to calibrate other claims.
Let's address the program behaviour behind the poisonous smoke screen.

> Note that Dave Love referenced the Berkeley DB documentation, which says,
>
> "[DB_PRIVATE] should not be specified if more than a single process
> is accessing the environment because it is likely to cause database
> corruption and unpredictable behavior. For example, if both a server
> application and Berkeley DB utilities (for example, db_archive,
> db_checkpoint or db_stat) are expected to access the environment, the
> DB_PRIVATE flag should not be specified."

It's presumably authoritative and people should take note.  If it's
just wrong, the people who know better than the BDB maintainers should
have got it corrected.

> TRUTH:
> When DB_PRIVATE is used, the *BerkeleyDB environment* is not backed by
> a physical file in the filesystem, and the BerkeleyDB utilities that
> try to open the environment would find that the environment file is

[There's no single "environment file".  The normal disk-backed
"regions" are in several __db... files.]

> missing and act accordingly. So the next question one might have is,
> as the environment file is missing, wouldn't db_archive,
> db_checkpoint, or db_stat then fail to do their job??

Well, the OP was advised to run them.  They are definitely likely not
to function safely, per the documentation, whether or not they're
"doing their job".  I'd have thought someone lecturing like this would
know what they do.
Note the modifications behind qmaster's back (where db_dump is the one
people would want to run):

  spooldb$ ls -l
  total 52
  -rw------- 1 sgeadmin sgeadmin 10485760 2012-06-10 23:04 log.0000000001
  -rw-r--r-- 1 sgeadmin sgeadmin    24576 2012-06-10 23:03 sge
  -rw------- 1 sgeadmin sgeadmin     8192 2012-06-10 23:00 sge_job
  spooldb$ db_stat -d sge | head -n 3
  Tue Jun 12 11:11:16 2012  Local time
  53162  Btree magic number
  9  Btree version number
  spooldb$ db_dump sge | head -n 3
  VERSION=3
  format=bytevalue
  type=btree
  spooldb$ ls -l
  total 52
  -rw------- 1 sgeadmin sgeadmin 10485760 2012-06-10 23:04 log.0000000001
  -rw-r--r-- 1 sgeadmin sgeadmin    24576 2012-06-12 11:11 sge
  -rw------- 1 sgeadmin sgeadmin     8192 2012-06-10 23:00 sge_job
  spooldb$ db_checkpoint -1
  spooldb$ ls -l
  total 52
  -rw------- 1 sgeadmin sgeadmin 10485760 2012-06-12 11:12 log.0000000001
  -rw-r--r-- 1 sgeadmin sgeadmin    24576 2012-06-12 11:11 sge
  -rw------- 1 sgeadmin sgeadmin     8192 2012-06-10 23:00 sge_job

> Note that when
> Grid Engine is not spooling onto a Berkeley RPC server, the
> db_archive, db_checkpoint functionality is implemented inside the
> qmaster (the qmaster handles that by calling the Berkeley DB
> programming APIs), and that is the reason why the "bdb_checkpoint.sh"
> script is not needed when one is not using the RPC spooling server
> (this is true since SGE 6.0 - we did not change the user interface).

Of course, but the OP was advised to run them.  [Checkpointing is
obvious by inspection of the spool, without the need to read the code
for those who haven't already looked at it.]

> And
> thus to support sites running Berkeley RPC spooling and wanting to
> upgrade to Grid Engine 2011.11, we do support RPC spooling in GE
> 2011.11,

C.f. <http://sourceforge.net/mailarchive/message.php?msg_id=29359255>.
(The DB server code is actually still in SoGE, and spooldb will build
with BDB versions which don't have it, but the install script support
was removed, similarly to OGS.)
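For anyone who wants to repeat the before/after `ls -l` comparison above without eyeballing timestamps, here is a minimal sketch in Python. It is only an illustration: the function names (`snapshot`, `changed_files`, `run_and_report`) are made up for this example, and it detects changes in file size or modification time only, which says nothing by itself about whether the database contents were damaged. Run it against a scratch copy of a spool directory, not a live one.

```python
import os
import subprocess
import sys


def snapshot(directory):
    """Map each regular file in `directory` to its (size, mtime in ns)."""
    result = {}
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            st = entry.stat(follow_symlinks=False)
            result[entry.name] = (st.st_size, st.st_mtime_ns)
    return result


def changed_files(before, after):
    """Return the sorted names that were added, removed, or modified."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))


def run_and_report(directory, command):
    """Run `command` (a list of arguments) inside `directory` and return
    the names of the files whose size or mtime changed as a result."""
    before = snapshot(directory)
    # The command's own output is discarded; only side effects on the
    # directory are of interest here.
    subprocess.run(command, cwd=directory, check=False,
                   stdout=subprocess.DEVNULL)
    return changed_files(before, snapshot(directory))


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical invocation: python spoolwatch.py /scratch/spooldb db_stat -d sge
    spool, *cmd = sys.argv[1:]
    for name in run_and_report(spool, cmd):
        print("modified:", name)
```

With the transcript above, one would expect `sge` to be reported for the db_stat and db_dump runs and `log.0000000001` for db_checkpoint, since those are the entries whose timestamps moved.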
Yes, the save_sge_config.sh reference in the bootstrap man page is
wrong, which must have been a paste error of another script I was
going to include.  Obviously I know how the script works, having
modified it, but we're not all perfect.  (By the way the "backup"
option advertised by spoolinit isn't actually implemented.)

> It's WRONG again!!
>
> - Note that inst_sge -bup accesses the BerkeleyDB files directly using
> "cp -f",

The polite word is "disingenuous":

  gridscheduler/trunk$ grep -A 1 db_dump source/dist/util/install_modules/inst_common.sh
  #   DUMPIT="$SGE_ROOT/utilbin/sol-sparc/db_dump -f"
  #   ExecuteAsAdmin $DUMPIT $backup_dir/$DATE.dump -h $db_home sge
  --
      DUMPIT="$SGE_UTILBIN/db_dump -f"
      ExecuteAsAdmin $DUMPIT $backup_dir/$DATE.dump -h $db_home sge
  --
      DB_BIN="$SGE_ROOT/utilbin/sol-sparc/db_load $SGE_ROOT/utilbin/sol-sparc/db_dump"
      DB_LIB="$SGE_ROOT/lib/sol-sparc/libdb-4.2.so"
  --
      $INFOTEXT "32 bit version of db_load or db_dump not found. These binaries needs \n" \
                "to be installed to perform a backup/restore of your BDB RPC Server. \n" \
  --
      $INFOTEXT -log "32 bit version of db_load or db_dump not found. These binaries needs \n" \
                "to be installed to perform a backup/restore of your BDB RPC Server. \n" \

Alternatively run it with "sh -x" to check what creates the .dump
file.

> As a responsible person and a responsible Grid Engine implementor, I
> would read the "How to Upgrade to 6.1 Software Using Classic/Berkeley
> DB Spooling" docs first to understand how users are supposed to
> upgrade from 1 version of Grid Engine to another.

[...]

> Ref: http://docs.oracle.com/cd/E19957-01/820-0697/eoqss/index.html

Skip the smoke and mirrors and see the current documentation on the
relevant feature of "automatic" (live) backup at
<http://docs.oracle.com/cd/E24901_01/doc.62/e21973/chapter2.htm#CHDEJAAG>,
although it's basically the same as in the 6.1 docs.
I'd assumed commercial OGS customers got extra doc with a health
warning about that sort of thing that others don't, but it seems not.

> So it is wrong to claim that DB_PRIVATE is not safe to use with
> inst_sge -bup, giving users WRONG impressions that it is safe when
> DB_PRIVATE is not used. DB_PRIVATE just does not play any roles in
> this context. One *NEEDS TO* shutdown the cluster first before doing
> the inst_sge -bup backup!!

The OGS inst_sge apparently implements the method in the Oracle docs,
and I assume sites use it.  Better rant at those
implementors/documentors.

> As William Bryce is not a technical person (he has technical
> background, but not down to the level to understand the details), he
> was misled by you. What if Univa launches new market ads saying how
> unsafe Open Grid Scheduler/Grid Engine is when using BerkeleyDB
> spooling based on your info??

I gather Bill takes documentation seriously and he obviously has
experts to consult, so I doubt
<http://gridengine.eu/grid-engine-internals/104-univa-grid-engine-81-features-part-4-new-spooling-method-postgresql-spooling-2012-06-01>¹
relies on anything I've said.  Is it just "showing how naïve and
technically incapable other Grid Engine implementors are" who
originally implemented the feature and avoided a private database?

If the limits of my technical incompetence are trusting developers'
documentation in agreement with experiment, and cock-ups writing
documentation, then I'll be pleased and surprised.

__
1. Would someone like to work on free postgres spooling
   <https://arc.liv.ac.uk/trac/SGE/ticket/1331>?

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
