Everyone,

Note that as a very nice person, I don't normally start any sort of
bashing on the mailing lists, especially the Grid Engine mailing lists,
which I have been a member of for over 10 years (doing nothing but
responding to thousands of messages and helping hundreds of people on
the SGE users list). However, in the past 12 months, Dave Love has
repeatedly bashed the Open Grid Scheduler Project and spread FUD
against Open Grid Scheduler/Grid Engine, while at the same time
copying code from Open Grid Scheduler/Grid Engine to benefit his fork.

The email responses at the bottom (ref: maintaining spooldb/sge_job)
are very good examples of how naïve and technically incapable other
Grid Engine implementors are.


1) In Grid Engine 2011.11, we added the ability to place the
BerkeleyDB files on any network filesystem, including NFSv2, NFSv3,
GlusterFS, Lustre, etc. We did it by passing the DB_PRIVATE flag when
opening the DB environment.

Note that Dave Love referenced the Berkeley DB documentation, which says,

 "[DB_PRIVATE] should not be specified if more than a single process
is accessing the environment because it is likely to cause database
corruption and unpredictable behavior. For example, if both a server
application and Berkeley DB utilities (for example, db_archive,
db_checkpoint or db_stat) are expected to access the environment, the
DB_PRIVATE flag should not be specified."


TRUTH:
When DB_PRIVATE is used, the *BerkeleyDB environment* is not backed by
a physical file in the filesystem, and the BerkeleyDB utilities that
try to open the environment would find that the environment file is
missing and act accordingly. So the next question one might have is:
as the environment file is missing, wouldn't db_archive,
db_checkpoint, or db_stat then fail to do their job? Note that when
Grid Engine is not spooling onto a Berkeley RPC server, the db_archive
and db_checkpoint functionality is implemented inside the qmaster (the
qmaster handles that by calling the Berkeley DB programming APIs).
That is the reason why the "bdb_checkpoint.sh" script is not needed
when one is not using the RPC spooling server (this has been true
since SGE 6.0 - we did not change the user interface).


And if you ask, "What if I am using the RPC spooling server?": all
features declared as "Deprecated" in Open Grid Scheduler have a 2-year
grace period of support before they are removed entirely. Thus, to
support sites that run Berkeley RPC spooling and want to upgrade to
Grid Engine 2011.11, we do support RPC spooling in GE 2011.11, but the
spooling code does not add the DB_PRIVATE flag when opening the
spooling environment in RPC spooling mode. So your "bdb_checkpoint.sh"
script, which calls db_archive & db_checkpoint, still works with the
Berkeley DB setup in Grid Engine 2011.11.
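
To make the two cases in 1) concrete, here is a rough sketch of the
flag selection. It is only a model of the logic described above, not
the actual Grid Engine spooling code: EnvFlag, BASE_FLAGS,
env_open_flags() and external_utilities_usable() are hypothetical
names, the numeric flag values are arbitrary (the real ones are
defined in db.h), and the set of base flags is an assumption.

```python
from enum import Flag, auto


class EnvFlag(Flag):
    """Symbolic stand-ins for Berkeley DB environment-open flags.

    The values are arbitrary; the real constants come from db.h.
    """
    DB_CREATE = auto()
    DB_INIT_MPOOL = auto()
    DB_INIT_TXN = auto()
    DB_PRIVATE = auto()


# Assumed base flags, for illustration only.
BASE_FLAGS = EnvFlag.DB_CREATE | EnvFlag.DB_INIT_MPOOL | EnvFlag.DB_INIT_TXN


def env_open_flags(rpc_spooling: bool) -> EnvFlag:
    """Pick the flags for opening the spooling environment.

    Local spooling: the qmaster is the only process using the
    environment, so DB_PRIVATE is added.
    RPC spooling: DB_PRIVATE is left out, so external utilities such
    as db_archive and db_checkpoint can still open the environment.
    """
    flags = BASE_FLAGS
    if not rpc_spooling:
        flags |= EnvFlag.DB_PRIVATE
    return flags


def external_utilities_usable(flags: EnvFlag) -> bool:
    """External BDB utilities need an environment opened without DB_PRIVATE."""
    return EnvFlag.DB_PRIVATE not in flags


if __name__ == "__main__":
    for rpc in (False, True):
        flags = env_open_flags(rpc_spooling=rpc)
        print(f"rpc_spooling={rpc}: "
              f"private={EnvFlag.DB_PRIVATE in flags}, "
              f"external utilities usable={external_utilities_usable(flags)}")
```

In the local case the checkpoint/archive work is done inside the
qmaster, so nothing outside the qmaster needs to open the environment.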


This shows Dave Love's technical "ability" (or lack thereof). Also, in
many cases Dave is copying code line by line; in some cases he adds
extra things when using the Open Grid Scheduler code because he
*thinks* something is wrong, and in other cases he adds code that has
lots of regressions (just count the number of bug reports he gets).
Again, it shows how much of a copycat he is.


2) From the manpage written by Dave Love:
 {
       Currently the only
       valid value for options is private, which means to open the database
       with the DB_PRIVATE flag to specify that it is only accessed by a
       single process.  This allows the database directory to be on an NFSv3
       filesystem (as opposed to an NFSv4 one, which is otherwise necessary),
       but it is unsafe to access it with any other program.  In particular,
       this means that the backup scripts (inst_sge -bup and
       util/upgrade_modules/save_sge_config.sh should not be used while
       qmaster is running with berkeleydb spooling).
 }

It's WRONG again!!

- Note that inst_sge -bup accesses the BerkeleyDB files directly using
"cp -f", and whether or not the DB_PRIVATE flag is used, one is not
supposed to use Unix filesystem commands to read or write a live
BerkeleyDB file.

As a responsible person and a responsible Grid Engine implementor, I
would read the "How to Upgrade to 6.1 Software Using Classic/Berkeley
DB Spooling" docs first to understand how users are supposed to
upgrade from one version of Grid Engine to another.

It says:

 1. Shut down the entire cluster.
 2. Back up your cluster.
     Type the following command:
       inst_sge -bup

Ref: http://docs.oracle.com/cd/E19957-01/820-0697/eoqss/index.html

So it is wrong to claim that DB_PRIVATE is not safe to use with
inst_sge -bup, as that gives users the WRONG impression that the
backup is safe when DB_PRIVATE is not used. DB_PRIVATE simply does not
play any role in this context. One *NEEDS TO* shut down the cluster
first before doing the inst_sge -bup backup!!

- Also note that util/upgrade_modules/save_sge_config.sh uses SGE
command line utilities to access the qmaster config file. The SGE
command line utilities do not access the BerkeleyDB files directly;
they send requests to the qmaster. Again, it is the qmaster that
is accessing the Berkeley DB files, and DB_PRIVATE does not play any
role in this context, as the qmaster is the only reader & writer of
the Berkeley DB.
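
The rule from the upgrade docs (shut down first, then copy the files)
can be sketched as a guard around a plain file copy. This is only an
illustration and is not how inst_sge is implemented:
backup_spool_dir() and its qmaster_running argument are hypothetical,
and how the caller finds out whether the qmaster is still running is
deliberately left out.

```python
import shutil
import tempfile
from pathlib import Path


def backup_spool_dir(spool_dir: Path, backup_dir: Path,
                     qmaster_running: bool) -> Path:
    """Copy the spool directory only when the qmaster is stopped.

    A file-level copy of a live BerkeleyDB is not supported, with or
    without DB_PRIVATE, so the copy is refused while the qmaster runs.
    """
    if qmaster_running:
        raise RuntimeError(
            "shut down the cluster before copying the spool database")
    target = backup_dir / spool_dir.name
    shutil.copytree(spool_dir, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        spool = Path(tmp) / "spooldb"
        spool.mkdir()
        (spool / "sge").write_bytes(b"placeholder for a BerkeleyDB file")
        backups = Path(tmp) / "backup"
        backups.mkdir()

        try:
            backup_spool_dir(spool, backups, qmaster_running=True)
        except RuntimeError as err:
            print("refused:", err)

        copied = backup_spool_dir(spool, backups, qmaster_running=False)
        print("copied to", copied)
```

Note that the guard does not look at DB_PRIVATE at all, which is the
point made above.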


3) Need your help, community!!

If any of you receive marketing materials or FUD against the Open Grid
Scheduler Project, whether from Univa or Dave Love, please let me
know. (So far, my only complaint about Univa was the Google Ads, which
they removed a few months ago.)

As this is not the first time Dave Love has spread FUD, I would
like to ask Dave to stop all Open Grid Scheduler bashing. As William
Bryce is not a technical person (he has a technical background, but
not down to the level needed to understand the details), he was misled
by you. What if Univa launches new marketing ads saying how unsafe
Open Grid Scheduler/Grid Engine is when using BerkeleyDB spooling,
based on your info??

I really am not blaming Bill Bryce - we used to work together, and we
both are in Toronto (but I'm leaving Canada soon). Also, Univa is way
more responsible than Dave Love. My advice to Bill is just don't
listen to Dave Love... and maybe we can have a little reunion with
the other ex-coworkers when I visit Toronto again next year!

Rayson



On Sat, May 26, 2012 at 10:40 AM, William Bryce <[email protected]> wrote:
> I second what Dave just pointed out below :-)  The documentation seems
> pretty clear to me.
>
> Bill.
>
>
> On Sat, May 26, 2012 at 10:12 AM, Dave Love <[email protected]> wrote:
>> That depends.  OGS changed the spool handling so that the database is
>> opened with the DB_PRIVATE flag, which is definitely not safe.  See
>>
>> <http://docs.oracle.com/cd/E17276_01/html/api_reference/C/envopen.html#open_DB_PRIVATE>.
>> You need the choice so that you can at least do live backups from other
>> filesystems, as described in
>> <http://arc.liv.ac.uk/SGE/htmlman/htmlman5/bootstrap.html>, but that's
>> only in the development version currently.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
