Author: challngr
Date: Wed Jan 13 21:09:43 2016
New Revision: 1724512

URL: http://svn.apache.org/viewvc?rev=1724512&view=rev
Log:
UIMA-4745 Database updates to duccbook.
Added:
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-database.tex
Modified:
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/admin-commands.tex
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-properties.tex
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/ducc-aguide.tex
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/install.tex

Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/admin-commands.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/admin-commands.tex?rev=1724512&r1=1724511&r2=1724512&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/admin-commands.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/admin-commands.tex Wed Jan 13 21:09:43 2016
@@ -35,6 +35,8 @@
 The command \ducchome/admin/start\_ducc is used to start DUCC processes. If run with no parameters it takes the following actions:
 \begin{itemize}
+ \item Starts the ActiveMQ server.
+ \item Starts the database.
 \item Starts the management processes Resource Manager, Orchestrator, Process Manager, Services Manager, and Web Server on the local node (where start\_ducc is executed).
 \item Starts an agent process on every node named in the default node list.
@@ -76,6 +78,8 @@ start\_ducc -c sm -c pm -c rm -c or@bj22
 \item[sm]The Service Manager
 \item[ws]The Web Server
 \item[agent]Node Agents
+ \item[broker] ActiveMQ broker
+ \item[db] Database
 \end{description}
 \item[--nothreading] If specified, the command does not run in multi-threaded mode
@@ -138,9 +142,16 @@ start\_ducc -c sm -c pm -c rm -c or@bj22
 \label{subsec:admin.stop-ducc}
 \subsubsection{{\em Description:}}
- Stop\_ducc is used to stop DUCC processes. If run with no parameters it takes the following
- actions:
- \todo Garbled by maven or docbook, update this
+ Stop\_ducc is used to stop DUCC processes. At least one parameter is required.
+ When {\em -a} is specified, the following actions are taken:
+ \begin{itemize}
+ \item Uses the ActiveMQ broker to broadcast a shutdown request to all
+ DUCC components other than the ActiveMQ broker itself and the database.
+ \item Waits for the daemons to stop (60 seconds by default; see {\em --wait} below).
+ \item Stops the database.
+ \item Stops the ActiveMQ broker.
+ \end{itemize}
+
 \subsubsection{\em Usage:}
@@ -202,10 +213,19 @@ start_ducc -c rm
 \item[pm] The Process Manager.
 \item[sm] The Service Manager.
 \item[ws] The Web Server.
+ \item[db] The database.
 \item[broker] The ActiveMQ broker (only if the broker is auto-managed).
 \item[agent\@node] Node Agent on the specified node.
 \end{description}
+ \item[-w, --wait {[time in seconds]}] If given, this sets the time to wait
+ after broadcasting the shutdown signal and before stopping the ActiveMQ broker itself.
+ If not specified, the default is 60 seconds.
+
+ NOTE: In production systems, it is generally wise to use the default of 60 seconds. For
+ test systems a shorter wait speeds cycle time. If you change the wait time, be sure to use
+ {\em check\_ducc -k} after {\em stop\_ducc} to ensure all processes are actually stopped.
+
 \item[--nothreading] If specified, the command does not run in multi-threaded mode
 even if it is supported on the local platform.
@@ -595,3 +615,90 @@ Nodepool power
 \paragraph{Notes:}
 None.
+
+\subsection{db\_create}
+\label{subsec:cli.db.create}
+
+ \paragraph{Description:}
+ This command is used to initialize the database. Normally the database is initialized
+ during {\em ducc\_post\_install}, but if this is an existing DUCC installation that is
+ being migrated from a version that does not use the database, it will be necessary to
+ initialize the database with this command.
+
+ This command performs the following actions:
+ \begin{enumerate}
+ \item Starts the database.
+ \item Disables the default database superuser.
+ \item Installs a database superuser named ``ducc'' and sets its password
+ to a value of your choice, for which you are prompted. The password is saved
+ in DUCC\_HOME/resources.private/ducc.private.properties.
+ \item Installs the DUCC database schema.
+ \item Stops the database.
+ \end{enumerate}
+
+
+ This command takes no parameters. It prompts interactively for your desired
+ database superuser password.
+
+ NOTE: The database user and password are NOT RELATED to any login ID on the system;
+ they are used and maintained by the database only.
+
+\subsection{db\_loader}
+\label{subsec:cli.db.loader}
+
+ \paragraph{Description:}
+ This command is used to copy the data from DUCC's older file-based persistence
+ into the database. The database schema must already exist, created either
+ with {\em ducc\_post\_install} or with {\em db\_create}.
+
+ This command performs the following actions:
+ \begin{enumerate}
+ \item Starts the database.
+ \item Drops some of the indexes in the database.
+ \item Loads the Orchestrator checkpoint file from {\em DUCC\_HOME/state/orchestrator.ckpt}.
+ \item Loads all job history from {\em DUCC\_HOME/history/jobs}.
+ \item Loads all reservation history from {\em DUCC\_HOME/history/reservations}.
+ \item Loads all service instance and AP history from {\em DUCC\_HOME/history/services}.
+ \item Loads the service registry from {\em DUCC\_HOME/state/services}.
+ \item Loads the service registry history from {\em DUCC\_HOME/history/service-registry}.
+ \item Reloads the Orchestrator checkpoint as a spot-check of the loader's instrumentation (to ensure that
+ load times stay reasonable).
+ \item Re-installs the DUCC database schema.
+ \item Stops the database.
+ \item Optionally renames the file-based state so that the data is not reloaded if you rerun the command.
+ \end{enumerate}
+
+ When the command exits, DUCC should be ready to run with all its state in the database.
+
+ This command takes two parameters: a pointer to the DUCC\_HOME you want to load from, and
+ a flag to disable the rename of the file-based state.
+
+ \paragraph{Usage:}
+ \begin{description}
+ \item[db\_loader -i {\em some-ducc-home} {[--no-archive]}]
+ Load the database from the specified DUCC\_HOME, and optionally do not archive the original files
+ by renaming them.
+ \end{description}
+
+ \paragraph{Options:}
+ \begin{description}
+ \item[$-i$ {\em some-ducc-home}]
+ This specifies the DUCC\_HOME you wish to load. Most of the time it is the DUCC\_HOME you
+ are running within, but it can be some other DUCC\_HOME if you have multiple installations and
+ want other history and state loaded.
+ \item[$--no-archive$]
+ If specified, the original files are not renamed. Note that only the directories in {\em history}
+ and {\em state} are renamed. To restore these, simply rename them back without the {\em archive}
+ suffix.
+ \end{description}
+
+ \paragraph{Example:}
+\begin{verbatim}
+db_loader -i /home/ducc/ducc_runtime
+db_loader -i /home/ducc.old/ducc_runtime --no-archive
+\end{verbatim}
+
+ \paragraph{Notes:}
+ The console shows progress of the loader. Full details of the load are written to a log {\em db-loader-log}
+ in the usual DUCC log directory, for reference and potential problem determination if something goes wrong.
+

Added: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-database.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-database.tex?rev=1724512&view=auto
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-database.tex (added)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-database.tex Wed Jan 13 21:09:43 2016
@@ -0,0 +1,108 @@
+%
+% Licensed to the Apache Software Foundation (ASF) under one
+% or more contributor license agreements. See the NOTICE file
+% distributed with this work for additional information
+% regarding copyright ownership. The ASF licenses this file
+% to you under the Apache License, Version 2.0 (the
+% "License"); you may not use this file except in compliance
+% with the License. You may obtain a copy of the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing,
+% software distributed under the License is distributed on an
+% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+% KIND, either express or implied. See the License for the
+% specific language governing permissions and limitations
+% under the License.
+%
+\section{DUCC Database Integration}
+\label{sec:ducc.database}
+
+ As of Version 2.1.0, DUCC uses the \href{https://cassandra.apache.org/}{Apache Cassandra}
+ database instead of the filesystem to manage
+ history and the service registry. Additionally, the Resource Manager maintains
+ current scheduling and node state in the database.
+
+ \subsection{Overview}
+
+ During first-time installation, the \hyperref[subsec:install.single-user]{\em ducc\_post\_install} utility
+ prompts for a (database) super-user password.
If a password is provided, the utility
+ proceeds to configure the database and install the schema. If a password
+ of ``bypass'' is given, database installation is bypassed and the file system
+ is used for history and services. The Resource Manager does not attempt to
+ persist its state in the filesystem.
+
+ If database integration is bypassed during \hyperref[subsec:install.single-user]{\em ducc\_post\_install}, it may be
+ installed later with the utilities \hyperref[subsec:cli.db.create]{\em db\_create} and \hyperref[subsec:cli.db.loader]{\em db\_loader}.
+
+ If DUCC is being upgraded, generally \hyperref[subsec:install.single-user]{\em ducc\_post\_install} is not used, in
+ which case, again, \hyperref[subsec:cli.db.create]{\em db\_create} and \hyperref[subsec:cli.db.loader]{\em db\_loader} may be used to
+ convert the older file-based state to the database.
+
+ \subsubsection{Orchestrator use of the Database}
+
+ The Orchestrator persists two types of work:
+ \begin{enumerate}
+ \item All work history. This includes jobs, reservations, service instances, and
+ arbitrary processes. This history is what the webserver uses to display details
+ on previously run jobs. Prior to the database, this data was saved in the
+ {\em DUCC\_HOME/history} directory.
+ \item Checkpoint. On every state change, the Orchestrator saves the state of
+ all running and allocated work in the system. This is used to recover reservations
+ when DUCC is started, and to allow hot-start of the Orchestrator without losing work.
+ Prior to the database, this data was saved in the file {\em DUCC\_HOME/state/orchestrator.ckpt}.
+ \end{enumerate}
+
+ \subsubsection{Service Manager use of the Database}
+ The service manager uses the database to store the service registry and all state
+ of active services. Prior to the database, this data was saved in Java properties files
+ in the directory {\em DUCC\_HOME/state/services}.
+
+ When a service is ``unregistered'' it is not physically removed from the database. Instead,
+ a bit is set indicating the service is no longer active. These registrations may be
+ recovered if needed by querying the database. Prior to the database, this data was saved
+ in {\em DUCC\_HOME/history/service-registry}.
+
+ \subsubsection{Resource Manager use of the Database}
+ The resource manager saves its entire runtime state in the database. Prior to the
+ database, this dynamic state was not saved or directly accessible.
+
+ \subsubsection{Webserver use of the Database}
+ The web server uses the database in read-only mode to fetch work history, service
+ registrations, and node status. Prior to the database, most of this information
+ was fetched from the filesystem. Node status was inferred using the Agent publications;
+ with the database, the webserver has direct access to the Resource Manager's view of the
+ DUCC nodes, providing a much more accurate picture of the system.
+
+\subsection{Database Scripting Utilities}
+ Database support is fully integrated with the DUCC start, stop, and check utilities as
+ well as the post installation scripting.
+
+ In addition, two utilities are supplied to migrate older installations to
+ the database:
+
+ \begin{description}
+ \item[db\_create] The \hyperref[subsec:cli.db.create]{db\_create} utility creates the database schema, disables the
+ default database superuser, installs a read-only guest id, and installs the
+ main DUCC super user ID. Note that database IDs are in no way related to
+ operating system IDs.
+ \item[db\_loader] The \hyperref[subsec:cli.db.loader]{db\_loader} utility migrates an existing file-based DUCC
+ system to use the database. It copies in the job history, orchestrator checkpoint,
+ and the service registry.
+ \end{description}
+
+ Use the cross-references above for additional details on the utilities.
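+
+ As a rough sketch (not taken from the DUCC distribution), migrating an existing file-based
+ installation might proceed as shown below. The sequence assumes that DUCC\_HOME is set in the
+ environment, that {\em db\_create} and {\em db\_loader} reside in {\em DUCC\_HOME/admin} alongside
+ {\em start\_ducc}, and that DUCC is stopped before the data is loaded:
+\begin{verbatim}
+cd $DUCC_HOME/admin
+./stop_ducc -a               # stop all DUCC daemons
+./check_ducc -k              # ensure all processes are actually stopped
+./db_create                  # prompts for the database superuser password
+./db_loader -i $DUCC_HOME    # copy file-based history and state into the database
+./start_ducc                 # restart DUCC; this also starts the database
+\end{verbatim}
+ Unless {\em --no-archive} is given, {\em db\_loader} renames the original {\em history} and
+ {\em state} directories, so rerunning the sequence does not load the same data twice.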
+
+\subsection{Database Configuration}
+ Most database configuration is accomplished by setting appropriate values in
+ your local \hyperref[subsec:ducc.database.properties]{\em site.ducc.properties}. See
+ the linked section for details.
+
+ For first-time installations, the utility {\em ducc\_post\_install} prompts
+ for database installation, and if it is not bypassed, reasonable defaults for
+ all database-related properties are established.
+
+ For existing installations, the {\em db\_create} utility installs the
+ database schema and updates your {\em site.ducc.properties} with reasonable
+ defaults.

Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-properties.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-properties.tex?rev=1724512&r1=1724511&r2=1724512&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-properties.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-properties.tex Wed Jan 13 21:09:43 2016
@@ -1603,6 +1603,143 @@
 \end{description}
+\subsection{Database Configuration Properties}
+\label{subsec:ducc.database.properties}
+
+ \begin{description}
+
+ \item[ducc.database.host] \hfill \\
+ This is the name of the host where the database is run. It usually defaults to the
+ same host as the ducc.head. Administrators familiar with the database can install it
+ elsewhere; use this parameter to specify that location.
+
+ To disable use of the database, set this parameter to the string {\em --disabled--}.
+ \begin{description}
+ \item[Default Value] The same as your ducc.head.
+ \item[Type] Tuning
+ \end{description}
+
+ \item[ducc.database.jmx.port] \hfill \\
+ This is the JMX port used by the database. Normally it need not be changed.
This
+ port is ONLY available on the host where the database runs. To allow access from
+ other hosts, check the Cassandra documentation. DUCC only directly supports
+ access from the configured database host.
+ \begin{description}
+ \item[Default Value] 7199
+ \item[Type] Tuning
+ \end{description}
+
+ \item[ducc.database.mem.heap] \hfill \\
+ This is the value used to set {\em Xmx and Xms} when the database starts. The
+ Cassandra database makes an attempt to determine the best value of this. The
+ default is one-half of real memory, up to a maximum of 8G. It is recommended that
+ the default be used. However, small installations may reduce this to as little
+ as 512M. Note that both Xmx and Xms are set.
+ \begin{description}
+ \item[Default Value] Determined by Cassandra, up to 8G max.
+ \item[Type] Tuning
+ \end{description}
+
+ \item[ducc.database.mem.new] \hfill \\
+ This is the default for the ``young'' generation when the JVM needs more memory.
+ In general, the default is correct. If you're not familiar with Java's memory
+ management it is safest to not modify this.
+ \begin{description}
+ \item[Default Value] 100M
+ \item[Type] Tuning
+ \end{description}
+
+ \item[ducc.service.persistence.impl] \hfill \\
+ This specifies the class used to implement persistence for the Service Manager's registry.
+ The installation procedures for the database automatically update your {\em site.ducc.properties}
+ to use the correct default.
+
+ There
+ are two supported values:
+\begin{verbatim}
+org.apache.uima.ducc.common.persistence.services.StateServices
+org.apache.uima.ducc.database.StateServicesDb
+\end{verbatim}
+
+ The first value implements the service registry in the file system in the directory
+ {\tt DUCC\_HOME/state/services}.
+
+ The second value, used when the database is installed, implements the service registry in the database.
+
+ \begin{description}
+ \item[Default Value] When the database is enabled:
+\begin{verbatim}
+ org.apache.uima.ducc.database.StateServicesDb
+\end{verbatim}
+
+ When the database is not enabled:
+\begin{verbatim}
+ org.apache.uima.ducc.common.persistence.services.StateServices
+\end{verbatim}
+ \item[Type] Private
+ \end{description}
+
+
+ \item[ducc.job.history.impl] \hfill \\
+ This specifies the class used to implement persistence for job history and the
+ Orchestrator checkpoint.
+ The installation procedures for the database automatically update your {\em site.ducc.properties}
+ to use the correct default.
+
+ The two supported values are:
+\begin{verbatim}
+org.apache.uima.ducc.transport.event.common.history.HistoryPersistenceManager
+org.apache.uima.ducc.database.HistoryManagerDb
+\end{verbatim}
+
+ The first value causes job history to be stored in {\tt DUCC\_HOME/history}
+ and the Orchestrator checkpoint to be stored in {\tt DUCC\_HOME/state/orchestrator.ckpt}.
+
+ The second causes both history and checkpoint to be saved in the database.
+
+ \begin{description}
+ \item[Default Value] If the database is enabled:
+\begin{verbatim}
+ org.apache.uima.ducc.database.HistoryManagerDb
+\end{verbatim}
+ If the database is not enabled:
+\begin{verbatim}
+ org.apache.uima.ducc.transport.event.common.history.HistoryPersistenceManager
+\end{verbatim}
+ \item[Type] Private
+ \end{description}
+
+
+ \item[ducc.rm.persistence.impl] \hfill \\
+ This specifies the class used to implement persistence for the Resource Manager's
+ dynamic state.
+ The installation procedures for the database automatically update your {\em site.ducc.properties}
+ to use the correct default.
+
+ The two supported values are:
+\begin{verbatim}
+org.apache.uima.ducc.database.RmStatePersistence
+org.apache.uima.ducc.common.persistence.rm.NullRmStatePersistence
+\end{verbatim}
+
+ The first value implements RM's use of the database to store its dynamic state.
The second
+ disables RM state persistence. There is no implementation that persists RM state
+ in the filesystem.
+
+ \begin{description}
+ \item[Default Value] If the
+ database is enabled:
+\begin{verbatim}
+ org.apache.uima.ducc.database.RmStatePersistence
+\end{verbatim}
+ If the database is not enabled:
+\begin{verbatim}
+ org.apache.uima.ducc.common.persistence.rm.NullRmStatePersistence
+\end{verbatim}
+ \item[Type] Private
+ \end{description}
+ \end{description}
+
 \section{ducc.private.properties}
 \label{sec:ducc.private.properties}
@@ -1618,4 +1755,20 @@
 \item[Type] Local
 \end{description}
 \end{description}
+
+
+\subsection{Database Properties}
+
+ \begin{description}
+
+ \item[db\_password] \hfill \\
+ This is the database superuser password. It is set during {\em ducc\_post\_install} or {\em db\_create}. Both
+ these procedures prompt for the password. There is no default; the value must be supplied by the DUCC administrator.
+
+ NOTE: The database superuser ID is always ``ducc'', and is set during database installation.
+ \begin{description}
+ \item[Default Value] None, must be set by the administrator.
+ \item[Type] Local
+ \end{description}
+ \end{description}

Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/ducc-aguide.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/ducc-aguide.tex?rev=1724512&r1=1724511&r2=1724512&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/ducc-aguide.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/ducc-aguide.tex Wed Jan 13 21:09:43 2016
@@ -38,6 +38,7 @@
 \input{part4/admin/ducc-classes.tex}
 \input{part4/admin/ducc-nodes.tex}
 \input{part4/admin/ducc-users.tex}
+\input{part4/admin/ducc-database.tex}
 %% This is a section
 \input{part4/admin/admin-commands.tex}
@@ -56,10 +57,11 @@
 \input{part4/sm.tex}
 %% A chapter
-\input {part4/sim.tex}
+\input {part4/web.tex}
 %% A chapter
-\input {part4/web.tex}
+\input {part4/sim.tex}
+
 \chapter{Understanding the DUCC logs}
 \input{part4/system-logs.tex}

Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/install.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/install.tex?rev=1724512&r1=1724511&r2=1724512&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/install.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/install.tex Wed Jan 13 21:09:43 2016
@@ -226,6 +226,9 @@ The post-installation script performs th
 \item Verifies that the correct level of Java and Python are installed and available.
 \item Creates a default nodelist, \duccruntime/resources/ducc.nodes, containing the name of the node you are installing on.
 \item Defines the ``ducc head'' node to be the node you are installing from.
+ \item Initializes the database. A prompt for the database password is given; to bypass
+ database installation give the password {\em bypass}. (The database can be installed
+ at a later date.)
 \item Sets up the default https keystore for the webserver.
 \item Installs the DUCC documentation ``ducc book'' into the DUCC webserver root.
 \item Builds and installs the C program, ``ducc\_ling'', into the default location.