Author: phunt
Date: Sat Jan 31 01:23:15 2009
New Revision: 739480

URL: http://svn.apache.org/viewvc?rev=739480&view=rev
Log:
ZOOKEEPER-229. improve documentation regarding user's responsibility to cleanup datadir (snaps/logs)

Modified:
    hadoop/zookeeper/trunk/CHANGES.txt
    hadoop/zookeeper/trunk/build.xml
    hadoop/zookeeper/trunk/docs/zookeeperAdmin.html
    hadoop/zookeeper/trunk/docs/zookeeperAdmin.pdf
    hadoop/zookeeper/trunk/docs/zookeeperStarted.html
    hadoop/zookeeper/trunk/docs/zookeeperStarted.pdf
    hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
    hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml

Modified: hadoop/zookeeper/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/CHANGES.txt?rev=739480&r1=739479&r2=739480&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/CHANGES.txt (original)
+++ hadoop/zookeeper/trunk/CHANGES.txt Sat Jan 31 01:23:15 2009
@@ -141,6 +141,9 @@
   ZOOKEEPER-215. expand system test environment (breed via phunt)
+  ZOOKEEPER-229. improve documentation regarding user's responsibility to
+  cleanup datadir (snaps/logs) (mahadev via phunt)
+
 NEW FEATURES:
   ZOOKEEPER-276. 
Bookkeeper contribution (Flavio and Luca Telloli via mahadev) Modified: hadoop/zookeeper/trunk/build.xml URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/build.xml?rev=739480&r1=739479&r2=739480&view=diff ============================================================================== --- hadoop/zookeeper/trunk/build.xml (original) +++ hadoop/zookeeper/trunk/build.xml Sat Jan 31 01:23:15 2009 @@ -333,6 +333,8 @@ <include name="org/apache/zookeeper/Watcher.java"/> <include name="org/apache/zookeeper/ZooDefs.java"/> <include name="org/apache/zookeeper/ZooKeeper.java"/> + <include name="org/apache/zookeeper/server/LogFormatter.java"/> + <include name="org/apache/zookeeper/server/PurgeTxnLog.java"/> <exclude name="org/apache/zookeeper/server/quorum/QuorumPacket"/> </fileset> <packageset dir="${src_generated.dir}"> Modified: hadoop/zookeeper/trunk/docs/zookeeperAdmin.html URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperAdmin.html?rev=739480&r1=739479&r2=739480&view=diff ============================================================================== --- hadoop/zookeeper/trunk/docs/zookeeperAdmin.html (original) +++ hadoop/zookeeper/trunk/docs/zookeeperAdmin.html Sat Jan 31 01:23:15 2009 @@ -231,6 +231,17 @@ <a href="#sc_administering">Administering</a> </li> <li> +<a href="#sc_maintenance">Maintenance</a> +<ul class="minitoc"> +<li> +<a href="#Ongoing+Data+Directory+Cleanup">Ongoing Data Directory Cleanup</a> +</li> +<li> +<a href="#Debug+Log+Cleanup+%28log4j%29">Debug Log Cleanup (log4j)</a> +</li> +</ul> +</li> +<li> <a href="#sc_monitoring">Monitoring</a> </li> <li> @@ -269,7 +280,7 @@ <a href="#The+Log+Directory">The Log Directory</a> </li> <li> -<a href="#File+Management">File Management</a> +<a href="#sc_filemanagement">File Management</a> </li> </ul> </li> @@ -472,7 +483,7 @@ consists of a single line containing only the text of that machine's id. 
So <span class="codefrag filename">myid</span> of server 1 would contain the text "1" and nothing else. The id must be unique within the - ensemble.</p> + ensemble and should have a value between 1 and 255.</p> </li> @@ -629,6 +640,15 @@ <li> <p> +<a href="#sc_maintenance">Maintenance</a> +</p> + +</li> + + +<li> + +<p> <a href="#sc_monitoring">Monitoring</a> </p> @@ -698,7 +718,7 @@ </li> </ul> -<a name="N101A6"></a><a name="sc_designing"></a> +<a name="N101AE"></a><a name="sc_designing"></a> <h3 class="h4">Designing a ZooKeeper Deployment</h3> <p>The reliablity of ZooKeeper rests on two basic assumptions.</p> <ol> @@ -725,7 +745,7 @@ to hold true. Some of these are cross-machines considerations, and others are things you should consider for each and every machine in your deployment.</p> -<a name="N101C2"></a><a name="sc_CrossMachineRequirements"></a> +<a name="N101CA"></a><a name="sc_CrossMachineRequirements"></a> <h4>Cross Machine Requirements</h4> <p>For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with @@ -743,7 +763,7 @@ failure of that switch could cause a correlated failure and bring down the service. 
The same holds true of shared power circuits, cooling systems, etc.</p> -<a name="N101CF"></a><a name="Single+Machine+Requirements"></a> +<a name="N101D7"></a><a name="Single+Machine+Requirements"></a> <h4>Single Machine Requirements</h4> <p>If ZooKeeper has to contend with other applications for access to resourses like storage media, CPU, network, or @@ -784,19 +804,61 @@ </li> </ul> -<a name="N101ED"></a><a name="sc_provisioning"></a> +<a name="N101F5"></a><a name="sc_provisioning"></a> <h3 class="h4">Provisioning</h3> <p></p> -<a name="N101F6"></a><a name="sc_strengthsAndLimitations"></a> +<a name="N101FE"></a><a name="sc_strengthsAndLimitations"></a> <h3 class="h4">Things to Consider: ZooKeeper Strengths and Limitations</h3> <p></p> -<a name="N101FF"></a><a name="sc_administering"></a> +<a name="N10207"></a><a name="sc_administering"></a> <h3 class="h4">Administering</h3> <p></p> -<a name="N10208"></a><a name="sc_monitoring"></a> +<a name="N10210"></a><a name="sc_maintenance"></a> +<h3 class="h4">Maintenance</h3> +<p>Little long term maintenance is required for a ZooKeeper + cluster however you must be aware of the following:</p> +<a name="N10219"></a><a name="Ongoing+Data+Directory+Cleanup"></a> +<h4>Ongoing Data Directory Cleanup</h4> +<p>The ZooKeeper <a href="#var_datadir">Data + Directory</a> contains files which are a persistent copy + of the znodes stored by a particular serving ensemble. These + are the snapshot and transactional log files. As changes are + made to the znodes these changes are appended to a + transaction log, occasionally, when a log grows large, a + snapshot of the current state of all znodes will be written + to the filesystem. This snapshot supercedes all previous + logs. + </p> +<p>A ZooKeeper server <strong>will not remove + old snapshots and log files</strong>, this is the + responsibility of the operator. 
Every serving environment is + different and therefore the requirements of managing these + files may differ from install to install (backup for example). + </p> +<p>The PurgeTxnLog utility implements a simple retention + policy that administrators can use. The <a href="api/index.html">API docs</a> contains details on + calling conventions (arguments, etc...). + </p> +<p>In the following example the last count snapshots and + their corresponding logs are retained and the others are + deleted. The value of <count> should typically be + greater than 3 (although not required, this provides 3 backups + in the unlikely event a recent log has become corrupted). This + can be run as a cron job on the ZooKeeper server machines to + clean up the logs daily.</p> +<pre class="code"> java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count></pre> +<a name="N1023A"></a><a name="Debug+Log+Cleanup+%28log4j%29"></a> +<h4>Debug Log Cleanup (log4j)</h4> +<p>See the section on <a href="#sc_logging">logging</a> in this document. It is + expected that you will setup a rolling file appender using the + in-built log4j feature. The sample configuration file in the + release tar's conf/log4j.properties provides an example of + this. + </p> +<a name="N10249"></a><a name="sc_monitoring"></a> <h3 class="h4">Monitoring</h3> <p></p> -<a name="N10211"></a><a name="sc_logging"></a> +<a name="N10252"></a><a name="sc_logging"></a> <h3 class="h4">Logging</h3> <p>ZooKeeper uses <strong>log4j</strong> version 1.2 as its logging infrastructure. 
The ZooKeeper default <span class="codefrag filename">log4j.properties</span> @@ -806,10 +868,10 @@ <p>For more information, see <a href="http://logging.apache.org/log4j/1.2/manual.html#defaultInit">Log4j Default Initialization Procedure</a> of the log4j manual.</p> -<a name="N10231"></a><a name="sc_troubleshooting"></a> +<a name="N10272"></a><a name="sc_troubleshooting"></a> <h3 class="h4">Troubleshooting</h3> <p></p> -<a name="N1023A"></a><a name="sc_configuration"></a> +<a name="N1027B"></a><a name="sc_configuration"></a> <h3 class="h4">Configuration Parameters</h3> <p>ZooKeeper's behavior is governed by the ZooKeeper configuration file. This file is designed so that the exact same file can be used by @@ -817,7 +879,7 @@ layouts are the same. If servers use different configuration files, care must be taken to ensure that the list of servers in all of the different configuration files match.</p> -<a name="N10243"></a><a name="sc_minimumConfiguration"></a> +<a name="N10284"></a><a name="sc_minimumConfiguration"></a> <h4>Minimum Configuration</h4> <p>Here are the minimum configuration keywords that must be defined in the configuration file:</p> @@ -864,7 +926,7 @@ </dd> </dl> -<a name="N1026A"></a><a name="sc_advancedConfiguration"></a> +<a name="N102AB"></a><a name="sc_advancedConfiguration"></a> <h4>Advanced Configuration</h4> <p>The configuration settings in the section are optional. You can use them to further fine tune the behaviour of your ZooKeeper servers. 
@@ -955,7 +1017,7 @@ </dd> </dl> -<a name="N102CA"></a><a name="sc_clusterOptions"></a> +<a name="N1030B"></a><a name="sc_clusterOptions"></a> <h4>Cluster Options</h4> <p>The options in this section are designed for use with an ensemble of servers -- that is, when deploying clusters of servers.</p> @@ -1045,7 +1107,7 @@ </dl> <p></p> -<a name="N10327"></a><a name="Unsafe+Options"></a> +<a name="N10368"></a><a name="Unsafe+Options"></a> <h4>Unsafe Options</h4> <p>The following options can be useful, but be careful when you use them. The risk of each is explained along with the explanation of what @@ -1090,7 +1152,7 @@ </dd> </dl> -<a name="N10359"></a><a name="sc_zkCommands"></a> +<a name="N1039A"></a><a name="sc_zkCommands"></a> <h3 class="h4">ZooKeeper Commands: The Four Letter Words</h3> <p>ZooKeeper responds to a small set of commands. Each command is composed of four letters. You issue the commands to ZooKeeper via telnet @@ -1163,7 +1225,7 @@ <pre class="code">$ echo ruok | nc 127.0.0.1 5111 imok </pre> -<a name="N103A0"></a><a name="sc_dataFileManagement"></a> +<a name="N103E1"></a><a name="sc_dataFileManagement"></a> <h3 class="h4">Data File Management</h3> <p>ZooKeeper stores its data in a data directory and its transaction log in a transaction log directory. By default these two directories are @@ -1171,7 +1233,7 @@ transaction log files in a separate directory than the data files. Throughput increases and latency decreases when transaction logs reside on a dedicated log devices.</p> -<a name="N103A9"></a><a name="The+Data+Directory"></a> +<a name="N103EA"></a><a name="The+Data+Directory"></a> <h4>The Data Directory</h4> <p>This directory has two files in it:</p> <ul> @@ -1217,14 +1279,14 @@ idempotent nature of its updates. 
By replaying the transaction log against fuzzy snapshots ZooKeeper gets the state of the system at the end of the log.</p> -<a name="N103E5"></a><a name="The+Log+Directory"></a> +<a name="N10426"></a><a name="The+Log+Directory"></a> <h4>The Log Directory</h4> <p>The Log Directory contains the ZooKeeper transaction logs. Before any update takes place, ZooKeeper ensures that the transaction that represents the update is written to non-volatile storage. A new log file is started each time a snapshot is begun. The log file's suffix is the first zxid written to that log.</p> -<a name="N103EF"></a><a name="File+Management"></a> +<a name="N10430"></a><a name="sc_filemanagement"></a> <h4>File Management</h4> <p>The format of snapshot and log files does not change between standalone ZooKeeper servers and different configurations of @@ -1235,13 +1297,16 @@ state of ZooKeeper servers and even restore that state. The LogFormatter class allows an administrator to look at the transactions in a log.</p> -<p>The ZooKeeper server creates snapshot and log files, but never - deletes them. The retention policy of the data and log files is - implemented outside of the ZooKeeper server. The server itself only - needs the latest complete fuzzy snapshot and the log files from the - start of that snapshot. The PurgeTxnLog utility implements a simple - retention policy that administrators can use.</p> -<a name="N10400"></a><a name="sc_commonProblems"></a> +<p>The ZooKeeper server creates snapshot and log files, but + never deletes them. The retention policy of the data and log + files is implemented outside of the ZooKeeper server. The + server itself only needs the latest complete fuzzy snapshot + and the log files from the start of that snapshot. See the + <a href="#sc_maintenance">maintenance</a> section in + this document for more details on setting a retention policy + and maintenance of ZooKeeper storage. 
+ </p> +<a name="N10445"></a><a name="sc_commonProblems"></a> <h3 class="h4">Things to Avoid</h3> <p>Here are some common problems you can avoid by configuring ZooKeeper correctly:</p> @@ -1295,7 +1360,7 @@ </dd> </dl> -<a name="N10424"></a><a name="sc_bestPractices"></a> +<a name="N10469"></a><a name="sc_bestPractices"></a> <h3 class="h4">Best Practices</h3> <p>For best results, take note of the following list of good Zookeeper practices. <em>[tbd...]</em> Modified: hadoop/zookeeper/trunk/docs/zookeeperAdmin.pdf URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperAdmin.pdf?rev=739480&r1=739479&r2=739480&view=diff ============================================================================== Binary files - no diff available. Modified: hadoop/zookeeper/trunk/docs/zookeeperStarted.html URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperStarted.html?rev=739480&r1=739479&r2=739480&view=diff ============================================================================== --- hadoop/zookeeper/trunk/docs/zookeeperStarted.html (original) +++ hadoop/zookeeper/trunk/docs/zookeeperStarted.html Sat Jan 31 01:23:15 2009 @@ -198,6 +198,9 @@ <a href="#sc_InstallingSingleMode">Standalone Operation</a> </li> <li> +<a href="#sc_FileManagement">Managing ZooKeeper Storage</a> +</li> +<li> <a href="#sc_ConnectingToZooKeeper">Connecting to ZooKeeper</a> </li> <li> @@ -313,7 +316,13 @@ This is fine for most development situations, but to run ZooKeeper in replicated mode, please see <a href="#sc_RunningReplicatedZooKeeper">Running Replicated ZooKeeper</a>.</p> -<a name="N10083"></a><a name="sc_ConnectingToZooKeeper"></a> +<a name="N10083"></a><a name="sc_FileManagement"></a> +<h3 class="h4">Managing ZooKeeper Storage</h3> +<p>For long running production systems ZooKeeper storage must + be managed externally (dataDir and logs). 
See the section on + <a href="zookeeperAdmin.html#sc_maintenance">maintenance</a> for + more details.</p> +<a name="N10091"></a><a name="sc_ConnectingToZooKeeper"></a> <h3 class="h4">Connecting to ZooKeeper</h3> <p>Once ZooKeeper is running, you have several options for connection to it:</p> @@ -363,7 +372,7 @@ </li> </ul> -<a name="N100C6"></a><a name="sc_ProgrammingToZooKeeper"></a> +<a name="N100D4"></a><a name="sc_ProgrammingToZooKeeper"></a> <h3 class="h4">Programming to ZooKeeper</h3> <p>ZooKeeper has a Java bindings and C bindings. They are functionally equivalent. The C bindings exist in two variants: single @@ -371,7 +380,7 @@ is done. For more information, see the <a href="zookeeperProgrammers.html#ch_programStructureWithExample.html">Programming Examples in the ZooKeeper Programmer's Guide</a> for sample code using of the different APIs.</p> -<a name="N100D4"></a><a name="sc_RunningReplicatedZooKeeper"></a> +<a name="N100E2"></a><a name="sc_RunningReplicatedZooKeeper"></a> <h3 class="h4">Running Replicated ZooKeeper</h3> <p>Running ZooKeeper in standalone mode is convenient for evaluation, some development, and testing. But in production, you should run @@ -431,7 +440,7 @@ </div> </div> -<a name="N10111"></a><a name="Other+Optimizations"></a> +<a name="N1011F"></a><a name="Other+Optimizations"></a> <h3 class="h4">Other Optimizations</h3> <p>There are a couple of other configuration parameters that can greatly increase performance:</p> Modified: hadoop/zookeeper/trunk/docs/zookeeperStarted.pdf URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperStarted.pdf?rev=739480&r1=739479&r2=739480&view=diff ============================================================================== Binary files - no diff available. 
Modified: hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml?rev=739480&r1=739479&r2=739480&view=diff ============================================================================== --- hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml (original) +++ hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml Sat Jan 31 01:23:15 2009 @@ -295,6 +295,10 @@ </listitem> <listitem> + <para><xref linkend="sc_maintenance" /></para> + </listitem> + + <listitem> <para><xref linkend="sc_monitoring" /></para> </listitem> @@ -429,6 +433,65 @@ <para></para> </section> + <section id="sc_maintenance"> + <title>Maintenance</title> + + <para>Little long term maintenance is required for a ZooKeeper + cluster however you must be aware of the following:</para> + + <section> + <title>Ongoing Data Directory Cleanup</title> + + <para>The ZooKeeper <ulink url="#var_datadir">Data + Directory</ulink> contains files which are a persistent copy + of the znodes stored by a particular serving ensemble. These + are the snapshot and transactional log files. As changes are + made to the znodes these changes are appended to a + transaction log, occasionally, when a log grows large, a + snapshot of the current state of all znodes will be written + to the filesystem. This snapshot supercedes all previous + logs. + </para> + + <para>A ZooKeeper server <emphasis role="bold">will not remove + old snapshots and log files</emphasis>, this is the + responsibility of the operator. Every serving environment is + different and therefore the requirements of managing these + files may differ from install to install (backup for example). + </para> + + <para>The PurgeTxnLog utility implements a simple retention + policy that administrators can use. 
The <ulink + url="ext:api/index">API docs</ulink> contains details on + calling conventions (arguments, etc...). + </para> + + <para>In the following example the last count snapshots and + their corresponding logs are retained and the others are + deleted. The value of <count> should typically be + greater than 3 (although not required, this provides 3 backups + in the unlikely event a recent log has become corrupted). This + can be run as a cron job on the ZooKeeper server machines to + clean up the logs daily.</para> + + <programlisting> java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count></programlisting> + + </section> + + <section> + <title>Debug Log Cleanup (log4j)</title> + + <para>See the section on <ulink + url="#sc_logging">logging</ulink> in this document. It is + expected that you will setup a rolling file appender using the + in-built log4j feature. The sample configuration file in the + release tar's conf/log4j.properties provides an example of + this. + </para> + </section> + + </section> + <section id="sc_monitoring"> <title>Monitoring</title> @@ -482,7 +545,7 @@ </listitem> </varlistentry> - <varlistentry> + <varlistentry id="var_datadir"> <term>dataDir</term> <listitem> @@ -914,7 +977,7 @@ suffix is the first zxid written to that log.</para> </section> - <section> + <section id="sc_filemanagement"> <title>File Management</title> <para>The format of snapshot and log files does not change between @@ -928,12 +991,15 @@ LogFormatter class allows an administrator to look at the transactions in a log.</para> - <para>The ZooKeeper server creates snapshot and log files, but never - deletes them. The retention policy of the data and log files is - implemented outside of the ZooKeeper server. The server itself only - needs the latest complete fuzzy snapshot and the log files from the - start of that snapshot. 
The PurgeTxnLog utility implements a simple - retention policy that administrators can use.</para> + <para>The ZooKeeper server creates snapshot and log files, but + never deletes them. The retention policy of the data and log + files is implemented outside of the ZooKeeper server. The + server itself only needs the latest complete fuzzy snapshot + and the log files from the start of that snapshot. See the + <ulink url="#sc_maintenance">maintenance</ulink> section in + this document for more details on setting a retention policy + and maintenance of ZooKeeper storage. + </para> </section> </section> Modified: hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml?rev=739480&r1=739479&r2=739480&view=diff ============================================================================== --- hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml (original) +++ hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml Sat Jan 31 01:23:15 2009 @@ -73,7 +73,7 @@ stable</ulink> release from one of the Apache Download Mirrors.</para> </section> - + <section id="sc_InstallingSingleMode"> <title>Standalone Operation</title> @@ -151,6 +151,15 @@ url="#sc_RunningReplicatedZooKeeper">Running Replicated ZooKeeper</ulink>.</para> </section> + + <section id="sc_FileManagement"> + <title>Managing ZooKeeper Storage</title> + <para>For long running production systems ZooKeeper storage must + be managed externally (dataDir and logs). See the section on + <ulink + url="zookeeperAdmin.html#sc_maintenance">maintenance</ulink> for + more details.</para> + </section> <section id="sc_ConnectingToZooKeeper"> <title>Connecting to ZooKeeper</title>
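
The new Maintenance section suggests running PurgeTxnLog from cron. A minimal sketch of a wrapper that could be called that way is shown here; it only builds and prints the command line documented in this commit. The function name purge_cmd, the ZK_CLASSPATH variable and any paths passed to it are assumptions made for illustration and are not part of ZooKeeper.

```shell
#!/bin/sh
# Sketch only: purge_cmd and ZK_CLASSPATH are hypothetical names used for
# illustration. Only the PurgeTxnLog invocation itself comes from the docs.

# Classpath taken from the documented example; adjust to the real jar paths.
ZK_CLASSPATH="${ZK_CLASSPATH:-zookeeper.jar:log4j.jar:conf}"

# Print the purge command for a dataDir, a snapDir and a retention count.
# Returns 2 and prints nothing on stdout if the arguments are unusable.
purge_cmd() {
    if [ "$#" -ne 3 ]; then
        echo "usage: purge_cmd <dataDir> <snapDir> <count>" >&2
        return 2
    fi
    # Reject an empty count or one containing non-digit characters.
    case "$3" in
        ''|*[!0-9]*)
            echo "count must be a positive integer: $3" >&2
            return 2
            ;;
    esac
    if [ "$3" -lt 1 ]; then
        echo "count must be at least 1: $3" >&2
        return 2
    fi
    printf 'java -cp %s org.apache.zookeeper.server.PurgeTxnLog %s %s -n %s\n' \
        "$ZK_CLASSPATH" "$1" "$2" "$3"
}
```

A daily cron entry could call a script that runs the printed command with the server's own dataDir and snapDir. The documentation above suggests a count greater than 3; the sketch does not enforce that and only rejects values that are not positive integers.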