http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/quickstart.html ---------------------------------------------------------------------- diff --git a/docs/quickstart.html b/docs/quickstart.html deleted file mode 100644 index 7932b9a..0000000 --- a/docs/quickstart.html +++ /dev/null @@ -1,431 +0,0 @@ ---- -title: Apache Kudu Quickstart -layout: default -active_nav: docs -last_updated: 'Last updated 2018-06-14 08:17:56 PDT' ---- -<!-- - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - - -<div class="container"> - <div class="row"> - <div class="col-md-9"> - -<h1>Apache Kudu Quickstart</h1> - <div id="preamble"> -<div class="sectionbody"> -<div class="paragraph"> -<p>Follow these instructions to set up and run the Kudu VM, and start with Kudu, Kudu_Impala, -and CDH in minutes.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="quickstart_vm"><a class="link" href="#quickstart_vm">Get The Kudu Quickstart VM</a></h2> -<div class="sectionbody"> -<div class="sect2"> -<h3 id="_prerequisites"><a class="link" href="#_prerequisites">Prerequisites</a></h3> -<div class="olist arabic"> -<ol class="arabic"> -<li> -<p>Install <a href="https://www.virtualbox.org/">Oracle Virtualbox</a>. The VM has been tested to work -with VirtualBox version 4.3 on Ubuntu 14.04 and VirtualBox version 5 on OSX -10.9. 
VirtualBox is also included in most package managers: apt-get, brew, etc.</p> -</li> -<li> -<p>After the installation, make sure that <code>VBoxManage</code> is in your <code>PATH</code> by using the -<code>which VBoxManage</code> command.</p> -</li> -</ol> -</div> -</div> -<div class="sect2"> -<h3 id="_installation"><a class="link" href="#_installation">Installation</a></h3> -<div class="paragraph"> -<p>To download and start the VM, execute the following command in a terminal window.</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-bash" data-lang="bash">$ curl -s https://raw.githubusercontent.com/cloudera/kudu-examples/master/demo-vm-setup/bootstrap.sh | bash</code></pre> -</div> -</div> -<div class="paragraph"> -<p>This command downloads a shell script which clones the <code>kudu-examples</code> Git repository and -then downloads a VM image of about 1.2GB size into the current working -directory.<sup class="footnote">[<a id="_footnoteref_1" class="footnote" href="#_footnote_1" title="View footnote.">1</a>]</sup> You can examine the script after downloading it by removing -the <code>| bash</code> component of the command above. Once the setup is complete, you can verify -that everything works by connecting to the guest via SSH:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-bash" data-lang="bash">$ ssh demo@quickstart.cloudera</code></pre> -</div> -</div> -<div class="paragraph"> -<p>The username and password for the demo account are both <code>demo</code>. In addition, the <code>demo</code> -user has password-less <code>sudo</code> privileges so that you can install additional software or -manage the guest OS. You can also access the <code>kudu-examples</code> as a shared folder in -<code>/home/demo/kudu-examples/</code> on the guest or from your VirtualBox shared folder location on -the host. 
This is a quick way to make scripts or data visible to the guest.</p> -</div> -<div class="paragraph"> -<p>You can quickly verify if Kudu and Impala are running by executing the following commands:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-bash" data-lang="bash">$ ps aux | grep kudu -$ ps aux | grep impalad</code></pre> -</div> -</div> -<div class="paragraph"> -<p>If you have issues connecting to the VM or one of the processes is not running, make sure -to consult the <a href="#trouble">Troubleshooting</a> section.</p> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_load_data"><a class="link" href="#_load_data">Load Data</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>To practice some typical operations with Kudu and Impala, we’ll use the -<a href="https://data.sfgov.org/Transportation/Raw-AVL-GPS-data/5fk7-ivit/data">San Francisco MTA -GPS dataset</a>. This dataset contains raw location data transmitted periodically from -sensors installed on the buses in the SF MTA’s fleet.</p> -</div> -<div class="olist arabic"> -<ol class="arabic"> -<li> -<p>Download the sample data and load it into HDFS</p> -<div class="paragraph"> -<p>First we’ll download the sample dataset, prepare it, and upload it into the HDFS -cluster.</p> -</div> -<div class="paragraph"> -<p>The SF MTA’s site is often a bit slow, so we’ve mirrored a sample CSV file from the -dataset at <a href="http://kudu-sample-data.s3.amazonaws.com/sfmtaAVLRawData01012013.csv.gz" class="bare">http://kudu-sample-data.s3.amazonaws.com/sfmtaAVLRawData01012013.csv.gz</a></p> -</div> -<div class="paragraph"> -<p>The original dataset uses DOS-style line endings, so we’ll convert it to -UNIX-style during the upload process using <code>tr</code>.</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-bash" data-lang="bash">$ wget 
http://kudu-sample-data.s3.amazonaws.com/sfmtaAVLRawData01012013.csv.gz -$ hdfs dfs -mkdir /sfmta -$ zcat sfmtaAVLRawData01012013.csv.gz | tr -d '\r' | hadoop fs -put - /sfmta/data.csv</code></pre> -</div> -</div> -</li> -<li> -<p>Create a new external Impala table to access the plain text data. To connect to Impala -in the virtual machine, issue the following command:</p> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-bash" data-lang="bash">ssh demo@quickstart.cloudera -t impala-shell</code></pre> -</div> -</div> -<div class="paragraph"> -<p>Now, you can execute the following commands:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE EXTERNAL TABLE sfmta_raw ( - revision int, - report_time string, - vehicle_tag int, - longitude float, - latitude float, - speed float, - heading float -) -ROW FORMAT DELIMITED -FIELDS TERMINATED BY ',' -LOCATION '/sfmta/' -TBLPROPERTIES ('skip.header.line.count'='1');</code></pre> -</div> -</div> -</li> -<li> -<p>To validate that the data was actually loaded, run the following command:</p> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-sql" data-lang="sql">SELECT count(*) FROM sfmta_raw; - -+----------+ -| count(*) | -+----------+ -| 859086 | -+----------+</code></pre> -</div> -</div> -</li> -<li> -<p>Next we’ll create a Kudu table and load the data. 
Note that we convert -the string <code>report_time</code> field into a unix-style timestamp for more efficient -storage.</p> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE sfmta -PRIMARY KEY (report_time, vehicle_tag) -PARTITION BY HASH(report_time) PARTITIONS 8 -STORED AS KUDU -AS SELECT - UNIX_TIMESTAMP(report_time, 'MM/dd/yyyy HH:mm:ss') AS report_time, - vehicle_tag, - longitude, - latitude, - speed, - heading -FROM sfmta_raw; - -+------------------------+ -| summary | -+------------------------+ -| Inserted 859086 row(s) | -+------------------------+ -Fetched 1 row(s) in 5.75s</code></pre> -</div> -</div> -<div class="paragraph"> -<p>The created table uses a composite primary key. See -<a href="kudu_impala_integration.html#kudu_impala">Kudu Impala Integration</a> for a more detailed -introduction to the extended SQL syntax for Impala.</p> -</div> -</li> -</ol> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_read_and_modify_data"><a class="link" href="#_read_and_modify_data">Read and Modify Data</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Now that the data is stored in Kudu, you can run queries against it. 
The following query -finds the data point containing the highest recorded vehicle speed.</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-sql" data-lang="sql">SELECT * FROM sfmta ORDER BY speed DESC LIMIT 1; - -+-------------+-------------+--------------------+-------------------+-------------------+---------+ -| report_time | vehicle_tag | longitude | latitude | speed | heading | -+-------------+-------------+--------------------+-------------------+-------------------+---------+ -| 1357022342 | 5411 | -122.3968811035156 | 37.76665878295898 | 68.33300018310547 | 82 | -+-------------+-------------+--------------------+-------------------+-------------------+---------+</code></pre> -</div> -</div> -<div class="paragraph"> -<p>With a quick <a href="https://www.google.com/search?q=122.3968811035156W+37.76665878295898N">Google search</a> -we can see that this bus was traveling east on 16th street at 68MPH. -At first glance, this seems unlikely to be true. Perhaps we do some research -and find that this bus’s sensor equipment was broken and we decide to -remove the data. With Kudu this is very easy to correct using standard -SQL:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-sql" data-lang="sql">DELETE FROM sfmta WHERE vehicle_tag = '5411'; - --- Modified 1169 row(s), 0 row error(s) in 0.25s</code></pre> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_next_steps"><a class="link" href="#_next_steps">Next steps</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>The above example showed how to load, query, and mutate a static dataset with Impala -and Kudu. 
The real power of Kudu, however, is the ability to ingest and mutate data -in a streaming fashion.</p> -</div> -<div class="paragraph"> -<p>As an exercise to learn the Kudu programmatic APIs, try implementing a program -that uses the <a href="http://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf">SFMTA -XML data feed</a> to ingest this same dataset in real time into the Kudu table.</p> -</div> -<div class="sect2"> -<h3 id="trouble"><a class="link" href="#trouble">Troubleshooting</a></h3> -<div class="sect3"> -<h4 id="_problems_accessing_the_vm_via_ssh"><a class="link" href="#_problems_accessing_the_vm_via_ssh">Problems accessing the VM via SSH</a></h4> -<div class="ulist"> -<ul> -<li> -<p>Make sure the host has an SSH client installed.</p> -</li> -<li> -<p>Make sure the VM is running, by running the following command and checking for a VM called <code>kudu-demo</code>:</p> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-bash" data-lang="bash">$ VBoxManage list runningvms</code></pre> -</div> -</div> -</li> -<li> -<p>Verify that the VM’s IP address is included in the host’s <code>/etc/hosts</code> file. You should -see a line that includes an IP address followed by the hostname -<code>quickstart.cloudera</code>. To check the running VM’s IP address, use the <code>VBoxManage</code> -command below.</p> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-bash" data-lang="bash">$ VBoxManage guestproperty get kudu-demo /VirtualBox/GuestInfo/Net/0/V4/IP -Value: 192.168.56.100</code></pre> -</div> -</div> -</li> -<li> -<p>If you’ve used a Cloudera Quickstart VM before, your <code>.ssh/known_hosts</code> file may -contain references to the previous VM’s SSH credentials. 
Remove any references to -<code>quickstart.cloudera</code> from this file.</p> -</li> -</ul> -</div> -</div> -<div class="sect3"> -<h4 id="_failing_with_lack_of_sse4_2_support_when_running_inside_virtualbox"><a class="link" href="#_failing_with_lack_of_sse4_2_support_when_running_inside_virtualbox">Failing with lack of SSE4.2 support when running inside VirtualBox</a></h4> -<div class="ulist"> -<ul> -<li> -<p>Running Kudu currently requires a CPU that supports SSE4.2 (Nehalem or later for Intel). To pass through SSE4.2 support into the guest VM, refer to the <a href="https://www.virtualbox.org/manual/ch09.html#sse412passthrough">VirtualBox documentation</a></p> -</li> -</ul> -</div> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_next_steps_2"><a class="link" href="#_next_steps_2">Next Steps</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p><a href="installation.html">Installing Kudu</a></p> -</li> -<li> -<p><a href="configuration.html">Configuring Kudu</a></p> -</li> -</ul> -</div> -</div> -</div> - </div> - <div class="col-md-3"> - - <div id="toc" data-spy="affix" data-offset-top="70"> - <ul> - - <li> - - <a href="index.html">Introducing Kudu</a> - </li> - <li> - - <a href="release_notes.html">Kudu Release Notes</a> - </li> - <li> -<span class="active-toc">Getting Started with Kudu</span> - <ul class="sectlevel1"> -<li><a href="#quickstart_vm">Get The Kudu Quickstart VM</a> -<ul class="sectlevel2"> -<li><a href="#_prerequisites">Prerequisites</a></li> -<li><a href="#_installation">Installation</a></li> -</ul> -</li> -<li><a href="#_load_data">Load Data</a></li> -<li><a href="#_read_and_modify_data">Read and Modify Data</a></li> -<li><a href="#_next_steps">Next steps</a> -<ul class="sectlevel2"> -<li><a href="#trouble">Troubleshooting</a></li> -</ul> -</li> -<li><a href="#_next_steps_2">Next Steps</a></li> -</ul> - </li> - <li> - - <a href="installation.html">Installation Guide</a> - </li> - <li> - - <a 
href="configuration.html">Configuring Kudu</a> - </li> - <li> - - <a href="kudu_impala_integration.html">Using Impala with Kudu</a> - </li> - <li> - - <a href="administration.html">Administering Kudu</a> - </li> - <li> - - <a href="troubleshooting.html">Troubleshooting Kudu</a> - </li> - <li> - - <a href="developing.html">Developing Applications with Kudu</a> - </li> - <li> - - <a href="schema_design.html">Kudu Schema Design</a> - </li> - <li> - - <a href="security.html">Kudu Security</a> - </li> - <li> - - <a href="transaction_semantics.html">Kudu Transaction Semantics</a> - </li> - <li> - - <a href="background_tasks.html">Background Maintenance Tasks</a> - </li> - <li> - - <a href="configuration_reference.html">Kudu Configuration Reference</a> - </li> - <li> - - <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> - </li> - <li> - - <a href="known_issues.html">Known Issues and Limitations</a> - </li> - <li> - - <a href="contributing.html">Contributing to Kudu</a> - </li> - <li> - - <a href="export_control.html">Export Control Notice</a> - </li> - </ul> - </div> - </div> - </div> -</div> - - - <div id="footnotes"> - <hr> - <div class="footnote" id="_footnote_1"> - <a href="#_footnoteref_1">1</a>. In addition, the script will create a host-only network between host and guest and setup an entry in the <code>/etc/hosts</code> file with the name <code>quickstart.cloudera</code> and the guest’s IP address. - </div> - </div> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/release_notes.html ---------------------------------------------------------------------- diff --git a/docs/release_notes.html b/docs/release_notes.html deleted file mode 100644 index fe866a6..0000000 --- a/docs/release_notes.html +++ /dev/null @@ -1,662 +0,0 @@ ---- -title: Apache Kudu 1.7.1 Release Notes -layout: default -active_nav: docs -last_updated: 'Last updated 2018-06-15 07:22:05 PDT' ---- -<!-- - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - - -<div class="container"> - <div class="row"> - <div class="col-md-9"> - -<h1>Apache Kudu 1.7.1 Release Notes</h1> - <div class="sect1"> -<h2 id="rn_1.7.1_fixed_issues"><a class="link" href="#rn_1.7.1_fixed_issues">Fixed Issues</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Apache Kudu 1.7.1 is a bug-fix release which fixes critical issues in Kudu 1.7.0.</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>Fixed an issue where a leader replica could report a follower’s health status -as FAILED instead of FAILED_UNRECOVERABLE. In configurations where the tablet -replication factor equals the total number of tablet servers in the cluster, -that led to situations where the tablet could not be automatically recovered -until a new leader was elected or corresponding tablet servers were restarted. 
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2367">KUDU-2367</a>).</p> -</li> -<li> -<p>Fixed an issue where Kudu would fail to start if RLIMIT_NPROC was set to -1. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2377">KUDU-2377</a>).</p> -</li> -<li> -<p>Fixed an issue where <code>kudu-spark</code> was unable to connect to secure clusters. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2379">KUDU-2379</a>).</p> -</li> -<li> -<p>Fixed an issue where the <code>kudu-python</code> client would not compile in environments -where <code>__int128</code> is not supported. This was most commonly el6 environments. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2412">KUDU-2412</a>).</p> -</li> -<li> -<p>Fixed an issue where unaligned loads of <code>__int128</code> integers could result -in a crash. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2378">KUDU-2378</a>).</p> -</li> -<li> -<p>Fixed a bug in <code>PartialRow.setMin</code> that could lead to incorrect partition -pruning when a <code>decimal</code> column is part of the table’s range partition but -not a part of the query predicate. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2416">KUDU-2416</a>).</p> -</li> -<li> -<p>Fixed an equality check on <code>decimal</code> column predicates that could result -in pruning that is too conservative.</p> -</li> -<li> -<p>Fixed an issue where ColumnSchema.toString() would throw a -NullPointerException on non-decimal types.</p> -</li> -<li> -<p>Added an optimization that improves the performance when scanning tables -with large consecutive runs of deleted rows. For example, users may -'DELETE' all rows in a table or partition before re-adding them, or they -may delete all data corresponding to some prefix of the PK.</p> -</li> -<li> -<p>Fixed an issue where moving single-replica tablets via -<code>kudu tablet change_config move_replica</code> does not work. 
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2443">KUDU-2443</a>).</p> -</li> -</ul> -</div> -</div> -</div> -<h1 id="rn_1.7.0_release_notes" class="sect0"><a class="link" href="#rn_1.7.0_release_notes">Apache Kudu 1.7.0 Release Notes</a></h1> -<div class="sect1"> -<h2 id="rn_1.7.0_upgrade_notes"><a class="link" href="#rn_1.7.0_upgrade_notes">Upgrade Notes</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p>Upgrading directly from Kudu 1.6.0 is supported and no special upgrade steps -are required. A rolling upgrade of the server side will <em>not</em> work because -the default replica management scheme changed, and running masters and tablet -servers with different replica management schemes is not supported, see -<a href="#rn_1.7.0_incompatible_changes">Incompatible Changes in Kudu 1.7.0</a> for details. However, mixing client and -server sides of different versions is not a problem. You can still -update your clients before your servers or vice versa. 
-When upgrading to Kudu 1.7, it is required to first shut down all Kudu processes -across the cluster, then upgrade the software on all servers, then restart -the Kudu processes on all servers in the cluster.</p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="rn_1.7.0_obsoletions"><a class="link" href="#rn_1.7.0_obsoletions">Obsoletions</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p>The <code>tcmalloc_contention_time</code> metric, which previously tracked the amount -of time spent in memory allocator lock contention, has been removed.</p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="rn_1.7.0_deprecations"><a class="link" href="#rn_1.7.0_deprecations">Deprecations</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p>Support for Java 7 has been deprecated since Kudu 1.5.0 and may be removed in -the next major release.</p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="rn_1.7.0_new_features"><a class="link" href="#rn_1.7.0_new_features">New features</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p>Kudu now supports the decimal column type. The decimal type is a numeric data type -with fixed scale and precision suitable for financial and other arithmetic -calculations where the imprecise representation and rounding behavior of float and -double make those types impractical. The decimal type is also useful for integers -larger than int64 and cases with fractional values in a primary key. -See <a href="schema_design.html#decimal">Decimal Type</a> for more details.</p> -</li> -<li> -<p>The strategy Kudu uses for automatically healing tablets which have lost a -replica due to server or disk failures has been improved. The new re-replication -strategy, or replica management scheme, first adds a replacement tablet replica -before evicting the failed one. 
With the previous replica management scheme, -the system first evicts the failed replica and then adds a replacement. The new -replica management scheme allows for much faster recovery of tablets in -scenarios where one tablet server goes down and then returns back shortly after -5 minutes or so. The new scheme also provides substantially better overall -stability on clusters with frequent server failures. -(see <a href="https://issues.apache.org/jira/browse/KUDU-1097">KUDU-1097</a>).</p> -</li> -<li> -<p>The <code>kudu fs update_dirs</code> tool now supports removing directories. Unless the -<code>--force</code> flag is specified, Kudu will not allow the removal of a directory -across which tablets are configured to spread data. If specified, all tablet -replicas configured to use that directory will fail upon starting up and be -replicated elsewhere, provided a majority exists elsewhere.</p> -</li> -<li> -<p>Users can use the new <code>--fs_metadata_dir</code> to specify the directory in which -to place tablet-specific metadata. It is recommended, although not -necessary, that this be placed on a high-performance drive with high -bandwidth and low latency, e.g. a solid-state drive. If not specified, -metadata will be placed in the directory specified by <code>--fs_wal_dir</code>, or in -the directory specified by the first entry of <code>--fs_data_dirs</code> if metadata -already exists there from a pre-Kudu 1.7 deployment. Kudu will not -automatically move existing metadata based on this configuration.</p> -</li> -<li> -<p>Kudu 1.7 introduces a new scan read mode READ_YOUR_WRITES. Users can specify -READ_YOUR_WRITES when creating a new scanner in C++, Java and Python clients. -If this mode is used, the client will perform a read such that it follows all -previously known writes and reads from this client. 
Reads in this mode ensure -read-your-writes and read-your-reads session guarantees, while minimizing -latency caused by waiting for outstanding write transactions to complete. -Note that this is still an experimental feature which may be stabilized in -future releases.</p> -</li> -<li> -<p>The tablet server web UI scans dashboard (/scans) has been improved with -several new features, including: showing the most recently completed scans, -a pseudo-SQL scan descriptor that concisely shows the selected columns and -applied predicates, and more complete and better documented scan statistics.</p> -</li> -<li> -<p>Kudu daemons now expose a web page <code>/stacks</code> which dumps the current stack -trace of every thread running in the server. This information can be helpful -when diagnosing performance issues.</p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_optimizations_and_improvements"><a class="link" href="#_optimizations_and_improvements">Optimizations and improvements</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p>By default, each tablet replica will now stripe data blocks across 3 data -directories instead of all data directories. This decreases the likelihood -that any given tablet will be affected in the event of a single disk failure. -No substantial performance impact is expected due to this feature based on -<a href="https://github.com/apache/kudu/commit/60276c54a221d554287c6645df7df542fe6d6443">performance testing</a>. -This change only affects new replicas created after upgrading to Kudu 1.7.</p> -</li> -<li> -<p>Kudu servers previously offered the ability to enable a separate metrics log -which stores periodic snapshots of all metrics available on a server. This -functionality is now available as part of a more general “diagnostics log” -which is enabled by default. The diagnostics log includes periodic dumps of -server metrics as well as collections of thread stack traces. 
The default -configuration ensures that no more than 640MB of diagnostics logs are retained, -and typically the space consumption is significantly less due to compression. -The format and contents of this log file are documented in the -<a href="administration.html">Administration guide</a>.</p> -</li> -<li> -<p>The handling of errors in the synchronous Java client has been improved so that, -when an exception is thrown, the stack trace indicates the correct location -where the client function was invoked rather than a call stack of an internal -worker thread. The original call stack from the worker thread is available as -a “suppressed exception”.</p> -</li> -<li> -<p>The logging of errors in the Java client has been improved to exclude exception -stack traces for expected scenarios such as failure to connect to a server in a -cluster. Instead, only a single line informational message will be logged in -such cases to aid in debugging.</p> -</li> -<li> -<p>The Java client now uses a predefined prioritized list of TLS ciphers when -establishing an encrypted connection to Kudu servers. This cipher list matches -the list of ciphers preferred for server-to-server communication and ensures -that the most efficient and secure ciphers are preferred. When the Kudu client -is running on Java 8 or newer, this provides a substantial speed-up to read -and write performance.</p> -</li> -<li> -<p>Reporting for the <code>kudu cluster ksck</code> tool has been updated so tablets and -tables with on-going tablet copies are shown as "recovering". Additional -reporting changes have been made to make various common scenarios, -particularly tablet copies, less alarming.</p> -</li> -<li> -<p>The performance of inserting rows containing many string or binary columns has -been improved, especially in the case of highly concurrent write workloads.</p> -</li> -<li> -<p>By default, Spark tasks that scan Kudu will now be able to scan non-leader -replicas. 
This allows Spark to more easily schedule kudu-spark tasks local to -the data. Users can disable this behavior by passing 'leader_only' to the -'kudu.scanLocality' option.</p> -</li> -<li> -<p>The number of OS threads used in the steady state and during bursts of -activity (such as in Raft leader elections triggered by a node failure) has -been drastically reduced and should no longer exceed the value of <code>ulimit -u</code>. -As such, it should no longer be necessary to increase the value of <code>ulimit -u</code> -(or of /proc/sys/kernel/threads-max) in order to run a Kudu tablet server in -most cases. -(see <a href="https://issues.apache.org/jira/browse/KUDU-1913">KUDU-1913</a>).</p> -</li> -<li> -<p>An issue where sparse column predicates could cause excessive data-block reads -has been fixed. Previously in certain scans with sparsely matching predicates -on multiple columns, Kudu would read and decode the same data blocks many times. -The improvement typically results in a 5-10x performance increase for the -affected scans. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2231">KUDU-2231</a>).</p> -</li> -<li> -<p>The efficiency and on-disk size of large updated values has been improved. -This will improve update-heavy workloads which overwrite large (1KiB+) values. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2253">KUDU-2253</a>).</p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="rn_1.7.0_fixed_issues"><a class="link" href="#rn_1.7.0_fixed_issues">Fixed Issues</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p>Fixed a scenario where the on-disk data of a tablet server was completely -erased and a new tablet server was started on the same host. This issue -could prevent tablet replicas previously hosted on the server from being -evicted and re-replicated. -Tablets now immediately evict replicas that respond with a different server -UUID than expected. 
-(see <a href="https://issues.apache.org/jira/browse/KUDU-1613">KUDU-1613</a>).</p> -</li> -<li> -<p>Fixed a rare race condition when connecting to masters during their -startup which might cause a client to get a response without a CA certificate -and/or authentication token. This would cause the client to fail to authenticate -with other servers in the cluster. The leader master now always sends a CA -certificate and an authentication token (when applicable) to a Kudu client -with a successful ConnectToMaster response. -(see <a href="https://issues.apache.org/jira/browse/KUDU-1927">KUDU-1927</a>).</p> -</li> -<li> -<p>The Kudu Java client now will retry a connection if no master is discovered as a -leader, and the user has a valid authentication token. This avoids failure -in recoverable cases when masters are in the process of the very first leader -election after starting up. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2262">KUDU-2262</a>).</p> -</li> -<li> -<p>The Java client will now automatically attempt to re-acquire Kerberos -credentials from the ticket cache when the prior credentials are about to -expire. This allows client instances to persist longer than the expiration -time of a single Kerberos ticket so long as some other process renews the -credentials in the ticket cache. Documentation on interacting with Kerberos -authentication has been added to the Javadoc for the <code>AsyncKuduClient</code> class. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2264">KUDU-2264</a>).</p> -</li> -<li> -<p>Follower masters are now able to verify authentication tokens even if they have never -been a leader. Prior to this fix, if a follower master had never been a leader, -clients would be unable to authenticate to that master, resulting in spurious -error messages being logged. 
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2265">KUDU-2265</a>).</p> -</li> -<li> -<p>Fixed a tablet server crash when a tablet replica is deleted during a scan. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2295">KUDU-2295</a>).</p> -</li> -<li> -<p>The evaluation order of predicates in scans with multiple predicates has been -made deterministic. Due to a bug, this was not necessarily the case previously. -Predicates are applied in most to least selective order, with ties broken by -column index. The evaluation order may change in the future, particularly when -better column statistics are made available internally. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2312">KUDU-2312</a>).</p> -</li> -<li> -<p>Previously, the <code>kudu tablet change_config move_replica</code> tool required all -tablet servers in the cluster to be available when performing a move. This -restriction has been relaxed: only the tablet server that will receive a replica -of the tablet being moved and the hosts of the tablet’s existing replicas need to be -available for the move to occur. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2331">KUDU-2331</a>).</p> -</li> -<li> -<p>Fixed a bug in the Java client which prevented the client from locating the -new leader master after a leader failover in the case that the previous leader -either remained online or restarted quickly. This bug resulted in the client -timing out operations with errors indicating that there was no leader master. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2343">KUDU-2343</a>).</p> -</li> -<li> -<p>The Unix process username of the client is now included inside the exported -security credentials, so that the effective username of clients who import -credentials and subsequently use unauthenticated (SASL PLAIN) connections -matches the client who exported the security credentials. 
For example, this is -useful to let the Spark executors know which username to use if the Spark -driver has no authentication token. This change only affects clusters with -encryption disabled using <code>--rpc-encryption=disabled</code>. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2259">KUDU-2259</a>).</p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="rn_1.7.0_wire_compatibility"><a class="link" href="#rn_1.7.0_wire_compatibility">Wire Protocol compatibility</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Kudu 1.7.0 is wire-compatible with previous versions of Kudu:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>Kudu 1.7 clients may connect to servers running Kudu 1.0 or later. If the client uses -features that are not available on the target server, an error will be returned.</p> -</li> -<li> -<p>Rolling upgrade between Kudu 1.6 and Kudu 1.7 servers is believed to be possible -though has not been sufficiently tested. Users are encouraged to shut down all nodes -in the cluster, upgrade the software, and then restart the daemons on the new version.</p> -</li> -<li> -<p>Kudu 1.0 clients may connect to servers running Kudu 1.7 with the exception of the -below-mentioned restrictions regarding secure clusters.</p> -</li> -</ul> -</div> -<div class="paragraph"> -<p>The authentication features introduced in Kudu 1.3 place the following limitations -on wire compatibility between Kudu 1.7 and versions earlier than 1.3:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>If a Kudu 1.7 cluster is configured with authentication or encryption set to "required", -clients older than Kudu 1.3 will be unable to connect.</p> -</li> -<li> -<p>If a Kudu 1.7 cluster is configured with authentication and encryption set to "optional" -or "disabled", older clients will still be able to connect.</p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="rn_1.7.0_incompatible_changes"><a class="link" 
href="#rn_1.7.0_incompatible_changes">Incompatible Changes in Kudu 1.7.0</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p>The newly introduced replica management scheme is not compatible with the -old scheme, so it’s not possible to run pre-1.7 Kudu masters with -1.7 Kudu tablet servers or vice versa. This is a server-side -incompatibility only and it does not affect client compatibility. In other words, -Kudu clients of prior versions are compatible with upgraded Kudu clusters.</p> -<div class="ulist"> -<ul> -<li> -<p>Kudu masters of 1.7 version will not register Kudu tablet servers of 1.6 -and prior versions.</p> -</li> -<li> -<p>Kudu tablet servers of 1.7 version will not work with Kudu masters of 1.6 -and prior versions.</p> -</li> -</ul> -</div> -</li> -<li> -<p>The format of the previously-optional metrics log has changed to include a -human-readable timestamp on each line. The path of the log file has also -changed with the word “diagnostics” replacing the word “metrics” in the file -name. The metrics log has been optimized to only include those metrics which -have changed in between successive samples, and to not include entity attributes -such as tablet partition information in the log. -(see <a href="https://issues.apache.org/jira/browse/KUDU-2297">KUDU-2297</a>).</p> -</li> -</ul> -</div> -<div class="sect2"> -<h3 id="rn_1.7.0_client_compatibility"><a class="link" href="#rn_1.7.0_client_compatibility">Client Library Compatibility</a></h3> -<div class="ulist"> -<ul> -<li> -<p>The Kudu 1.7 Java client library is API- and ABI-compatible with Kudu 1.6. Applications -written against Kudu 1.6 will compile and run against the Kudu 1.7 client library and -vice-versa.</p> -</li> -<li> -<p>The Kudu 1.7 C++ client is API- and ABI-forward-compatible with Kudu 1.6. -Applications written and compiled against the Kudu 1.6 client library will run without -modification against the Kudu 1.7 client library.
Applications written and compiled -against the Kudu 1.7 client library will run without modification against the Kudu 1.6 -client library.</p> -</li> -<li> -<p>The Kudu 1.7 Python client is API-compatible with Kudu 1.6. Applications -written against Kudu 1.6 will continue to run against the Kudu 1.7 client -and vice-versa.</p> -</li> -<li> -<p>Kudu 1.7 clients that attempt to create a table with a decimal column on a -target server running Kudu 1.6 or earlier will receive an error response. -Similarly, Kudu clients running Kudu 1.6 or earlier will receive an error -when attempting to access any table containing a decimal -column.</p> -</li> -</ul> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="rn_1.7.0_known_issues"><a class="link" href="#rn_1.7.0_known_issues">Known Issues and Limitations</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Please refer to the <a href="known_issues.html">Known Issues and Limitations</a> section of the -documentation.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="rn_1.7.0_contributors"><a class="link" href="#rn_1.7.0_contributors">Contributors</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Kudu 1.7 includes contributions from 22 people, including two first-time -contributors, Clemens Valiente and Tsuyoshi Ozawa.</p> -</div> -<div class="paragraph"> -<p>Thank you for helping to make Kudu even better!</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="resources_and_next_steps"><a class="link" href="#resources_and_next_steps">Resources</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p><a href="http://kudu.apache.org">Kudu Website</a></p> -</li> -<li> -<p><a href="http://github.com/apache/kudu">Kudu GitHub Repository</a></p> -</li> -<li> -<p><a href="index.html">Kudu Documentation</a></p> -</li> -<li> -<p><a href="prior_release_notes.html">Release notes for older releases</a></p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1">
-<h2 id="_installation_options"><a class="link" href="#_installation_options">Installation Options</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>For full installation details, see <a href="installation.html">Kudu Installation</a>.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_next_steps"><a class="link" href="#_next_steps">Next Steps</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p><a href="quickstart.html">Kudu Quickstart</a></p> -</li> -<li> -<p><a href="installation.html">Installing Kudu</a></p> -</li> -<li> -<p><a href="configuration.html">Configuring Kudu</a></p> -</li> -</ul> -</div> -</div> -</div> - </div> - <div class="col-md-3"> - - <div id="toc" data-spy="affix" data-offset-top="70"> - <ul> - - <li> - - <a href="index.html">Introducing Kudu</a> - </li> - <li> -<span class="active-toc">Kudu Release Notes</span> - <ul class="sectlevel1"> -<li><a href="#rn_1.7.1_fixed_issues">Fixed Issues</a></li> -<li><a href="#rn_1.7.0_release_notes">Apache Kudu 1.7.0 Release Notes</a> -<ul class="sectlevel1"> -<li><a href="#rn_1.7.0_upgrade_notes">Upgrade Notes</a></li> -<li><a href="#rn_1.7.0_obsoletions">Obsoletions</a></li> -<li><a href="#rn_1.7.0_deprecations">Deprecations</a></li> -<li><a href="#rn_1.7.0_new_features">New features</a></li> -<li><a href="#_optimizations_and_improvements">Optimizations and improvements</a></li> -<li><a href="#rn_1.7.0_fixed_issues">Fixed Issues</a></li> -<li><a href="#rn_1.7.0_wire_compatibility">Wire Protocol compatibility</a></li> -<li><a href="#rn_1.7.0_incompatible_changes">Incompatible Changes in Kudu 1.7.0</a> -<ul class="sectlevel2"> -<li><a href="#rn_1.7.0_client_compatibility">Client Library Compatibility</a></li> -</ul> -</li> -<li><a href="#rn_1.7.0_known_issues">Known Issues and Limitations</a></li> -<li><a href="#rn_1.7.0_contributors">Contributors</a></li> -<li><a href="#resources_and_next_steps">Resources</a></li> -<li><a href="#_installation_options">Installation 
Options</a></li> -<li><a href="#_next_steps">Next Steps</a></li> -</ul> -</li> -</ul> - </li> - <li> - - <a href="quickstart.html">Getting Started with Kudu</a> - </li> - <li> - - <a href="installation.html">Installation Guide</a> - </li> - <li> - - <a href="configuration.html">Configuring Kudu</a> - </li> - <li> - - <a href="kudu_impala_integration.html">Using Impala with Kudu</a> - </li> - <li> - - <a href="administration.html">Administering Kudu</a> - </li> - <li> - - <a href="troubleshooting.html">Troubleshooting Kudu</a> - </li> - <li> - - <a href="developing.html">Developing Applications with Kudu</a> - </li> - <li> - - <a href="schema_design.html">Kudu Schema Design</a> - </li> - <li> - - <a href="security.html">Kudu Security</a> - </li> - <li> - - <a href="transaction_semantics.html">Kudu Transaction Semantics</a> - </li> - <li> - - <a href="background_tasks.html">Background Maintenance Tasks</a> - </li> - <li> - - <a href="configuration_reference.html">Kudu Configuration Reference</a> - </li> - <li> - - <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> - </li> - <li> - - <a href="known_issues.html">Known Issues and Limitations</a> - </li> - <li> - - <a href="contributing.html">Contributing to Kudu</a> - </li> - <li> - - <a href="export_control.html">Export Control Notice</a> - </li> - </ul> - </div> - </div> - </div> -</div> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/scaling_guide.html ---------------------------------------------------------------------- diff --git a/docs/scaling_guide.html b/docs/scaling_guide.html deleted file mode 100644 index 24002c1..0000000 --- a/docs/scaling_guide.html +++ /dev/null @@ -1,455 +0,0 @@ ---- -title: Apache Kudu Scaling Guide -layout: default -active_nav: docs -last_updated: 'Last updated 2018-06-14 08:17:56 PDT' ---- -<!-- - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance 
with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - - -<div class="container"> - <div class="row"> - <div class="col-md-9"> - -<h1>Apache Kudu Scaling Guide</h1> - <div id="preamble"> -<div class="sectionbody"> -<div class="paragraph"> -<p>This document describes in detail how Kudu scales with respect to various system resources, -including memory, file descriptors, and threads. See the -<a href="known_issues.html#_scale">scaling limits</a> for the maximum recommended parameters of a Kudu -cluster. They can be used to estimate roughly the number of servers required for a given quantity -of data.</p> -</div> -<div class="admonitionblock warning"> -<table> -<tr> -<td class="icon"> -<i class="fa icon-warning" title="Warning"></i> -</td> -<td class="content"> -The recommendations and conclusions here are only approximations. Appropriate numbers -depend on use case. There is no substitute for measurement and monitoring of resources used during a -representative workload. -</td> -</tr> -</table> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_terms"><a class="link" href="#_terms">Terms</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>We will use the following terms:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p><strong>hot replica</strong>: A tablet replica that is continuously receiving writes. For example, in a time -series use case, tablet replicas for the most recent range partition on a time column would be -continuously receiving the latest data, and would be hot replicas.</p> -</li> -<li> -<p><strong>cold replica</strong>: A tablet replica that is not hot, i.e. 
a replica that is not frequently receiving -writes, for example, once every few minutes. A cold replica may be read from. For example, in a time -series use case, tablet replicas for previous range partitions on a time column would not receive -writes at all, or only occasionally receive late updates or additions, but may be constantly read.</p> -</li> -<li> -<p><strong>data on disk</strong>: The total amount of data stored on a tablet server across all disks, -post-replication, post-compression, and post-encoding.</p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_example_workload"><a class="link" href="#_example_workload">Example Workload</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>The sections below perform sample calculations using the following parameters:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>200 hot replicas per tablet server</p> -</li> -<li> -<p>1600 cold replicas per tablet server</p> -</li> -<li> -<p>8TB of data on disk per tablet server (about 4.5GB/replica)</p> -</li> -<li> -<p>512MB block cache</p> -</li> -<li> -<p>40 cores per server</p> -</li> -<li> -<p>limit of 32000 file descriptors per server</p> -</li> -<li> -<p>a read workload with 1 frequently-scanned table with 40 columns</p> -</li> -</ul> -</div> -<div class="paragraph"> -<p>This workload resembles a time series use case, where the hot replicas correspond to the most recent -range partition on time.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="memory"><a class="link" href="#memory">Memory</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>The flag <code>--memory_limit_hard_bytes</code> determines the maximum amount of memory that a Kudu tablet -server may use. The amount of memory used by a tablet server scales with data size, write workload, -and read concurrency. 
The following table provides numbers that can be used to compute a rough -estimate of memory usage.</p> -</div> -<table class="tableblock frame-all grid-all spread"> -<caption class="title">Table 1. Tablet Server Memory Usage</caption> -<colgroup> -<col style="width: 33.3333%;"> -<col style="width: 33.3333%;"> -<col style="width: 33.3334%;"> -</colgroup> -<thead> -<tr> -<th class="tableblock halign-left valign-top">Type</th> -<th class="tableblock halign-left valign-top">Multiplier</th> -<th class="tableblock halign-left valign-top">Description</th> -</tr> -</thead> -<tbody> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Memory required per TB of data on disk</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">1.5GB per 1TB data on disk</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory per unit of data on disk required for -basic operation of the tablet server.</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Hot Replicas' MemRowSets and DeltaMemStores</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">minimum 128MB per hot replica</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">Minimum amount of data -to flush per MemRowSet flush. 
For most use cases, updates should be rare compared to inserts, so the -DeltaMemStores should be very small.</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Scans</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">256KB per column per core for read-heavy tables</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory used by scanners, and which -will be constantly needed for tables which are constantly read.</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Block Cache</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">Fixed by <code>--block_cache_capacity_mb</code> (default 512MB)</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory reserved for use by the -block cache.</p></td> -</tr> -</tbody> -</table> -<div class="paragraph"> -<p>Using this information for the example load gives the following breakdown of memory usage:</p> -</div> -<table class="tableblock frame-all grid-all spread"> -<caption class="title">Table 2. 
Example Tablet Server Memory Usage</caption> -<colgroup> -<col style="width: 50%;"> -<col style="width: 50%;"> -</colgroup> -<thead> -<tr> -<th class="tableblock halign-left valign-top">Type</th> -<th class="tableblock halign-left valign-top">Amount</th> -</tr> -</thead> -<tbody> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">8TB data on disk</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">8TB * 1.5GB / 1TB = 12GB</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">200 hot replicas</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">200 * 128MB = 25.6GB</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">1 40-column, frequently-scanned table</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">40 * 40 * 256KB = 409.6MB</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Block Cache</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock"><code>--block_cache_capacity_mb=512</code> = 512MB</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Expected memory usage</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">38.5GB</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Recommended hard limit</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">52GB</p></td> -</tr> -</tbody> -</table> -<div class="paragraph"> -<p>Using this as a rough estimate of Kudu’s memory usage, select a memory limit so that the expected -memory usage of Kudu is around 50-75% of the hard limit.</p> -</div> -<div class="sect2"> -<h3 id="_verifying_if_a_memory_limit_is_sufficient"><a class="link" href="#_verifying_if_a_memory_limit_is_sufficient">Verifying if a Memory Limit is sufficient</a></h3> -<div class="paragraph"> 
-<p>After configuring an appropriate memory limit with <code>--memory_limit_hard_bytes</code>, run a workload and -monitor the Kudu tablet server process’s RAM usage. The memory usage should stay around 50-75% of -the hard limit, with occasional spikes above 75% but below 100%. If the tablet server runs above 75% -consistently, the memory limit should be increased.</p> -</div> -<div class="paragraph"> -<p>It’s also useful to monitor the logs for memory rejections, which look like:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre>Service unavailable: Soft memory limit exceeded (at 96.35% of capacity)</pre> -</div> -</div> -<div class="paragraph"> -<p>and watch the memory rejections metrics:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p><code>leader_memory_pressure_rejections</code></p> -</li> -<li> -<p><code>follower_memory_pressure_rejections</code></p> -</li> -<li> -<p><code>transaction_memory_pressure_rejections</code></p> -</li> -</ul> -</div> -<div class="paragraph"> -<p>Occasional rejections due to memory pressure are fine and act as backpressure to clients. Clients -will transparently retry operations. However, no operations should time out.</p> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="file_descriptors"><a class="link" href="#file_descriptors">File Descriptors</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Processes are allotted a maximum number of open file descriptors (also referred to as fds). If a -tablet server attempts to open too many fds, it will typically crash with a message saying something -like "too many open files". The following table summarizes the sources of file descriptor usage in a -Kudu tablet server process:</p> -</div> -<table class="tableblock frame-all grid-all spread"> -<caption class="title">Table 3.
Tablet Server File Descriptor Usage</caption> -<colgroup> -<col style="width: 33.3333%;"> -<col style="width: 33.3333%;"> -<col style="width: 33.3334%;"> -</colgroup> -<thead> -<tr> -<th class="tableblock halign-left valign-top">Type</th> -<th class="tableblock halign-left valign-top">Multiplier</th> -<th class="tableblock halign-left valign-top">Description</th> -</tr> -</thead> -<tbody> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">File cache</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">Fixed by <code>--block_manager_max_open_files</code> (default 40% of process maximum)</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum allowed open fds reserved for use by -the file cache.</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Hot replicas</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">2 per WAL segment, 1 per WAL index</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">Number of fds used by hot replicas. See below -for more explanation.</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Cold replicas</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">3 per cold replica</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">Number of fds used per cold replica: 2 for the single WAL -segment and 1 for the single WAL index.</p></td> -</tr> -</tbody> -</table> -<div class="paragraph"> -<p>Every replica has at least one WAL segment and at least one WAL index, and should have the same -number of segments and indices; however, the number of segments and indices can be greater for a -replica if one of its peer replicas is falling behind. 
WAL segment and index fds are closed as WALs -are garbage collected.</p> -</div> -<div class="paragraph"> -<p>Using this information for the example load gives the following breakdown of file descriptor usage, -under the assumption that some replicas are lagging and using 10 WAL segments:</p> -</div> -<table class="tableblock frame-all grid-all spread"> -<caption class="title">Table 4. Example Tablet Server File Descriptor Usage</caption> -<colgroup> -<col style="width: 50%;"> -<col style="width: 50%;"> -</colgroup> -<thead> -<tr> -<th class="tableblock halign-left valign-top">Type</th> -<th class="tableblock halign-left valign-top">Amount</th> -</tr> -</thead> -<tbody> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">file cache</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">40% * 32000 fds = 12800 fds</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">1600 cold replicas</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">1600 cold replicas * 3 fds / cold replica = 4800 fds</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">200 hot replicas</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">(2 / segment * 10 segments/hot replica * 200 hot replicas) + (1 / index * 10 indices / hot replica * 200 hot replicas) = 6000 fds</p></td> -</tr> -<tr> -<td class="tableblock halign-left valign-top"><p class="tableblock">Total</p></td> -<td class="tableblock halign-left valign-top"><p class="tableblock">23600 fds</p></td> -</tr> -</tbody> -</table> -<div class="paragraph"> -<p>So for this example, the tablet server process has about 32000 - 23600 = 8400 fds to spare.</p> -</div> -<div class="paragraph"> -<p>There is typically no downside to configuring a higher file descriptor limit if approaching the -currently configured limit.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 
id="threads"><a class="link" href="#threads">Threads</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Processes are allotted a maximum number of threads by the operating system, and this limit is -typically difficult or impossible to change. Therefore, this section is more informational than -advisory.</p> -</div> -<div class="paragraph"> -<p>If a Kudu tablet server’s thread count exceeds the OS limit, it will crash, usually with a message -in the logs like "pthread_create failed: Resource temporarily unavailable". If the system thread -count limit is exceeded, other processes on the same node may also crash.</p> -</div> -<div class="paragraph"> -<p>Threads and threadpools are used all over Kudu for various purposes, but the number of threads found -in nearly all of these does not scale with load or data/tablet size; instead, the number of threads -is either a hardcoded constant, a constant defined by a configuration parameter, or based on a -static dimension (such as the number of CPU cores).</p> -</div> -<div class="paragraph"> -<p>The only exception to this is the WAL append thread, one of which exists for every "hot" replica. 
-Note that all replicas may be considered hot at startup, so tablet servers' thread usage will -generally peak when started and settle down thereafter.</p> -</div> -</div> -</div> - </div> - <div class="col-md-3"> - - <div id="toc" data-spy="affix" data-offset-top="70"> - <ul> - - <li> - - <a href="index.html">Introducing Kudu</a> - </li> - <li> - - <a href="release_notes.html">Kudu Release Notes</a> - </li> - <li> - - <a href="quickstart.html">Getting Started with Kudu</a> - </li> - <li> - - <a href="installation.html">Installation Guide</a> - </li> - <li> - - <a href="configuration.html">Configuring Kudu</a> - </li> - <li> - - <a href="kudu_impala_integration.html">Using Impala with Kudu</a> - </li> - <li> - - <a href="administration.html">Administering Kudu</a> - </li> - <li> - - <a href="troubleshooting.html">Troubleshooting Kudu</a> - </li> - <li> - - <a href="developing.html">Developing Applications with Kudu</a> - </li> - <li> - - <a href="schema_design.html">Kudu Schema Design</a> - </li> - <li> - - <a href="security.html">Kudu Security</a> - </li> - <li> - - <a href="transaction_semantics.html">Kudu Transaction Semantics</a> - </li> - <li> - - <a href="background_tasks.html">Background Maintenance Tasks</a> - </li> - <li> - - <a href="configuration_reference.html">Kudu Configuration Reference</a> - </li> - <li> - - <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> - </li> - <li> - - <a href="known_issues.html">Known Issues and Limitations</a> - </li> - <li> - - <a href="contributing.html">Contributing to Kudu</a> - </li> - <li> - - <a href="export_control.html">Export Control Notice</a> - </li> - </ul> - </div> - </div> - </div> -</div> \ No newline at end of file