http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/contributing.html ---------------------------------------------------------------------- diff --git a/docs/contributing.html b/docs/contributing.html deleted file mode 100644 index 8d1e82c..0000000 --- a/docs/contributing.html +++ /dev/null @@ -1,804 +0,0 @@ ---- -title: Contributing to Apache Kudu -layout: default -active_nav: docs -last_updated: 'Last updated 2018-06-15 07:22:05 PDT' ---- -<!-- - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - - -<div class="container"> - <div class="row"> - <div class="col-md-9"> - -<h1>Contributing to Apache Kudu</h1> - <div class="sect1"> -<h2 id="_contributing_patches_using_gerrit"><a class="link" href="#_contributing_patches_using_gerrit">Contributing Patches Using Gerrit</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>The Kudu team uses Gerrit for code review, rather than Github pull requests. 
Typically, -you pull from Github but push to Gerrit, and Gerrit is used to review code and merge -it into Github.</p> -</div> -<div class="paragraph"> -<p>See the <a href="https://www.mediawiki.org/wiki/Gerrit/Tutorial">Gerrit Tutorial</a> -for an overview of using Gerrit for code review.</p> -</div> -<div class="sect2"> -<h3 id="_initial_setup_for_gerrit"><a class="link" href="#_initial_setup_for_gerrit">Initial Setup for Gerrit</a></h3> -<div class="olist arabic"> -<ol class="arabic"> -<li> -<p>Sign in to <a href="http://gerrit.cloudera.org:8080">Gerrit</a> using your Github username.</p> -</li> -<li> -<p>Go to <a href="http://gerrit.cloudera.org:8080/#/settings/">Settings</a>. Update your name -and email address on the <strong>Contact Information</strong> page, and upload a SSH public key. -If you do not update your name, it will show up as "Anonymous Coward" in Gerrit reviews.</p> -</li> -<li> -<p>If you have not done so, clone the main Kudu repository. By default, the main remote -is called <code>origin</code>. When you fetch or pull, you will do so from <code>origin</code>.</p> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-bash" data-lang="bash">git clone https://github.com/apache/kudu</code></pre> -</div> -</div> -</li> -<li> -<p>Change to the new <code>kudu</code> directory.</p> -</li> -<li> -<p>Add a <code>gerrit</code> remote. In the following command, substitute <username> with your -Github username.</p> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-bash" data-lang="bash">git remote add gerrit ssh://<username>@gerrit.cloudera.org:29418/kudu</code></pre> -</div> -</div> -</li> -<li> -<p>Run the following command to install the -Gerrit <code>commit-msg</code> hook. 
In the command, replace <code><username></code> with your -Github username.</p> -<div class="listingblock"> -<div class="content"> -<pre>gitdir=$(git rev-parse --git-dir); scp -p -P 29418 <username>@gerrit.cloudera.org:hooks/commit-msg ${gitdir}/hooks/</pre> -</div> -</div> -</li> -<li> -<p>Be sure you have set the Kudu repository to use <code>pull --rebase</code> by default. You -can use the following two commands, assuming you have only ever checked out <code>master</code> -so far:</p> -<div class="listingblock"> -<div class="content"> -<pre>git config branch.autosetuprebase always -git config branch.master.rebase true</pre> -</div> -</div> -<div class="paragraph"> -<p>If for some reason you had already checked out branches other than <code>master</code>, repeat the -second command above for each of them, replacing <code>master</code> with the branch name.</p> -</div> -</li> -</ol> -</div> -</div> -<div class="sect2"> -<h3 id="_submitting_patches"><a class="link" href="#_submitting_patches">Submitting Patches</a></h3> -<div class="paragraph"> -<p>To submit a patch, first commit your change (using a descriptive multi-line -commit message if possible), then push the commit to the <code>gerrit</code> remote. 
For instance, to push a change -to the <code>master</code> branch:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre>git push gerrit HEAD:refs/for/master --no-thin</pre> -</div> -</div> -<div class="paragraph"> -<p>or to push a change to the <code>gh-pages</code> branch (to update the website):</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre>git push gerrit HEAD:refs/for/gh-pages --no-thin</pre> -</div> -</div> -<div class="admonitionblock tip"> -<table> -<tr> -<td class="icon"> -<i class="fa icon-tip" title="Tip"></i> -</td> -<td class="content"> -While preparing a patch for review, it’s a good idea to follow -<a href="https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines">generic git commit guidelines and good practices</a>. -</td> -</tr> -</table> -</div> -<div class="admonitionblock note"> -<table> -<tr> -<td class="icon"> -<i class="fa icon-note" title="Note"></i> -</td> -<td class="content"> -The <code>--no-thin</code> argument is a workaround to prevent an error in Gerrit. See -<a href="https://code.google.com/p/gerrit/issues/detail?id=1582" class="bare">https://code.google.com/p/gerrit/issues/detail?id=1582</a>. -</td> -</tr> -</table> -</div> -<div class="admonitionblock tip"> -<table> -<tr> -<td class="icon"> -<i class="fa icon-tip" title="Tip"></i> -</td> -<td class="content"> -Consider creating Git aliases for the above commands. Gerrit also includes -a command-line tool called -<a href="https://www.mediawiki.org/wiki/Gerrit/Tutorial#Installing_git-review">git-review</a>, -which you may find helpful. -</td> -</tr> -</table> -</div> -<div class="paragraph"> -<p>Gerrit will add a change ID to your commit message and will create a Gerrit review, -whose URL will be emitted as part of the push reply. 
If desired, you can send a message -to the <code>kudu-dev</code> mailing list, explaining your patch and requesting review.</p> -</div> -<div class="paragraph"> -<p>After getting feedback, you can update or amend your commit (for instance, using -a command like <code>git commit --amend</code>) while leaving the Change -ID intact. Push your change to Gerrit again; this creates a new patch set -in Gerrit and notifies all reviewers about the change.</p> -</div> -<div class="paragraph"> -<p>When your code has been reviewed and is ready to be merged into the Kudu code base, -a Kudu committer will merge it using Gerrit. You can then discard your local branch.</p> -</div> -</div> -<div class="sect2"> -<h3 id="_abandoning_a_review"><a class="link" href="#_abandoning_a_review">Abandoning a Review</a></h3> -<div class="paragraph"> -<p>If your patch is not accepted or you decide to pull it from consideration, you can -use the Gerrit UI to <strong>Abandon</strong> the patch. It will still show in Gerrit’s history, -but will not be listed as a pending review.</p> -</div> -</div> -<div class="sect2"> -<h3 id="_reviewing_patches_in_gerrit"><a class="link" href="#_reviewing_patches_in_gerrit">Reviewing Patches In Gerrit</a></h3> -<div class="paragraph"> -<p>You can view a unified or side-by-side diff of changes in Gerrit using the web UI. -To leave a comment, click the relevant line number or highlight the relevant part -of the line, and type 'c' to bring up a comment box. To submit your comments and/or -your review status, go up to the top level of the review and click <strong>Reply</strong>. You can -add top-level comments here and submit them.</p> -</div> -<div class="paragraph"> -<p>To check out code from a Gerrit review, click <strong>Download</strong> and paste the relevant Git -commands into your Git client. 
You can then update the commit and push to Gerrit to -add a new patch set to the review, even if you were not the original author.</p> -</div> -<div class="paragraph"> -<p>Gerrit allows you to vote on a review. A vote of <code>+2</code> from at least one committer -(besides the submitter) is required before the patch can be merged.</p> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_c_code_style"><a class="link" href="#_c_code_style">C++ Code Style</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Get familiar with these guidelines so that your contributions can be reviewed and -integrated quickly and easily.</p> -</div> -<div class="paragraph"> -<p>In general, Kudu follows the -<a href="https://google.github.io/styleguide/cppguide.html">Google C++ Style Guide</a>, -with the following exceptions:</p> -</div> -<div class="sect2"> -<h3 id="_notes_on_c_11"><a class="link" href="#_notes_on_c_11">Notes on C++ 11</a></h3> -<div class="paragraph"> -<p>Kudu uses C++ 11. Check out this handy guide to C++ 11 move semantics and rvalue -references: <a href="https://www.chromium.org/rvalue-references" class="bare">https://www.chromium.org/rvalue-references</a></p> -</div> -<div class="paragraph"> -<p>We aim to follow most of the guidelines described there, such as, where possible, migrating -away from <code>foo.Pass()</code> in favor of <code>std::move(foo)</code>.</p> -</div> -</div> -<div class="sect2"> -<h3 id="_limitations_on_code_boost_code_use"><a class="link" href="#_limitations_on_code_boost_code_use">Limitations on <code>boost</code> Use</a></h3> -<div class="paragraph"> -<p><code>boost</code> classes from header-only libraries can be used in cases where a suitable -replacement does not exist in the Kudu code base. However:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>Do not introduce dependencies on <code>boost</code> classes where equivalent functionality -exists in the standard C++ library or in <code>src/kudu/gutil/</code>. 
For example, prefer -<code>strings::Split()</code> from <code>gutil</code> over <code>boost::split</code>.</p> -</li> -<li> -<p>Prefer using functionality from <code>boost</code> rather than re-implementing the same -functionality, <em>unless</em> using the <code>boost</code> functionality requires excessive use of -C++ features which are disallowed by our style guidelines. For example, -<code>boost::spirit</code> is heavily based on template metaprogramming and should not be used.</p> -</li> -<li> -<p>Do not use <code>boost</code> in any public headers for the Kudu C++ client, because -<code>boost</code> commonly breaks backward compatibility, and passing data between two -<code>boost</code> versions (one used by the user, one by Kudu) causes serious issues.</p> -</li> -</ul> -</div> -<div class="paragraph"> -<p>When in doubt about introducing a new dependency on any <code>boost</code> functionality, -it is best to email <code>d...@kudu.apache.org</code> to start a discussion.</p> -</div> -</div> -<div class="sect2"> -<h3 id="_line_length"><a class="link" href="#_line_length">Line length</a></h3> -<div class="paragraph"> -<p>The Kudu team allows lines of up to 100 characters, rather than Google’s standard of 80. Try to -keep lines under 80 characters where possible, but you can spill over to 100 if necessary.</p> -</div> -</div> -<div class="sect2"> -<h3 id="_pointers"><a class="link" href="#_pointers">Pointers</a></h3> -<div class="paragraph"> -<div class="title">Smart Pointers and Singly-Owned Pointers</div> -<p>Generally, most objects should have clear "single-owner" semantics. 
-Most of the time, singly-owned objects can be wrapped in a <code>unique_ptr<></code> -which ensures deletion on scope exit and prevents accidental copying.</p> -</div> -<div class="paragraph"> -<p>If an object is singly owned, but referenced from multiple places, such as when -the pointed-to object is known to be valid at least as long as the pointer itself, -associate a comment with the constructor which takes and stores the raw pointer, -as in the following example.</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-c++" data-lang="c++"> // 'blah' must remain valid for the lifetime of this class - MyClass(const Blah* blah) : - blah_(blah) { - }</code></pre> -</div> -</div> -<div class="admonitionblock note"> -<table> -<tr> -<td class="icon"> -<i class="fa icon-note" title="Note"></i> -</td> -<td class="content"> -Older parts of the Kudu code base use <code>gscoped_ptr</code> instead of -<code>unique_ptr</code>. These are hold-overs from before Kudu adopted C++11. -New code should not use <code>gscoped_ptr</code> except when necessary to interface -with existing code. Alternatively, consider updating usages as you come -across them. -</td> -</tr> -</table> -</div> -<div class="admonitionblock warning"> -<table> -<tr> -<td class="icon"> -<i class="fa icon-warning" title="Warning"></i> -</td> -<td class="content"> -Using <code>std::auto_ptr</code> is strictly disallowed because of its difficult and -bug-prone semantics. Besides, <code>std::auto_ptr</code> is declared deprecated -since C++11. -</td> -</tr> -</table> -</div> -<div class="paragraph"> -<div class="title">Smart Pointers for Multiply-Owned Pointers:</div> -<p>Although single ownership is ideal, sometimes it is not possible, particularly -when multiple threads are in play and the lifetimes of the pointers are not -clearly defined. 
In these cases, you can use either <code>std::shared_ptr</code> or -Kudu’s own <code>scoped_refptr</code> from <em>gutil/ref_counted.h</em>. Each of these mechanisms -relies on reference counting to automatically delete the referent once no more -pointers remain. The key difference between these two types of pointers is that -<code>scoped_refptr</code> requires that the object extend a <code>RefCounted</code> base class, and -stores its reference count inside the object storage itself, while <code>shared_ptr</code> -maintains a separate reference count on the heap.</p> -</div> -<div class="paragraph"> -<p>The pros and cons are:</p> -</div> -<div class="ulist none"> -<div class="title"><code>shared_ptr</code></div> -<ul class="none"> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> can be used with any type of object, without the -object deriving from a special base class</p> -</li> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> part of the standard library and familiar to most -C++ developers</p> -</li> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> supports the <code>weak_ptr</code> use cases:</p> -<div class="ulist"> -<ul> -<li> -<p>temporary ownership, when an object needs to be accessed only if it still exists</p> -</li> -<li> -<p>breaking circular references between <code>shared_ptr</code> instances, if any exist due to aggregation</p> -</li> -</ul> -</div> -</li> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> you can convert a -<code>shared_ptr</code> into a <code>weak_ptr</code> and back</p> -</li> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> creating an instance with -<code>std::make_shared<>()</code> typically makes only one allocation (since C++11; -a non-binding requirement in the Standard, though)</p> -</li> -<li> -<p><span class="icon red"><i class="fa fa-minus-circle"></i></span> creating a new object with 
-<code>shared_ptr<T> p(new T)</code> requires two allocations (one to create the ref count, -and one to create the object)</p> -</li> -<li> -<p><span class="icon red"><i class="fa fa-minus-circle"></i></span> the ref count may not be near the object on the heap, -so extra cache misses may be incurred on access</p> -</li> -<li> -<p><span class="icon red"><i class="fa fa-minus-circle"></i></span> the <code>shared_ptr</code> instance itself requires 16 bytes -(pointer to the ref count and pointer to the object)</p> -</li> -</ul> -</div> -<div class="ulist none"> -<div class="title"><code>scoped_refptr</code></div> -<ul class="none"> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> only requires a single allocation, and ref count -is on the same cache line as the object</p> -</li> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> the pointer only requires 8 bytes (since -the ref count is within the object)</p> -</li> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> you can manually increase or decrease -reference counts when more control is required</p> -</li> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> you can convert from a raw pointer back -to a <code>scoped_refptr</code> safely without worrying about double freeing</p> -</li> -<li> -<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> since we control the implementation, we -can implement features, such as debug builds that capture the stack trace of every -referent to help debug leaks.</p> -</li> -<li> -<p><span class="icon red"><i class="fa fa-minus-circle fa-con"></i></span> the referred-to object must inherit -from <code>RefCounted</code></p> -</li> -<li> -<p><span class="icon red"><i class="fa fa-minus-circle fa-con"></i></span> does not support the <code>weak_ptr</code> use cases</p> -</li> -</ul> -</div> -<div class="paragraph"> -<p>Since 
<code>scoped_refptr</code> is generally faster and smaller, try to use it -rather than <code>shared_ptr</code> in new code. Existing code uses <code>shared_ptr</code> -in many places. When interfacing with that code, you can continue to use <code>shared_ptr</code>.</p> -</div> -</div> -<div class="sect2"> -<h3 id="_function_binding_and_callbacks"><a class="link" href="#_function_binding_and_callbacks">Function Binding and Callbacks</a></h3> -<div class="paragraph"> -<p>Existing code uses <code>boost::bind</code> and <code>boost::function</code> for function binding and -callbacks. For new code, use the <code>Callback</code> and <code>Bind</code> classes in <code>gutil</code> instead. -While less full-featured (<code>Bind</code> doesn’t support argument -placeholders, wrapped function pointers, or function objects), they provide -more options in the way of argument lifecycle management. For example, a -bound argument whose class extends <code>RefCounted</code> will have its reference count incremented during <code>Bind</code> -and decremented when the <code>Callback</code> goes out of scope.</p> -</div> -<div class="paragraph"> -<p>See the large file comment in <em>gutil/callback.h</em> for more details, and -<em>util/callback_bind-test.cc</em> for examples.</p> -</div> -</div> -<div class="sect2"> -<h3 id="_gflags"><a class="link" href="#_gflags">GFlags</a></h3> -<div class="paragraph"> -<p>Kudu uses gflags for both command-line and file-based configuration. Follow these guidelines -when adding a new gflag; all new gflags must conform to -them. Existing non-conformant ones will be made conformant in time.</p> -</div> -<div class="paragraph"> -<div class="title">Name</div> -<p>The gflag’s name conveys a lot of information, so choose a good name. 
The name -will propagate into other systems, such as the -<a href="configuration_reference.html">Configuration Reference</a>.</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>The different parts of a multi-word name should be separated by underscores. -For example, <code>fs_data_dirs</code>.</p> -</li> -<li> -<p>The name should be prefixed with the context that it affects. For example, -<code>webserver_num_worker_threads</code> and <code>cfile_default_block_size</code>. Context can be -difficult to define, so bear in mind that this prefix will be -used to group similar gflags together. If the gflag affects the entire -process, it should not be prefixed.</p> -</li> -<li> -<p>If the gflag is for a quantity, the name should be suffixed with the units. -For example, <code>tablet_copy_idle_timeout_ms</code>.</p> -</li> -<li> -<p>Where possible, use short names. This will save time for those entering -command line options by hand.</p> -</li> -<li> -<p>The name is part of Kudu’s compatibility contract, and should not change -without very good reason.</p> -</li> -</ul> -</div> -<div class="paragraph"> -<div class="title">Default value</div> -<p>Choosing a default value is generally simple, but like the name, it propagates -into other systems.</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>The default value is part of Kudu’s compatibility contract, and should not -change without very good reason.</p> -</li> -</ul> -</div> -<div class="paragraph"> -<div class="title">Description</div> -<p>The gflag’s description should supplement the name and provide additional -context and information. Like the name, the description propagates into other -systems.</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>The description may include multiple sentences. 
Each should begin with a -capital letter, end with a period, and begin one space after the previous.</p> -</li> -<li> -<p>The description should NOT include the gflag’s type or default value; they are -provided out-of-band.</p> -</li> -<li> -<p>The description should be in the third person. Do not use words like <code>you</code>.</p> -</li> -<li> -<p>A gflag description can be changed freely; it is not expected to remain the -same across Kudu releases.</p> -</li> -</ul> -</div> -<div class="paragraph"> -<div class="title">Tags</div> -<p>Kudu’s gflag tagging mechanism adds machine-readable context to each gflag, for -use in consuming systems such as documentation or management tools. See the large block -comment in <em>flag_tags.h</em> for guidelines.</p> -</div> -<div class="ulist"> -<div class="title">Miscellaneous</div> -<ul> -<li> -<p>Avoid creating multiple gflags for the same logical parameter. For -example, many Kudu binaries need to configure a WAL directory. Rather than -creating <code>foo_wal_dir</code> and <code>bar_wal_dir</code> gflags, better to have a single -<code>kudu_wal_dir</code> gflag for use universally.</p> -</li> -</ul> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_java_code_style"><a class="link" href="#_java_code_style">Java Code Style</a></h2> -<div class="sectionbody"> -<div class="sect2"> -<h3 id="_preconditions_vs_assert_in_the_kudu_java_client"><a class="link" href="#_preconditions_vs_assert_in_the_kudu_java_client">Preconditions vs assert in the Kudu Java client</a></h3> -<div class="paragraph"> -<p>Use <code>assert</code> for verification of the static (i.e. non-runtime) internal -invariants. 
Here, internal refers to the pre- and post-conditions which are -completely under control of the code of a class or a function itself and cannot -be influenced by input parameters and other runtime/dynamic conditions.</p> -</div> -<div class="paragraph"> -<p>Use <code>Preconditions</code> for verification of the input parameters and the other -conditions which are outside of the control of the local code, or conditions -which are dependent on the state of other objects/components at runtime.</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-java" data-lang="java">Object pop() { - // Use Preconditions here because the external user of the class should not - // call pop() on an empty stack, but the stack itself is internally consistent - Preconditions.checkState(curSize > 0, "stack must not be empty"); - Object toReturn = data[--curSize]; - // Use an assert here because if we ended up with a negative size counter, - // that's an indication of a broken implementation of the stack; i.e. it's - // an invariant, not a state check. - assert curSize >= 0; - return toReturn; -}</code></pre> -</div> -</div> -<div class="paragraph"> -<p>However, keep in mind that <code>assert</code> checks are enabled only when the JVM is -run with the <code>-ea</code> option. So, if some dynamic condition is crucial for the -overall consistency (e.g. 
a data loss can occur if some dynamic condition is not -satisfied and the code continues its execution), consider throwing an -<code>AssertionError</code>:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-java" data-lang="java">if (!isCriticalConditionSatisfied) { - throw new AssertionError("cannot continue: data loss is possible otherwise"); -}</code></pre> -</div> -</div> -<div class="sect3"> -<h4 id="_references"><a class="link" href="#_references">References</a></h4> -<div class="ulist"> -<ul> -<li> -<p><a href="https://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html">Programming With Assertions</a></p> -</li> -<li> -<p><a href="https://github.com/google/guava/wiki/PreconditionsExplained">Guava Preconditions Explained</a></p> -</li> -</ul> -</div> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_code_cmake_code_style_guide"><a class="link" href="#_code_cmake_code_style_guide"><code>CMake</code> Style Guide</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p><code>CMake</code> allows commands in lower, upper, or mixed case. 
To keep -the CMake files consistent, please use the following guidelines:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p><strong>built-in commands</strong> in lowercase</p> -</li> -</ul> -</div> -<div class="listingblock"> -<div class="content"> -<pre>add_subdirectory(some/path)</pre> -</div> -</div> -<div class="ulist"> -<ul> -<li> -<p><strong>built-in arguments</strong> in uppercase</p> -</li> -</ul> -</div> -<div class="listingblock"> -<div class="content"> -<pre>message(STATUS "message goes here")</pre> -</div> -</div> -<div class="ulist"> -<ul> -<li> -<p><strong>custom commands or macros</strong> in uppercase</p> -</li> -</ul> -</div> -<div class="listingblock"> -<div class="content"> -<pre>ADD_KUDU_TEST(some-test)</pre> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_testing"><a class="link" href="#_testing">Testing</a></h2> -<div class="sectionbody"> -<div class="dlist"> -<dl> -<dt class="hdlist1">All new code should have tests.</dt> -<dd> -<p>Add new tests either in existing files, or create new test files as necessary.</p> -</dd> -<dt class="hdlist1">All bug fixes should have tests.</dt> -<dd> -<p>It’s OK to fix a bug without adding a -new test if it’s triggered by an existing test case. For example, if a -race shows up when running a multi-threaded system test after 20 -minutes or so, it’s worth trying to make a more targeted test case to -trigger the bug. 
But if that’s hard to do, the existing system test -should be enough.</p> -</dd> -<dt class="hdlist1">Tests should run quickly (< 1s).</dt> -<dd> -<p>If you want to write a time-intensive -test, make the runtime dependent on <code>KuduTest#AllowSlowTests</code>, which is -enabled via the <code>KUDU_ALLOW_SLOW_TESTS</code> environment variable and is -used by Jenkins test execution.</p> -</dd> -<dt class="hdlist1">Tests which run a number of iterations of some task should use a <code>gflags</code> command-line argument for the number of iterations.</dt> -<dd> -<p>This is handy for writing quick stress tests or performance tests.</p> -</dd> -<dt class="hdlist1">Commits which may affect performance should include before/after <code>perf-stat(1)</code> output.</dt> -<dd> -<p>This will show performance improvement or non-regression. -Performance-sensitive code should include some test case which can be used as a -targeted benchmark.</p> -</dd> -</dl> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_documentation"><a class="link" href="#_documentation">Documentation</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>See the -<a href="https://github.com/apache/kudu/blob/master/docs/design-docs/doc-style-guide.adoc">Documentation Style Guide</a> -for guidelines about contributing to the official Kudu documentation.</p> -</div> -</div> -</div> - </div> - <div class="col-md-3"> - - <div id="toc" data-spy="affix" data-offset-top="70"> - <ul> - - <li> - - <a href="index.html">Introducing Kudu</a> - </li> - <li> - - <a href="release_notes.html">Kudu Release Notes</a> - </li> - <li> - - <a href="quickstart.html">Getting Started with Kudu</a> - </li> - <li> - - <a href="installation.html">Installation Guide</a> - </li> - <li> - - <a href="configuration.html">Configuring Kudu</a> - </li> - <li> - - <a href="kudu_impala_integration.html">Using Impala with Kudu</a> - </li> - <li> - - <a href="administration.html">Administering Kudu</a> - </li> - <li> - - <a 
href="troubleshooting.html">Troubleshooting Kudu</a> - </li> - <li> - - <a href="developing.html">Developing Applications with Kudu</a> - </li> - <li> - - <a href="schema_design.html">Kudu Schema Design</a> - </li> - <li> - - <a href="security.html">Kudu Security</a> - </li> - <li> - - <a href="transaction_semantics.html">Kudu Transaction Semantics</a> - </li> - <li> - - <a href="background_tasks.html">Background Maintenance Tasks</a> - </li> - <li> - - <a href="configuration_reference.html">Kudu Configuration Reference</a> - </li> - <li> - - <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> - </li> - <li> - - <a href="known_issues.html">Known Issues and Limitations</a> - </li> - <li> -<span class="active-toc">Contributing to Kudu</span> - <ul class="sectlevel1"> -<li><a href="#_contributing_patches_using_gerrit">Contributing Patches Using Gerrit</a> -<ul class="sectlevel2"> -<li><a href="#_initial_setup_for_gerrit">Initial Setup for Gerrit</a></li> -<li><a href="#_submitting_patches">Submitting Patches</a></li> -<li><a href="#_abandoning_a_review">Abandoning a Review</a></li> -<li><a href="#_reviewing_patches_in_gerrit">Reviewing Patches In Gerrit</a></li> -</ul> -</li> -<li><a href="#_c_code_style">C++ Code Style</a> -<ul class="sectlevel2"> -<li><a href="#_notes_on_c_11">Notes on C++ 11</a></li> -<li><a href="#_limitations_on_code_boost_code_use">Limitations on <code>boost</code> Use</a></li> -<li><a href="#_line_length">Line length</a></li> -<li><a href="#_pointers">Pointers</a></li> -<li><a href="#_function_binding_and_callbacks">Function Binding and Callbacks</a></li> -<li><a href="#_gflags">GFlags</a></li> -</ul> -</li> -<li><a href="#_java_code_style">Java Code Style</a> -<ul class="sectlevel2"> -<li><a href="#_preconditions_vs_assert_in_the_kudu_java_client">Preconditions vs assert in the Kudu Java client</a> -<ul class="sectlevel3"> -<li><a href="#_references">References</a></li> -</ul> -</li> -</ul> -</li> -<li><a 
href="#_code_cmake_code_style_guide"><code>CMake</code> Style Guide</a></li> -<li><a href="#_testing">Testing</a></li> -<li><a href="#_documentation">Documentation</a></li> -</ul> - </li> - <li> - - <a href="export_control.html">Export Control Notice</a> - </li> - </ul> - </div> - </div> - </div> -</div> \ No newline at end of file
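As an illustration of the pointer guidelines in the contributing.html removed above, here is a minimal C++11 sketch that assumes only the standard library. `Blah`, `BlahPrinter`, `SingleOwnerMoves` and `WeakPtrObservesExpiry` are hypothetical names invented for this sketch; they are not Kudu types or functions, and `scoped_refptr` is deliberately left out because its API is not shown in the text.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload type, used only for illustration.
struct Blah {
  std::string name;
};

// Non-owning observer: stores a raw pointer and documents the lifetime
// requirement on the constructor, as the guideline suggests.
class BlahPrinter {
 public:
  // 'blah' must remain valid for the lifetime of this class.
  explicit BlahPrinter(const Blah* blah) : blah_(blah) {
    assert(blah != nullptr);
  }
  const std::string& name() const { return blah_->name; }

 private:
  const Blah* blah_;  // not owned
};

// Single ownership with unique_ptr: ownership can be moved but not copied.
// Returns true if the borrowed pointer worked and the move left the
// original owner empty.
bool SingleOwnerMoves() {
  std::unique_ptr<Blah> owner(new Blah{"a"});
  BlahPrinter printer(owner.get());  // borrows without taking ownership
  const bool borrowed_ok = (printer.name() == "a");
  std::unique_ptr<Blah> new_owner = std::move(owner);
  return borrowed_ok && owner == nullptr && new_owner->name == "a";
}

// The weak_ptr use case: access an object only if it still exists.
// Returns true if lock() succeeds while a shared_ptr is alive and
// fails after the last shared_ptr is reset.
bool WeakPtrObservesExpiry() {
  std::shared_ptr<Blah> shared = std::make_shared<Blah>();
  std::weak_ptr<Blah> weak = shared;
  const bool alive_before = (weak.lock() != nullptr);
  shared.reset();  // drops the last strong reference
  const bool gone_after = (weak.lock() == nullptr);
  return alive_before && gone_after;
}
```

Both functions are written to return true when the behaviour described in the guidelines holds, so they can be called from any small test driver.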
http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/developing.html ---------------------------------------------------------------------- diff --git a/docs/developing.html b/docs/developing.html deleted file mode 100644 index d196fa8..0000000 --- a/docs/developing.html +++ /dev/null @@ -1,467 +0,0 @@ ---- -title: Developing Applications With Apache Kudu -layout: default -active_nav: docs -last_updated: 'Last updated 2018-06-15 07:22:05 PDT' ---- -<!-- - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - - -<div class="container"> - <div class="row"> - <div class="col-md-9"> - -<h1>Developing Applications With Apache Kudu</h1> - <div id="preamble"> -<div class="sectionbody"> -<div class="paragraph"> -<p>Kudu provides C++, Java and Python client APIs, as well as reference examples to illustrate -their use.</p> -</div> -<div class="admonitionblock warning"> -<table> -<tr> -<td class="icon"> -<i class="fa icon-warning" title="Warning"></i> -</td> -<td class="content"> -Use of server-side or private interfaces is not supported, and interfaces -which are not part of public APIs have no stability guarantees. 
-</td> -</tr> -</table> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_viewing_the_api_documentation"><a class="link" href="#_viewing_the_api_documentation">Viewing the API Documentation</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<div class="title">C++ API Documentation</div> -<p>You can view the <a href="../cpp-client-api/index.html">C++ client API documentation</a> -online. Alternatively, after <a href="#build_from_source">building Kudu from source</a>, -you can additionally build the <code>doxygen</code> target (e.g., run <code>make doxygen</code> -if using make) and use the locally generated API documentation by opening -<code>docs/doxygen/client_api/html/index.html</code> file in your favorite Web browser.</p> -</div> -<div class="admonitionblock note"> -<table> -<tr> -<td class="icon"> -<i class="fa icon-note" title="Note"></i> -</td> -<td class="content"> -In order to build the <code>doxygen</code> target, it’s necessary to have -doxygen with Dot (graphviz) support installed at your build machine. If -you installed doxygen after building Kudu from source, you will need to run -<code>cmake</code> again to pick up the doxygen location and generate appropriate -targets. -</td> -</tr> -</table> -</div> -<div class="paragraph"> -<div class="title">Java API Documentation</div> -<p>You can view the <a href="../apidocs/index.html">Java API documentation</a> online. Alternatively, -after <a href="#build_java_client">building the Java client</a>, Java API documentation is available -in <code>java/kudu-client/target/apidocs/index.html</code>.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_working_examples"><a class="link" href="#_working_examples">Working Examples</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Several example applications are provided in the -<a href="https://github.com/cloudera/kudu-examples">kudu-examples</a> Github -repository. 
Each example includes a <code>README</code> that shows how to compile and run -it. These examples illustrate correct usage of the Kudu APIs, as well as how to -set up a virtual machine to run Kudu. The following list includes some of the -examples that are available today. Check the repository itself in case this list goes -out of date.</p> -</div> -<div class="dlist"> -<dl> -<dt class="hdlist1"><code>java/java-example</code></dt> -<dd> -<p>A simple Java application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.</p> -</dd> -<dt class="hdlist1"><code>java/collectl</code></dt> -<dd> -<p>A small Java application which listens on a TCP socket for time series data corresponding to the Collectl wire protocol. -The commonly-available collectl tool can be used to send example data to the server.</p> -</dd> -<dt class="hdlist1"><code>java/insert-loadgen</code></dt> -<dd> -<p>A Java application that generates random insert load.</p> -</dd> -<dt class="hdlist1"><code>python/dstat-kudu</code></dt> -<dd> -<p>An example program that shows how to use the Kudu Python API to load data generated by an external program -(<code>dstat</code> in this case) into a new or existing Kudu table.</p> -</dd> -<dt class="hdlist1"><code>python/graphite-kudu</code></dt> -<dd> -<p>An experimental plugin for using graphite-web with Kudu as a backend.</p> -</dd> -<dt class="hdlist1"><code>demo-vm-setup</code></dt> -<dd> -<p>Scripts to download and run a VirtualBox virtual machine with Kudu already installed.
-See <a href="quickstart.html">Quickstart</a> for more information.</p> -</dd> -</dl> -</div> -<div class="paragraph"> -<p>These examples should serve as helpful starting points for your own Kudu applications and integrations.</p> -</div> -<div class="sect2"> -<h3 id="_maven_artifacts"><a class="link" href="#_maven_artifacts">Maven Artifacts</a></h3> -<div class="paragraph"> -<p>The following Maven <code><dependency></code> element is valid for the Apache Kudu public release -(since 1.0.0):</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-xml" data-lang="xml"><dependency> - <groupId>org.apache.kudu</groupId> - <artifactId>kudu-client</artifactId> - <version>1.1.0</version> -</dependency></code></pre> -</div> -</div> -<div class="paragraph"> -<p>Convenience binary artifacts for the Java client and various Java integrations (e.g. Spark, Flume) -are also now available via the <a href="http://repository.apache.org">ASF Maven repository</a> and -<a href="https://mvnrepository.com/artifact/org.apache.kudu">Maven Central repository</a>.</p> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_example_impala_commands_with_kudu"><a class="link" href="#_example_impala_commands_with_kudu">Example Impala Commands With Kudu</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>See <a href="kudu_impala_integration.html">Using Impala With Kudu</a> for guidance on installing -and using Impala with Kudu, including several <code>impala-shell</code> examples.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_kudu_integration_with_spark"><a class="link" href="#_kudu_integration_with_spark">Kudu Integration with Spark</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Kudu integrates with Spark through the Data Source API as of version 1.0.0. 
-Include the kudu-spark dependency using the --packages option.</p> -</div> -<div class="paragraph"> -<p>Use the kudu-spark_2.10 artifact if using Spark with Scala 2.10:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code>spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.1.0</code></pre> -</div> -</div> -<div class="paragraph"> -<p>Use the kudu-spark2_2.11 artifact if using Spark 2 with Scala 2.11:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code>spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.1.0</code></pre> -</div> -</div> -<div class="paragraph"> -<p>Then import kudu-spark and create a dataframe:</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-scala" data-lang="scala">import org.apache.kudu.spark.kudu._ -import org.apache.kudu.client._ -import collection.JavaConverters._ - -// Read a table from Kudu -val df = sqlContext.read.options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> "kudu_table")).kudu - -// Query using the Spark API...
-df.select("id").filter("id >= 5").show() - -// ...or register a temporary table and use SQL -df.registerTempTable("kudu_table") -val filteredDF = sqlContext.sql("select id from kudu_table where id >= 5") -filteredDF.show() - -// Use KuduContext to create, delete, or write to Kudu tables -val kuduContext = new KuduContext("kudu.master:7051", sqlContext.sparkContext) - -// Create a new Kudu table from a dataframe schema -// NB: No rows from the dataframe are inserted into the table -kuduContext.createTable( - "test_table", df.schema, Seq("key"), - new CreateTableOptions() - .setNumReplicas(1) - .addHashPartitions(List("key").asJava, 3)) - -// Insert data -kuduContext.insertRows(df, "test_table") - -// Delete data -kuduContext.deleteRows(filteredDF, "test_table") - -// Upsert data -kuduContext.upsertRows(df, "test_table") - -// Update data -// The incremented column is aliased so that its name still matches the table column -val alteredDF = df.select($"id", ($"count" + 1).as("count")) -kuduContext.updateRows(alteredDF, "test_table") - -// Data can also be inserted into the Kudu table using the data source, though the methods on KuduContext are preferred -// NB: The default is to upsert rows; to perform standard inserts instead, set operation = insert in the options map -// NB: Only mode Append is supported -df.write.options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> "test_table")).mode("append").kudu - -// Check for the existence of a Kudu table -kuduContext.tableExists("another_table") - -// Delete a Kudu table -kuduContext.deleteTable("unwanted_table")</code></pre> -</div> -</div> -<div class="sect2"> -<h3 id="_using_spark_with_a_secure_kudu_cluster"><a class="link" href="#_using_spark_with_a_secure_kudu_cluster">Using Spark with a Secure Kudu Cluster</a></h3> -<div class="paragraph"> -<p>The Kudu Spark integration is able to operate on secure Kudu clusters which have -authentication and encryption enabled, but the submitter of the Spark job must -provide the proper credentials.
For Spark jobs using the default 'client' deploy -mode, the submitting user must have an active Kerberos ticket granted through -<code>kinit</code>. For Spark jobs using the 'cluster' deploy mode, a Kerberos principal -name and keytab location must be provided through the <code>--principal</code> and -<code>--keytab</code> arguments to <code>spark2-submit</code>.</p> -</div> -</div> -<div class="sect2"> -<h3 id="_spark_integration_known_issues_and_limitations"><a class="link" href="#_spark_integration_known_issues_and_limitations">Spark Integration Known Issues and Limitations</a></h3> -<div class="ulist"> -<ul> -<li> -<p>Spark 2.2+ requires Java 8 at runtime even though Kudu Spark 2.x integration -is Java 7 compatible. Spark 2.2 is the default dependency version as of -Kudu 1.5.0.</p> -</li> -<li> -<p>Kudu tables with a name containing upper case or non-ascii characters must be -assigned an alternate name when registered as a temporary table.</p> -</li> -<li> -<p>Kudu tables with a column name containing upper case or non-ascii characters -may not be used with SparkSQL. Columns may be renamed in Kudu to work around -this issue.</p> -</li> -<li> -<p><code><></code> and <code>OR</code> predicates are not pushed to Kudu, and instead will be evaluated -by the Spark task. Only <code>LIKE</code> predicates with a suffix wildcard are pushed to -Kudu, meaning that <code>LIKE "FOO%"</code> is pushed down but <code>LIKE "FOO%BAR"</code> isn’t.</p> -</li> -<li> -<p>Kudu does not support every type supported by Spark SQL. For example, -<code>Date</code> and complex types are not supported.</p> -</li> -<li> -<p>Kudu tables may only be registered as temporary tables in SparkSQL. 
-Kudu tables may not be queried using HiveContext.</p> -</li> -</ul> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_kudu_python_client"><a class="link" href="#_kudu_python_client">Kudu Python Client</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>The Kudu Python client provides a Python-friendly interface to the C++ client API. -The sample below demonstrates part of the Python client.</p> -</div> -<div class="listingblock"> -<div class="content"> -<pre class="highlight"><code class="language-python" data-lang="python">import kudu -from kudu.client import Partitioning -from datetime import datetime - -# Connect to Kudu master server -client = kudu.connect(host='kudu.master', port=7051) - -# Define a schema for a new table -builder = kudu.schema_builder() -builder.add_column('key').type(kudu.int64).nullable(False).primary_key() -builder.add_column('ts_val', type_=kudu.unixtime_micros, nullable=False, compression='lz4') -schema = builder.build() - -# Define partitioning schema -partitioning = Partitioning().add_hash_partitions(column_names=['key'], num_buckets=3) - -# Create new table -client.create_table('python-example', schema, partitioning) - -# Open a table -table = client.table('python-example') - -# Create a new session so that we can apply write operations -session = client.new_session() - -# Insert a row -op = table.new_insert({'key': 1, 'ts_val': datetime.utcnow()}) -session.apply(op) - -# Upsert a row -op = table.new_upsert({'key': 2, 'ts_val': "2016-01-01T00:00:00.000000"}) -session.apply(op) - -# Update a row -op = table.new_update({'key': 1, 'ts_val': ("2017-01-01", "%Y-%m-%d")}) -session.apply(op) - -# Delete a row -op = table.new_delete({'key': 2}) -session.apply(op) - -# Flush write operations; if failures occur, capture and print them.
-try: - session.flush() -except kudu.KuduBadStatus as e: - print(session.get_pending_errors()) - -# Create a scanner and add a predicate -scanner = table.scanner() -scanner.add_predicate(table['ts_val'] == datetime(2017, 1, 1)) - -# Open Scanner and read all tuples -# Note: This doesn't scale for large scans -result = scanner.open().read_all_tuples()</code></pre> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_integration_with_mapreduce_yarn_and_other_frameworks"><a class="link" href="#_integration_with_mapreduce_yarn_and_other_frameworks">Integration with MapReduce, YARN, and Other Frameworks</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in -the Hadoop ecosystem. See -<a href="https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/RowCounter.java">RowCounter.java</a> -and -<a href="https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportCsv.java">ImportCsv.java</a> -for examples which you can model your own integrations on. 
Stay tuned for more examples -using YARN and Spark in the future.</p> -</div> -</div> -</div> - </div> - <div class="col-md-3"> - - <div id="toc" data-spy="affix" data-offset-top="70"> - <ul> - - <li> - - <a href="index.html">Introducing Kudu</a> - </li> - <li> - - <a href="release_notes.html">Kudu Release Notes</a> - </li> - <li> - - <a href="quickstart.html">Getting Started with Kudu</a> - </li> - <li> - - <a href="installation.html">Installation Guide</a> - </li> - <li> - - <a href="configuration.html">Configuring Kudu</a> - </li> - <li> - - <a href="kudu_impala_integration.html">Using Impala with Kudu</a> - </li> - <li> - - <a href="administration.html">Administering Kudu</a> - </li> - <li> - - <a href="troubleshooting.html">Troubleshooting Kudu</a> - </li> - <li> -<span class="active-toc">Developing Applications with Kudu</span> - <ul class="sectlevel1"> -<li><a href="#_viewing_the_api_documentation">Viewing the API Documentation</a></li> -<li><a href="#_working_examples">Working Examples</a> -<ul class="sectlevel2"> -<li><a href="#_maven_artifacts">Maven Artifacts</a></li> -</ul> -</li> -<li><a href="#_example_impala_commands_with_kudu">Example Impala Commands With Kudu</a></li> -<li><a href="#_kudu_integration_with_spark">Kudu Integration with Spark</a> -<ul class="sectlevel2"> -<li><a href="#_using_spark_with_a_secure_kudu_cluster">Using Spark with a Secure Kudu Cluster</a></li> -<li><a href="#_spark_integration_known_issues_and_limitations">Spark Integration Known Issues and Limitations</a></li> -</ul> -</li> -<li><a href="#_kudu_python_client">Kudu Python Client</a></li> -<li><a href="#_integration_with_mapreduce_yarn_and_other_frameworks">Integration with MapReduce, YARN, and Other Frameworks</a></li> -</ul> - </li> - <li> - - <a href="schema_design.html">Kudu Schema Design</a> - </li> - <li> - - <a href="security.html">Kudu Security</a> - </li> - <li> - - <a href="transaction_semantics.html">Kudu Transaction Semantics</a> - </li> - <li> - - <a 
href="background_tasks.html">Background Maintenance Tasks</a> - </li> - <li> - - <a href="configuration_reference.html">Kudu Configuration Reference</a> - </li> - <li> - - <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> - </li> - <li> - - <a href="known_issues.html">Known Issues and Limitations</a> - </li> - <li> - - <a href="contributing.html">Contributing to Kudu</a> - </li> - <li> - - <a href="export_control.html">Export Control Notice</a> - </li> - </ul> - </div> - </div> - </div> -</div> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/export_control.html ---------------------------------------------------------------------- diff --git a/docs/export_control.html b/docs/export_control.html deleted file mode 100644 index feddf3b..0000000 --- a/docs/export_control.html +++ /dev/null @@ -1,154 +0,0 @@ ---- -title: Export Control Notice -layout: default -active_nav: docs -last_updated: 'Last updated 2018-06-14 08:17:56 PDT' ---- -<!-- - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - - -<div class="container"> - <div class="row"> - <div class="col-md-9"> - -<h1>Export Control Notice</h1> - <div id="preamble"> -<div class="sectionbody"> -<div class="paragraph"> -<p>This distribution includes cryptographic software. The country in -which you currently reside may have restrictions on the import, -possession, use, and/or re-export to another country, of -encryption software. 
BEFORE using any encryption software, please -check your country’s laws, regulations and policies concerning the -import, possession, or use, and re-export of encryption software, to -see if this is permitted. See <a href="http://www.wassenaar.org/" class="bare">http://www.wassenaar.org/</a> for more -information.</p> -</div> -<div class="paragraph"> -<p>The U.S. Government Department of Commerce, Bureau of Industry and -Security (BIS), has classified this software as Export Commodity -Control Number (ECCN) 5D002.C.1, which includes information security -software using or performing cryptographic functions with asymmetric -algorithms. The form and manner of this Apache Software Foundation -distribution makes it eligible for export under the License Exception -ENC Technology Software Unrestricted (TSU) exception (see the BIS -Export Administration Regulations, Section 740.13) for both object -code and source code.</p> -</div> -<div class="paragraph"> -<p>The following provides more details on the included cryptographic -software:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>This software uses OpenSSL to enable TLS-encrypted connections, -generate keys for asymmetric cryptography, and generate and -verify signatures using those keys.</p> -</li> -<li> -<p>This software uses Java SE Security libraries including the -Java Secure Socket Extension (JSSE), Java Generic Security Service -(JGSS), and Java Authentication and Authorization APIs (JAAS) -to provide secure authentication and TLS-protected transport.</p> -</li> -</ul> -</div> -</div> -</div> - </div> - <div class="col-md-3"> - - <div id="toc" data-spy="affix" data-offset-top="70"> - <ul> - - <li> - - <a href="index.html">Introducing Kudu</a> - </li> - <li> - - <a href="release_notes.html">Kudu Release Notes</a> - </li> - <li> - - <a href="quickstart.html">Getting Started with Kudu</a> - </li> - <li> - - <a href="installation.html">Installation Guide</a> - </li> - <li> - - <a 
href="configuration.html">Configuring Kudu</a> - </li> - <li> - - <a href="kudu_impala_integration.html">Using Impala with Kudu</a> - </li> - <li> - - <a href="administration.html">Administering Kudu</a> - </li> - <li> - - <a href="troubleshooting.html">Troubleshooting Kudu</a> - </li> - <li> - - <a href="developing.html">Developing Applications with Kudu</a> - </li> - <li> - - <a href="schema_design.html">Kudu Schema Design</a> - </li> - <li> - - <a href="security.html">Kudu Security</a> - </li> - <li> - - <a href="transaction_semantics.html">Kudu Transaction Semantics</a> - </li> - <li> - - <a href="background_tasks.html">Background Maintenance Tasks</a> - </li> - <li> - - <a href="configuration_reference.html">Kudu Configuration Reference</a> - </li> - <li> - - <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> - </li> - <li> - - <a href="known_issues.html">Known Issues and Limitations</a> - </li> - <li> - - <a href="contributing.html">Contributing to Kudu</a> - </li> - <li> -<span class="active-toc">Export Control Notice</span> - - </li> - </ul> - </div> - </div> - </div> -</div> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/images/hash-hash-partitioning-example.png ---------------------------------------------------------------------- diff --git a/docs/images/hash-hash-partitioning-example.png b/docs/images/hash-hash-partitioning-example.png deleted file mode 100644 index c843f73..0000000 Binary files a/docs/images/hash-hash-partitioning-example.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/images/hash-partitioning-example.png ---------------------------------------------------------------------- diff --git a/docs/images/hash-partitioning-example.png b/docs/images/hash-partitioning-example.png deleted file mode 100644 index 56de4e8..0000000 Binary files a/docs/images/hash-partitioning-example.png and /dev/null differ 
http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/images/hash-range-partitioning-example.png ---------------------------------------------------------------------- diff --git a/docs/images/hash-range-partitioning-example.png b/docs/images/hash-range-partitioning-example.png deleted file mode 100644 index 6e16ada..0000000 Binary files a/docs/images/hash-range-partitioning-example.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/images/kudu-architecture-2.png ---------------------------------------------------------------------- diff --git a/docs/images/kudu-architecture-2.png b/docs/images/kudu-architecture-2.png deleted file mode 100644 index fcaeba5..0000000 Binary files a/docs/images/kudu-architecture-2.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/images/range-partitioning-example.png ---------------------------------------------------------------------- diff --git a/docs/images/range-partitioning-example.png b/docs/images/range-partitioning-example.png deleted file mode 100644 index 23eac01..0000000 Binary files a/docs/images/range-partitioning-example.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/index.html ---------------------------------------------------------------------- diff --git a/docs/index.html b/docs/index.html deleted file mode 100644 index 84cd120..0000000 --- a/docs/index.html +++ /dev/null @@ -1,470 +0,0 @@ ---- -title: Introducing Apache Kudu -layout: default -active_nav: docs -last_updated: 'Last updated 2018-06-14 08:17:56 PDT' ---- -<!-- - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. 
-You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - - -<div class="container"> - <div class="row"> - <div class="col-md-9"> - -<h1>Introducing Apache Kudu</h1> - <div id="preamble"> -<div class="sectionbody"> -<div class="paragraph"> -<p>Kudu is a columnar storage manager developed for the Apache Hadoop platform. Kudu shares -the common technical properties of Hadoop ecosystem applications: it runs on commodity -hardware, is horizontally scalable, and supports highly available operation.</p> -</div> -<div class="paragraph"> -<p>Kudu’s design sets it apart. Some of Kudu’s benefits include:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>Fast processing of OLAP workloads.</p> -</li> -<li> -<p>Integration with MapReduce, Spark and other Hadoop ecosystem components.</p> -</li> -<li> -<p>Tight integration with Apache Impala, making it a good, mutable alternative to -using HDFS with Apache Parquet.</p> -</li> -<li> -<p>Strong but flexible consistency model, allowing you to choose consistency -requirements on a per-request basis, including the option for strict-serializable consistency.</p> -</li> -<li> -<p>Strong performance for running sequential and random workloads simultaneously.</p> -</li> -<li> -<p>Easy to administer and manage with Cloudera Manager.</p> -</li> -<li> -<p>High availability. Tablet Servers and Masters use the <a href="#raft">Raft Consensus Algorithm</a>, which ensures that -as long as more than half the total number of replicas is available, the tablet is available for -reads and writes. 
For instance, if 2 out of 3 replicas or 3 out of 5 replicas are available, the tablet -is available.</p> -<div class="paragraph"> -<p>Reads can be serviced by read-only follower tablets, even in the event of a -leader tablet failure.</p> -</div> -</li> -<li> -<p>Structured data model.</p> -</li> -</ul> -</div> -<div class="paragraph"> -<p>By combining all of these properties, Kudu targets support for families of -applications that are difficult or impossible to implement on current generation -Hadoop storage technologies. A few examples of applications for which Kudu is a great -solution are:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>Reporting applications where newly-arrived data needs to be immediately available for end users</p> -</li> -<li> -<p>Time-series applications that must simultaneously support:</p> -<div class="ulist"> -<ul> -<li> -<p>queries across large amounts of historic data</p> -</li> -<li> -<p>granular queries about an individual entity that must return very quickly</p> -</li> -</ul> -</div> -</li> -<li> -<p>Applications that use predictive models to make real-time decisions with periodic -refreshes of the predictive model based on all historic data</p> -</li> -</ul> -</div> -<div class="paragraph"> -<p>For more information about these and other scenarios, see <a href="#kudu_use_cases">Example Use Cases</a>.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_kudu_impala_integration_features"><a class="link" href="#_kudu_impala_integration_features">Kudu-Impala Integration Features</a></h2> -<div class="sectionbody"> -<div class="dlist"> -<dl> -<dt class="hdlist1"><code>CREATE/ALTER/DROP TABLE</code></dt> -<dd> -<p>Impala supports creating, altering, and dropping tables using Kudu as the persistence layer. 
-The tables follow the same internal / external approach as other tables in Impala, -allowing for flexible data ingestion and querying.</p> -</dd> -<dt class="hdlist1"><code>INSERT</code></dt> -<dd> -<p>Data can be inserted into Kudu tables in Impala using the same syntax as -any other Impala table like those using HDFS or HBase for persistence.</p> -</dd> -<dt class="hdlist1"><code>UPDATE</code> / <code>DELETE</code></dt> -<dd> -<p>Impala supports the <code>UPDATE</code> and <code>DELETE</code> SQL commands to modify existing data in -a Kudu table row-by-row or as a batch. The syntax of the SQL commands is chosen -to be as compatible as possible with existing standards. In addition to simple <code>DELETE</code> -or <code>UPDATE</code> commands, you can specify complex joins with a <code>FROM</code> clause in a subquery.</p> -</dd> -<dt class="hdlist1">Flexible Partitioning</dt> -<dd> -<p>Similar to partitioning of tables in Hive, Kudu allows you to dynamically -pre-split tables by hash or range into a predefined number of tablets, in order -to distribute writes and queries evenly across your cluster. You can partition by -any number of primary key columns, by any number of hashes, and an optional list of -split rows. See <a href="schema_design.html">Schema Design</a>.</p> -</dd> -<dt class="hdlist1">Parallel Scan</dt> -<dd> -<p>To achieve the highest possible performance on modern hardware, the Kudu client -used by Impala parallelizes scans across multiple tablets.</p> -</dd> -<dt class="hdlist1">High-efficiency queries</dt> -<dd> -<p>Where possible, Impala pushes down predicate evaluation to Kudu, so that predicates -are evaluated as close as possible to the data. 
Query performance is comparable -to Parquet in many workloads.</p> -</dd> -</dl> -</div> -<div class="paragraph"> -<p>For more details regarding querying data stored in Kudu using Impala, please -refer to the Impala documentation.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_concepts_and_terms"><a class="link" href="#_concepts_and_terms">Concepts and Terms</a></h2> -<div class="sectionbody"> -<div id="kudu_columnar_data_store" class="paragraph"> -<div class="title">Columnar Data Store</div> -<p>Kudu is a <em>columnar data store</em>. A columnar data store stores data in strongly-typed -columns. With a proper design, it is superior for analytical or data warehousing -workloads for several reasons.</p> -</div> -<div class="dlist"> -<dl> -<dt class="hdlist1">Read Efficiency</dt> -<dd> -<p>For analytical queries, you can read a single column, or a portion -of that column, while ignoring other columns. This means you can fulfill your query -while reading a minimal number of blocks on disk. With a row-based store, you need -to read the entire row, even if you only return values from a few columns.</p> -</dd> -<dt class="hdlist1">Data Compression</dt> -<dd> -<p>Because a given column contains only one type of data, -pattern-based compression can be orders of magnitude more efficient than -compressing mixed data types, which are used in row-based solutions. Combined -with the efficiencies of reading data from columns, compression allows you to -fulfill your query while reading even fewer blocks from disk. See -<a href="schema_design.html#encoding">Data Compression</a></p> -</dd> -</dl> -</div> -<div class="paragraph"> -<div class="title">Table</div> -<p>A <em>table</em> is where your data is stored in Kudu. A table has a schema and -a totally ordered primary key. 
A table is split into segments called tablets.</p> -</div> -<div class="paragraph"> -<div class="title">Tablet</div> -<p>A <em>tablet</em> is a contiguous segment of a table, similar to a <em>partition</em> in -other data storage engines or relational databases. A given tablet is -replicated on multiple tablet servers, and at any given point in time, -one of these replicas is considered the leader tablet. Any replica can service -reads, and writes require consensus among the set of tablet servers serving the tablet.</p> -</div> -<div class="paragraph"> -<div class="title">Tablet Server</div> -<p>A <em>tablet server</em> stores and serves tablets to clients. For a -given tablet, one tablet server acts as a leader, and the others act as -follower replicas of that tablet. Only leaders service write requests, while -either leaders or followers can service read requests. Leaders are elected using the -<a href="#raft">Raft Consensus Algorithm</a>. One tablet server can serve multiple tablets, and one tablet can be served -by multiple tablet servers.</p> -</div> -<div class="paragraph"> -<div class="title">Master</div> -<p>The <em>master</em> keeps track of all the tablets, tablet servers, the -<a href="#catalog_table">Catalog Table</a>, and other metadata related to the cluster. At a given point -in time, there can only be one acting master (the leader). If the current leader -disappears, a new master is elected using the <a href="#raft">Raft Consensus Algorithm</a>.</p> -</div> -<div class="paragraph"> -<p>The master also coordinates metadata operations for clients. For example, when -creating a new table, the client internally sends the request to the master.
The -master writes the metadata for the new table into the catalog table, and -coordinates the process of creating tablets on the tablet servers.</p> -</div> -<div class="paragraph"> -<p>All the master’s data is stored in a tablet, which can be replicated to all the -other candidate masters.</p> -</div> -<div class="paragraph"> -<p>Tablet servers heartbeat to the master at a set interval (the default is once -per second).</p> -</div> -<div id="raft" class="paragraph"> -<div class="title">Raft Consensus Algorithm</div> -<p>Kudu uses the <a href="https://raft.github.io/">Raft consensus algorithm</a> as -a means to guarantee fault-tolerance and consistency, both for regular tablets and for master -data. Through Raft, multiple replicas of a tablet elect a <em>leader</em>, which is responsible -for accepting and replicating writes to <em>follower</em> replicas. Once a write is persisted -in a majority of replicas it is acknowledged to the client. A given group of <code>N</code> replicas -(usually 3 or 5) is able to accept writes with at most <code>(N - 1)/2</code> faulty replicas.</p> -</div> -<div id="catalog_table" class="paragraph"> -<div class="title">Catalog Table</div> -<p>The <em>catalog table</em> is the central location for -metadata of Kudu. It stores information about tables and tablets. The catalog -table may not be read or written directly. 
Instead, it is accessible -only via metadata operations exposed in the client API.</p> -</div> -<div class="paragraph"> -<p>The catalog table stores two categories of metadata:</p> -</div> -<div class="dlist"> -<dl> -<dt class="hdlist1">Tables</dt> -<dd> -<p>table schemas, locations, and states</p> -</dd> -<dt class="hdlist1">Tablets</dt> -<dd> -<p>the list of existing tablets, which tablet servers have replicas of -each tablet, the tablet’s current state, and start and end keys.</p> -</dd> -</dl> -</div> -<div class="paragraph"> -<div class="title">Logical Replication</div> -<p>Kudu replicates operations, not on-disk data. This is referred to as <em>logical replication</em>, -as opposed to <em>physical replication</em>. This has several advantages:</p> -</div> -<div class="ulist"> -<ul> -<li> -<p>Although inserts and updates do transmit data over the network, deletes do not need -to move any data. The delete operation is sent to each tablet server, which performs -the delete locally.</p> -</li> -<li> -<p>Physical operations, such as compaction, do not need to transmit the data over the -network in Kudu. This is different from storage systems that use HDFS, where -the blocks need to be transmitted over the network to fulfill the required number of -replicas.</p> -</li> -<li> -<p>Tablets do not need to perform compactions at the same time or on the same schedule, -or otherwise remain in sync on the physical storage layer. This decreases the chances -of all tablet servers experiencing high latency at the same time, due to compactions -or heavy write loads.</p> -</li> -</ul> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_architectural_overview"><a class="link" href="#_architectural_overview">Architectural Overview</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<p>The following diagram shows a Kudu cluster with three masters and multiple tablet -servers, each serving multiple tablets. 
It illustrates how Raft consensus is used -to allow for both leaders and followers for both the masters and tablet servers. In -addition, a tablet server can be a leader for some tablets, and a follower for others. -Leaders are shown in gold, while followers are shown in blue.</p> -</div> -<div class="imageblock"> -<div class="content"> -<img src="./images/kudu-architecture-2.png" alt="Kudu Architecture" width="800"> -</div> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="kudu_use_cases"><a class="link" href="#kudu_use_cases">Example Use Cases</a></h2> -<div class="sectionbody"> -<div class="paragraph"> -<div class="title">Streaming Input with Near Real Time Availability</div> -<p>A common challenge in data analysis is one where new data arrives rapidly and constantly, -and the same data needs to be available in near real time for reads, scans, and -updates. Kudu offers the powerful combination of fast inserts and updates with -efficient columnar scans to enable real-time analytics use cases on a single storage layer.</p> -</div> -<div class="paragraph"> -<div class="title">Time-series application with widely varying access patterns</div> -<p>A time-series schema is one in which data points are organized and keyed according -to the time at which they occurred. This can be useful for investigating the -performance of metrics over time or attempting to predict future behavior based -on past data. For instance, time-series customer data might be used both to store -purchase click-stream history and to predict future purchases, or for use by a -customer support representative. While these different types of analysis are occurring, -inserts and mutations may also be occurring individually and in bulk, and become available -immediately to read workloads. Kudu can handle all of these access patterns -simultaneously in a scalable and efficient manner.</p> -</div> -<div class="paragraph"> -<p>Kudu is a good fit for time-series workloads for several reasons. 
With Kudu’s support for -hash-based partitioning, combined with its native support for compound row keys, it is -simple to set up a table spread across many servers without the risk of "hotspotting" -that is commonly observed when range partitioning is used. Kudu’s columnar storage engine -is also beneficial in this context, because many time-series workloads read only a few columns, -as opposed to the whole row.</p> -</div> -<div class="paragraph"> -<p>In the past, you might have needed to use multiple data stores to handle different -data access patterns. This practice adds complexity to your application and operations, -and duplicates your data, doubling (or worse) the amount of storage -required. Kudu can handle all of these access patterns natively and efficiently, -without the need to off-load work to other data stores.</p> -</div> -<div class="paragraph"> -<div class="title">Predictive Modeling</div> -<p>Data scientists often develop predictive learning models from large sets of data. The -model and the data may need to be updated or modified often as the learning takes -place or as the situation being modeled changes. In addition, the scientist may want -to change one or more factors in the model to see what happens over time. Updating -a large set of data stored in files in HDFS is resource-intensive, as each file needs -to be completely rewritten. In Kudu, updates happen in near real time. The scientist -can tweak the value, re-run the query, and refresh the graph in seconds or minutes, -rather than hours or days. In addition, batch or incremental algorithms can be run -across the data at any time, with near-real-time results.</p> -</div> -<div class="paragraph"> -<div class="title">Combining Data In Kudu With Legacy Systems</div> -<p>Companies generate data from multiple sources and store it in a variety of systems -and formats. For instance, some of your data may be stored in Kudu, some in a traditional -RDBMS, and some in files in HDFS. 
You can access and query all of these sources and -formats using Impala, without the need to change your legacy systems.</p> -</div> -</div> -</div> -<div class="sect1"> -<h2 id="_next_steps"><a class="link" href="#_next_steps">Next Steps</a></h2> -<div class="sectionbody"> -<div class="ulist"> -<ul> -<li> -<p><a href="quickstart.html">Get Started With Kudu</a></p> -</li> -<li> -<p><a href="installation.html">Installing Kudu</a></p> -</li> -</ul> -</div> -</div> -</div> - </div> - <div class="col-md-3"> - - <div id="toc" data-spy="affix" data-offset-top="70"> - <ul> - - <li> -<span class="active-toc">Introducing Kudu</span> - <ul class="sectlevel1"> -<li><a href="#_kudu_impala_integration_features">Kudu-Impala Integration Features</a></li> -<li><a href="#_concepts_and_terms">Concepts and Terms</a></li> -<li><a href="#_architectural_overview">Architectural Overview</a></li> -<li><a href="#kudu_use_cases">Example Use Cases</a></li> -<li><a href="#_next_steps">Next Steps</a></li> -</ul> - </li> - <li> - - <a href="release_notes.html">Kudu Release Notes</a> - </li> - <li> - - <a href="quickstart.html">Getting Started with Kudu</a> - </li> - <li> - - <a href="installation.html">Installation Guide</a> - </li> - <li> - - <a href="configuration.html">Configuring Kudu</a> - </li> - <li> - - <a href="kudu_impala_integration.html">Using Impala with Kudu</a> - </li> - <li> - - <a href="administration.html">Administering Kudu</a> - </li> - <li> - - <a href="troubleshooting.html">Troubleshooting Kudu</a> - </li> - <li> - - <a href="developing.html">Developing Applications with Kudu</a> - </li> - <li> - - <a href="schema_design.html">Kudu Schema Design</a> - </li> - <li> - - <a href="security.html">Kudu Security</a> - </li> - <li> - - <a href="transaction_semantics.html">Kudu Transaction Semantics</a> - </li> - <li> - - <a href="background_tasks.html">Background Maintenance Tasks</a> - </li> - <li> - - <a href="configuration_reference.html">Kudu Configuration Reference</a> - 
</li> - <li> - - <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> - </li> - <li> - - <a href="known_issues.html">Known Issues and Limitations</a> - </li> - <li> - - <a href="contributing.html">Contributing to Kudu</a> - </li> - <li> - - <a href="export_control.html">Export Control Notice</a> - </li> - </ul> - </div> - </div> - </div> -</div> \ No newline at end of file