[37/52] [abbrv] [partial] kudu git commit: Updating web site for Kudu 1.8.0 release

abukor Fri, 26 Oct 2018 12:04:02 -0700
http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/kudu_impala_integration.html
----------------------------------------------------------------------
diff --git a/docs/kudu_impala_integration.html 
b/docs/kudu_impala_integration.html
deleted file mode 100644
index 945d9a8..0000000
--- a/docs/kudu_impala_integration.html
+++ /dev/null
@@ -1,1126 +0,0 @@
----
-title: Using Apache Kudu with Apache Impala
-layout: default
-active_nav: docs
-last_updated: 'Last updated 2018-06-15 07:22:05 PDT'
----
-<!--
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-
-<div class="container">
-  <div class="row">
-    <div class="col-md-9">
-
-<h1>Using Apache Kudu with Apache Impala</h1>
-      <div id="preamble">
-<div class="sectionbody">
-<div class="paragraph">
-<p>Kudu has tight integration with Apache Impala, allowing you to use Impala
-to insert, query, update, and delete data from Kudu tablets using 
Impala&#8217;s SQL
-syntax, as an alternative to using the <a 
href="installation.html#view_api">Kudu APIs</a>
-to build a custom Kudu application. In addition, you can use JDBC or ODBC to 
connect
-existing or new applications written in any language, framework, or business 
intelligence
-tool to your Kudu data, using Impala as the broker.</p>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_requirements"><a class="link" 
href="#_requirements">Requirements</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p>This documentation is specific to certain versions of Impala. The syntax
-described will work only in the following releases:</p>
-<div class="ulist">
-<ul>
-<li>
-<p>The version of Impala 2.7.0 that ships with CDH 5.10. <code>SELECT 
VERSION()</code> will
-report <code>impalad version 2.7.0-cdh5.10.0</code>.</p>
-</li>
-<li>
-<p>Apache Impala 2.8.0 releases compiled from source. <code>SELECT 
VERSION()</code> will
-report <code>impalad version 2.8.0</code>.</p>
-</li>
-</ul>
-</div>
-</li>
-</ul>
-</div>
-<div class="paragraph">
-<p>Older versions of Impala 2.7 (including the special <code>IMPALA_KUDU</code> releases
-previously available) have incompatible syntax. Future versions are likely to be
-compatible with this syntax, but we recommend checking that you are reading the
-documentation that corresponds to the version you have installed.</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>This documentation does not describe Impala installation procedures. Please
-refer to the Impala documentation and be sure that you are able to run simple
-queries against Impala tables on HDFS before proceeding.</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_configuration"><a class="link" 
href="#_configuration">Configuration</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>No configuration changes are required within Kudu to enable access from 
Impala.</p>
-</div>
-<div class="paragraph">
-<p>Although not strictly necessary, it is recommended to configure Impala with 
the
-locations of the Kudu Master servers:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>Set the 
<code>--kudu_master_hosts=&lt;master1&gt;[:port],&lt;master2&gt;[:port],&lt;master3&gt;[:port]</code>
-flag in the Impala service configuration. If you are using Cloudera Manager,
-please refer to the appropriate Cloudera Manager documentation to do so.</p>
-</li>
-</ul>
-</div>
-<div class="paragraph">
-<p>If this flag is not set within the Impala service, it will be necessary to 
manually
-provide this configuration each time you create a table by specifying the
-<code>kudu.master_addresses</code> property inside a <code>TBLPROPERTIES</code> clause.</p>
-</div>
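-<div class="paragraph">
-<p>As a rough sketch of the per-table alternative, the statement below passes the master
-addresses in <code>TBLPROPERTIES</code>. The table name and host names are hypothetical,
-the property key is assumed to be spelled <code>kudu.master_addresses</code> as in the Impala
-documentation, and 7051 is assumed to be the Kudu master RPC port in use.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">-- Hypothetical table and hosts; adjust to your cluster.
-CREATE TABLE per_table_masters_example
-(
-  id BIGINT,
-  name STRING,
-  PRIMARY KEY(id)
-)
-PARTITION BY HASH PARTITIONS 16
-STORED AS KUDU
-TBLPROPERTIES (
-  'kudu.master_addresses' = 'master1:7051,master2:7051,master3:7051'
-);</code></pre>
-</div>
-</div>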
-<div class="paragraph">
-<p>The rest of this guide assumes that the configuration has been set.</p>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_using_the_impala_shell"><a class="link" 
href="#_using_the_impala_shell">Using the Impala Shell</a></h2>
-<div class="sectionbody">
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
-This is only a small subset of Impala Shell functionality. For more details, see the
-<a href="http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_impala_shell.html">Impala Shell</a> documentation.
-</td>
-</tr>
-</table>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>Start Impala Shell using the <code>impala-shell</code> command. By default, 
<code>impala-shell</code>
-attempts to connect to the Impala daemon on <code>localhost</code> on port 
21000. To connect
-to a different host, use the <code>-i &lt;host:port&gt;</code> option. To 
automatically connect to
-a specific Impala database, use the <code>-d &lt;database&gt;</code> option. 
For instance, if all your
-Kudu tables are in Impala in the database <code>impala_kudu</code>, use 
<code>-d impala_kudu</code> to use
-this database.</p>
-</li>
-<li>
-<p>To quit the Impala Shell, use the following command: <code>quit;</code></p>
-</li>
-</ul>
-</div>
-<div class="sect2">
-<h3 id="_internal_and_external_impala_tables"><a class="link" 
href="#_internal_and_external_impala_tables">Internal and External Impala 
Tables</a></h3>
-<div class="paragraph">
-<p>When creating a new Kudu table using Impala, you can create the table as an 
internal
-table or an external table.</p>
-</div>
-<div class="dlist">
-<dl>
-<dt class="hdlist1">Internal</dt>
-<dd>
-<p>An internal table is managed by Impala, and when you drop it from Impala,
-the data and the table truly are dropped. When you create a new table using 
Impala,
-it is generally an internal table.</p>
-</dd>
-<dt class="hdlist1">External</dt>
-<dd>
-<p>An external table (created by <code>CREATE EXTERNAL TABLE</code>) is not 
managed by
-Impala, and dropping such a table does not drop the table from its source 
location
-(here, Kudu). Instead, it only removes the mapping between Impala and Kudu. 
This is
-the mode used in the syntax provided by Kudu for mapping an existing table to 
Impala.</p>
-</dd>
-</dl>
-</div>
-<div class="paragraph">
-<p>See the
-<a 
href="http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_tables.html">Impala
 documentation</a>
-for more information about internal and external tables.</p>
-</div>
-</div>
-<div class="sect2">
-<h3 id="_querying_an_existing_kudu_table_in_impala"><a class="link" 
href="#_querying_an_existing_kudu_table_in_impala">Querying an Existing Kudu 
Table In Impala</a></h3>
-<div class="paragraph">
-<p>Tables created through the Kudu API or other integrations such as Apache 
Spark
-are not automatically visible in Impala. To query them, you must first create
-an external table within Impala to map the Kudu table into an Impala 
database:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE 
EXTERNAL TABLE my_mapping_table
-STORED AS KUDU
-TBLPROPERTIES (
-  'kudu.table_name' = 'my_kudu_table'
-);</code></pre>
-</div>
-</div>
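-<div class="paragraph">
-<p>For illustration, once the mapping exists it can be queried like any other Impala table,
-and dropping it removes only the mapping, as described for external tables. This sketch
-assumes the <code>my_mapping_table</code> mapping created by the preceding statement.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">-- Query the Kudu table through its Impala mapping.
-SELECT COUNT(*) FROM my_mapping_table;
-
--- Because the mapping is an external table, this removes only the
--- Impala mapping; the underlying Kudu table my_kudu_table is kept.
-DROP TABLE my_mapping_table;</code></pre>
-</div>
-</div>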
-</div>
-<div class="sect2">
-<h3 id="kudu_impala_create_table"><a class="link" 
href="#kudu_impala_create_table">Creating a New Kudu Table From Impala</a></h3>
-<div class="paragraph">
-<p>Creating a new table in Kudu from Impala is similar to mapping an existing 
Kudu table
-to an Impala table, except that you need to specify the schema and partitioning
-information yourself.</p>
-</div>
-<div class="paragraph">
-<p>Use the following example as a guideline. Impala first creates the table, 
then creates
-the mapping.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE 
my_first_table
-(
-  id BIGINT,
-  name STRING,
-  PRIMARY KEY(id)
-)
-PARTITION BY HASH PARTITIONS 16
-STORED AS KUDU;</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>In the <code>CREATE TABLE</code> statement, the columns that comprise the 
primary key must
-be listed first. Additionally, primary key columns are implicitly marked 
<code>NOT NULL</code>.</p>
-</div>
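-<div class="paragraph">
-<p>A minimal sketch of the column-ordering rule with a compound key follows. The table
-name <code>pk_order_example</code> and its columns are hypothetical and chosen only to
-show the layout.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">-- Key columns (id, sku) come first, in the same order as in PRIMARY KEY.
-CREATE TABLE pk_order_example
-(
-  id BIGINT,
-  sku STRING,
-  name STRING,
-  PRIMARY KEY(id, sku)
-)
-PARTITION BY HASH PARTITIONS 4
-STORED AS KUDU;
--- id and sku are implicitly NOT NULL; name remains nullable.</code></pre>
-</div>
-</div>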
-<div class="paragraph">
-<p>When creating a new Kudu table, you are required to specify a distribution 
scheme.
-See <a href="#partitioning_tables">Partitioning Tables</a>. The table creation 
example above is distributed into
-16 partitions by hashing the <code>id</code> column, for simplicity. See
-<a href="#partitioning_rules_of_thumb">Partitioning Rules of Thumb</a> for 
guidelines on partitioning.</p>
-</div>
-<div class="sect3">
-<h4 id="_code_create_table_as_select_code"><a class="link" 
href="#_code_create_table_as_select_code"><code>CREATE TABLE AS 
SELECT</code></a></h4>
-<div class="paragraph">
-<p>You can create a table by querying any other table or tables in Impala, 
using a <code>CREATE
-TABLE &#8230;&#8203; AS SELECT</code> statement. The following example imports 
all rows from an existing table
-<code>old_table</code> into a Kudu table <code>new_table</code>. The names and 
types of columns in <code>new_table</code>
-will be determined from the columns in the result set of the <code>SELECT</code> 
statement. Note that you must
-additionally specify the primary key and partitioning.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE 
new_table
-PRIMARY KEY (ts, name)
-PARTITION BY HASH(name) PARTITIONS 8
-STORED AS KUDU
-AS SELECT ts, name, value FROM old_table;</code></pre>
-</div>
-</div>
-</div>
-<div class="sect3">
-<h4 id="_specifying_tablet_partitioning"><a class="link" 
href="#_specifying_tablet_partitioning">Specifying Tablet Partitioning</a></h4>
-<div class="paragraph">
-<p>Tables are divided into tablets which are each served by one or more tablet
-servers. Ideally, tablets should split a table&#8217;s data relatively 
equally. Kudu currently
-has no mechanism for automatically (or manually) splitting a pre-existing 
tablet.
-Until this feature has been implemented, <strong>you must specify your 
partitioning when
-creating a table</strong>. When designing your table schema, consider primary 
keys that will allow you to
-split your table into partitions which grow at similar rates. You can designate
-partitions using a <code>PARTITION BY</code> clause when creating a table 
using Impala:</p>
-</div>
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
-Impala keywords, such as <code>group</code>, are enclosed by back-tick 
characters when
-they are not used in their keyword sense.
-</td>
-</tr>
-</table>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE 
cust_behavior (
-  _id BIGINT PRIMARY KEY,
-  salary STRING,
-  edu_level INT,
-  usergender STRING,
-  `group` STRING,
-  city STRING,
-  postcode STRING,
-  last_purchase_price FLOAT,
-  last_purchase_date BIGINT,
-  category STRING,
-  sku STRING,
-  rating INT,
-  fulfilled_date BIGINT
-)
-PARTITION BY RANGE (_id)
-(
-    PARTITION VALUES &lt; 1439560049342,
-    PARTITION 1439560049342 &lt;= VALUES &lt; 1439566253755,
-    PARTITION 1439566253755 &lt;= VALUES &lt; 1439572458168,
-    PARTITION 1439572458168 &lt;= VALUES &lt; 1439578662581,
-    PARTITION 1439578662581 &lt;= VALUES &lt; 1439584866994,
-    PARTITION 1439584866994 &lt;= VALUES &lt; 1439591071407,
-    PARTITION 1439591071407 &lt;= VALUES
-)
-STORED AS KUDU;</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>If you have multiple primary key columns, you can specify partition bounds
-using tuple syntax: <code>('va',1), ('ab',2)</code>. The expression must be 
valid JSON.</p>
-</div>
-</div>
-<div class="sect3">
-<h4 id="_impala_databases_and_kudu"><a class="link" 
href="#_impala_databases_and_kudu">Impala Databases and Kudu</a></h4>
-<div class="paragraph">
-<p>Every Impala table is contained within a namespace called a 
<em>database</em>. The default
-database is called <code>default</code>, and users may create and drop 
additional databases
-as desired.</p>
-</div>
-<div class="paragraph">
-<p>When a managed Kudu table is created from within Impala, the corresponding
-Kudu table will be named <code>impala::my_database.table_name</code>.</p>
-</div>
-</div>
-<div class="sect3">
-<h4 id="_impala_keywords_not_supported_for_kudu_tables"><a class="link" 
href="#_impala_keywords_not_supported_for_kudu_tables">Impala Keywords Not 
Supported for Kudu Tables</a></h4>
-<div class="paragraph">
-<p>The following Impala keywords are not supported when creating Kudu tables:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p><code>PARTITIONED</code></p>
-</li>
-<li>
-<p><code>LOCATION</code></p>
-</li>
-<li>
-<p><code>ROW FORMAT</code></p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect2">
-<h3 id="_optimizing_performance_for_evaluating_sql_predicates"><a class="link" 
href="#_optimizing_performance_for_evaluating_sql_predicates">Optimizing 
Performance for Evaluating SQL Predicates</a></h3>
-<div class="paragraph">
-<p>If the <code>WHERE</code> clause of your query includes comparisons with the operators
-<code>=</code>, <code>&lt;=</code>, <code>&lt;</code>, <code>&gt;</code>, <code>&gt;=</code>, <code>BETWEEN</code>, or <code>IN</code>, Kudu evaluates the condition directly
-and returns only the matching rows to Impala, which gives the best performance.
-For <code>!=</code>, <code>LIKE</code>, or any other predicate type supported by Impala, Kudu does not
-evaluate the predicate. Instead, it returns all candidate rows to Impala, and Impala evaluates the
-remaining predicates and filters the results. The performance difference depends on how many rows
-the <code>WHERE</code> clause eliminates.</p>
-</div>
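-<div class="paragraph">
-<p>The queries below sketch the difference. They assume the <code>my_first_table</code> table
-defined earlier in this guide, with columns <code>id</code> and <code>name</code>. Prefixing a
-query with <code>EXPLAIN</code> is one way to inspect the plan; the plan output is not shown here
-because it varies between Impala versions.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">-- BETWEEN on id is a comparison Kudu can evaluate, so only the
--- matching rows are returned to Impala.
-SELECT id, name FROM my_first_table WHERE id BETWEEN 1 AND 100;
-
--- LIKE is not evaluated by Kudu; Impala filters the rows it receives.
-SELECT id, name FROM my_first_table WHERE name LIKE 'j%';
-
--- Inspect the plan for a query.
-EXPLAIN SELECT id, name FROM my_first_table WHERE id BETWEEN 1 AND 100;</code></pre>
-</div>
-</div>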
-</div>
-<div class="sect2">
-<h3 id="partitioning_tables"><a class="link" 
href="#partitioning_tables">Partitioning Tables</a></h3>
-<div class="paragraph">
-<p>Tables are partitioned into tablets according to a partition schema on the 
primary
-key columns. Each tablet is served by at least one tablet server. Ideally, a 
table
-should be split into tablets that are distributed across a number of tablet 
servers
-to maximize parallel operations. The details of the partitioning schema you use
-will depend entirely on the type of data you store and how you access it. For 
a full
-discussion of schema design in Kudu, see <a href="schema_design.html">Schema 
Design</a>.</p>
-</div>
-<div class="paragraph">
-<p>Kudu currently has no mechanism for splitting or merging tablets after the 
table has
-been created. You must provide a partition schema for your table when you 
create it.
-When designing your tables, consider using primary keys that will allow you to 
partition
-your table into tablets which grow at similar rates.</p>
-</div>
-<div class="paragraph">
-<p>You can partition your table using Impala&#8217;s <code>PARTITION BY</code> 
keyword, which
-supports distribution by <code>RANGE</code> or <code>HASH</code>. The 
partition scheme can contain zero
-or more <code>HASH</code> definitions, followed by an optional 
<code>RANGE</code> definition. The <code>RANGE</code>
-definition can refer to one or more primary key columns.
-Examples of <a href="#basic_partitioning">basic</a> and <a 
href="#advanced_partitioning">advanced</a>
-partitioning are shown below.</p>
-</div>
-<div class="sect3">
-<h4 id="basic_partitioning"><a class="link" href="#basic_partitioning">Basic 
Partitioning</a></h4>
-<div class="paragraph">
-<div class="title"><code>PARTITION BY RANGE</code></div>
-<p>You can specify range partitions for one or more primary key columns.
-Range partitioning in Kudu allows splitting a table based on
-specific values or ranges of values of the chosen partition keys. This allows
-you to balance parallelism in writes with scan efficiency.</p>
-</div>
-<div class="paragraph">
-<p>Suppose you have a table that has columns <code>state</code>, 
<code>name</code>, and <code>purchase_count</code>. The
-following example creates 50 tablets, one per US state.</p>
-</div>
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
-<div class="title">Monotonically Increasing Values</div>
-<div class="paragraph">
-<p>If you partition by range on a column whose values are monotonically 
increasing,
-the last tablet will grow much larger than the others. Additionally, all data
-being inserted will be written to a single tablet at a time, limiting the 
scalability
-of data ingest. In that case, consider distributing by <code>HASH</code> 
instead of, or in
-addition to, <code>RANGE</code>.</p>
-</div>
-</td>
-</tr>
-</table>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE 
customers (
-  state STRING,
-  name STRING,
-  purchase_count INT,
-  PRIMARY KEY (state, name)
-)
-PARTITION BY RANGE (state)
-(
-  PARTITION VALUE = 'al',
-  PARTITION VALUE = 'ak',
-  PARTITION VALUE = 'ar',
-  -- ... etc ...
-  PARTITION VALUE = 'wv',
-  PARTITION VALUE = 'wy'
-)
-STORED AS KUDU;</code></pre>
-</div>
-</div>
-<div id="distribute_by_hash" class="paragraph">
-<div class="title"><code>PARTITION BY HASH</code></div>
-<p>Instead of distributing by an explicit range, or in combination with range 
distribution,
-you can distribute into a specific number of 'buckets' by hash. You specify 
the primary
-key columns you want to partition by, and the number of buckets you want to 
use. Rows are
-distributed by hashing the specified key columns. Assuming that the values 
being
-hashed do not themselves exhibit significant skew, this will serve to 
distribute
-the data evenly across buckets.</p>
-</div>
-<div class="paragraph">
-<p>You can specify multiple definitions, and you can specify definitions which
-use compound primary keys. However, one column cannot be mentioned in multiple hash
-definitions. Consider two columns, <code>a</code> and <code>b</code>:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p><span class="icon green"><i class="fa fa-check fa-pro"></i></span> <code>HASH(a)</code>, <code>HASH(b)</code></p>
-</li>
-<li>
-<p><span class="icon green"><i class="fa fa-check fa-pro"></i></span> <code>HASH(a,b)</code></p>
-</li>
-<li>
-<p><span class="icon red"><i class="fa fa-times fa-pro"></i></span> <code>HASH(a), HASH(a,b)</code></p>
-</li>
-</ul>
-</div>
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
-<code>PARTITION BY HASH</code> with no column specified is a shortcut to 
create the desired
-number of buckets by hashing all primary key columns.
-</td>
-</tr>
-</table>
-</div>
-<div class="paragraph">
-<p>Hash partitioning is a reasonable approach if primary key values are evenly
-distributed in their domain and no data skew is apparent, such as timestamps or
-serial IDs.</p>
-</div>
-<div class="paragraph">
-<p>The following example creates 16 tablets by hashing the <code>id</code> and 
<code>sku</code> columns. This spreads
-writes across all 16 tablets. In this example, a query for a range of 
<code>sku</code> values
-is likely to need to read all 16 tablets, so this may not be the optimum 
schema for
-this table. See <a href="#advanced_partitioning">Advanced Partitioning</a> for 
an extended example.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE 
cust_behavior (
-  id BIGINT,
-  sku STRING,
-  salary STRING,
-  edu_level INT,
-  usergender STRING,
-  `group` STRING,
-  city STRING,
-  postcode STRING,
-  last_purchase_price FLOAT,
-  last_purchase_date BIGINT,
-  category STRING,
-  rating INT,
-  fulfilled_date BIGINT,
-  PRIMARY KEY (id, sku)
-)
-PARTITION BY HASH PARTITIONS 16
-STORED AS KUDU;</code></pre>
-</div>
-</div>
-</div>
-<div class="sect3">
-<h4 id="advanced_partitioning"><a class="link" 
href="#advanced_partitioning">Advanced Partitioning</a></h4>
-<div class="paragraph">
-<p>You can combine <code>HASH</code> and <code>RANGE</code> partitioning to 
create more complex partition schemas.
-You can specify zero or more <code>HASH</code> definitions, followed by zero 
or one <code>RANGE</code> definitions.
-Each definition can encompass one or more columns. While enumerating every 
possible distribution
-schema is out of the scope of this document, a few examples illustrate some of 
the
-possibilities.</p>
-</div>
-</div>
-<div class="sect3">
-<h4 id="_code_partition_by_hash_code_and_code_range_code"><a class="link" 
href="#_code_partition_by_hash_code_and_code_range_code"><code>PARTITION BY 
HASH</code> and <code>RANGE</code></a></h4>
-<div class="paragraph">
-<p>Consider the <a href="#distribute_by_hash">simple hashing</a> example above. If you often query for a range of <code>sku</code>
-values, you can optimize the example by combining hash partitioning with range 
partitioning.</p>
-</div>
-<div class="paragraph">
-<p>The following example still creates 16 tablets, by first hashing the 
<code>id</code> column into 4
-buckets, and then applying range partitioning to split each bucket into four 
tablets,
-based upon the value of the <code>sku</code> string. Writes are spread across 
at least four tablets
-(and possibly up to 16). When you query for a contiguous range of 
<code>sku</code> values, you have a
-good chance of only needing to read from a quarter of the tablets to fulfill 
the query.</p>
-</div>
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
-By default, the entire primary key is hashed when you use <code>PARTITION BY 
HASH</code>.
-To hash on only part of the primary key, specify it by using syntax like 
<code>PARTITION
-BY HASH (id, sku)</code>.
-</td>
-</tr>
-</table>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE 
cust_behavior (
-  id BIGINT,
-  sku STRING,
-  salary STRING,
-  edu_level INT,
-  usergender STRING,
-  `group` STRING,
-  city STRING,
-  postcode STRING,
-  last_purchase_price FLOAT,
-  last_purchase_date BIGINT,
-  category STRING,
-  rating INT,
-  fulfilled_date BIGINT,
-  PRIMARY KEY (id, sku)
-)
-PARTITION BY HASH (id) PARTITIONS 4,
-RANGE (sku)
-(
-  PARTITION VALUES &lt; 'g',
-  PARTITION 'g' &lt;= VALUES &lt; 'o',
-  PARTITION 'o' &lt;= VALUES &lt; 'u',
-  PARTITION 'u' &lt;= VALUES
-)
-STORED AS KUDU;</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<div class="title">Multiple <code>PARTITION BY HASH</code> Definitions</div>
-<p>Again expanding the example above, suppose that the query pattern will be 
unpredictable,
-but you want to ensure that writes are spread across a large number of tablets.
-You can achieve maximum distribution across the entire primary key by hashing 
on
-both primary key columns.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE 
cust_behavior (
-  id BIGINT,
-  sku STRING,
-  salary STRING,
-  edu_level INT,
-  usergender STRING,
-  `group` STRING,
-  city STRING,
-  postcode STRING,
-  last_purchase_price FLOAT,
-  last_purchase_date BIGINT,
-  category STRING,
-  rating INT,
-  fulfilled_date BIGINT,
-  PRIMARY KEY (id, sku)
-)
-PARTITION BY HASH (id) PARTITIONS 4,
-             HASH (sku) PARTITIONS 4
-STORED AS KUDU;</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>The example creates 16 partitions. You could also use <code>HASH (id, sku) 
PARTITIONS 16</code>.
-However, a scan for <code>sku</code> values would almost always impact all 16 
partitions, rather
-than possibly being limited to 4.</p>
-</div>
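-<div class="paragraph">
-<p>A minimal sketch of that single-definition alternative follows, using a hypothetical
-table name and a reduced column list for brevity.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">-- One hash definition over both key columns, 16 buckets in total.
-CREATE TABLE cust_behavior_single_hash (
-  id BIGINT,
-  sku STRING,
-  rating INT,
-  PRIMARY KEY (id, sku)
-)
-PARTITION BY HASH (id, sku) PARTITIONS 16
-STORED AS KUDU;</code></pre>
-</div>
-</div>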
-<div class="paragraph">
-<div class="title">Non-Covering Range Partitions</div>
-<p>Kudu 1.0 and higher supports the use of non-covering range partitions,
-which address scenarios like the following:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>Without non-covering range partitions, in the case of time-series data or 
other
-schemas which need to account for constantly-increasing primary keys, tablets
-serving old data will be relatively fixed in size, while tablets receiving new
-data will grow without bounds.</p>
-</li>
-<li>
-<p>In cases where you want to partition data based on its category, such as 
sales
-region or product type, without non-covering range partitions you must know all
-of the partitions ahead of time or manually recreate your table if partitions
-need to be added or removed, such as the introduction or elimination of a 
product
-type.</p>
-</li>
-</ul>
-</div>
-<div class="paragraph">
-<p>Non-covering range partitions have some caveats. Be sure to read the
-<a href="schema_design.html">Schema Design guide</a>.</p>
-</div>
-<div class="paragraph">
-<p>This example creates one tablet per year (5 tablets total) for storing sales data.
-The table only accepts data from 2012 to 2016. Keys outside of these
-ranges will be rejected.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE 
sales_by_year (
-  year INT, sale_id INT, amount INT,
-  PRIMARY KEY (year, sale_id)
-)
-PARTITION BY RANGE (year) (
-  PARTITION VALUE = 2012,
-  PARTITION VALUE = 2013,
-  PARTITION VALUE = 2014,
-  PARTITION VALUE = 2015,
-  PARTITION VALUE = 2016
-)
-STORED AS KUDU;</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>When records start coming in for 2017, they will be rejected. At that 
point, the <code>2017</code>
-range should be added as follows:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">ALTER TABLE 
sales_by_year ADD RANGE PARTITION VALUE = 2017;</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>In use cases where a rolling window of data retention is required, range 
partitions
-may also be dropped. For example, if data from 2012 should no longer be 
retained,
-it may be deleted in bulk:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">ALTER TABLE 
sales_by_year DROP RANGE PARTITION VALUE = 2012;</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>Note that, just like dropping a table, this irrecoverably deletes all data
-stored in the dropped partition.</p>
-</div>
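-<div class="paragraph">
-<p>Put together, a yearly maintenance step for a rolling window might look like the sketch
-below. It assumes the <code>sales_by_year</code> table from the example above.
-<code>SHOW RANGE PARTITIONS</code> is an Impala statement for Kudu tables and is assumed to be
-available in the Impala version you run.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">-- List the current range partitions (assumes your Impala version supports this).
-SHOW RANGE PARTITIONS sales_by_year;
-
--- Open the window for the new year before its rows arrive.
-ALTER TABLE sales_by_year ADD RANGE PARTITION VALUE = 2017;
-
--- Retire the oldest year; this permanently deletes its rows.
-ALTER TABLE sales_by_year DROP RANGE PARTITION VALUE = 2012;</code></pre>
-</div>
-</div>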
-</div>
-<div class="sect3">
-<h4 id="partitioning_rules_of_thumb"><a class="link" 
href="#partitioning_rules_of_thumb">Partitioning Rules of Thumb</a></h4>
-<div class="ulist">
-<ul>
-<li>
-<p>For large tables, such as fact tables, aim for as many tablets as you have
-cores in the cluster.</p>
-</li>
-<li>
-<p>For small tables, such as dimension tables, ensure that each tablet is at
-least 1 GB in size.</p>
-</li>
-</ul>
-</div>
-<div class="paragraph">
-<p>In general, be mindful that the number of tablets limits the parallelism of reads
-in the current implementation. Increasing the number of tablets significantly
-beyond the number of cores is likely to have diminishing returns.</p>
-</div>
-</div>
-</div>
-<div class="sect2">
-<h3 id="_inserting_data_into_kudu_tables"><a class="link" 
href="#_inserting_data_into_kudu_tables">Inserting Data Into Kudu 
Tables</a></h3>
-<div class="paragraph">
-<p>Impala allows you to use standard SQL syntax to insert data into Kudu.</p>
-</div>
-<div class="sect3">
-<h4 id="_inserting_single_values"><a class="link" 
href="#_inserting_single_values">Inserting Single Values</a></h4>
-<div class="paragraph">
-<p>This example inserts a single row.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">INSERT INTO 
my_first_table VALUES (99, "sarah");</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>This example inserts three rows using a single statement.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">INSERT INTO 
my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim");</code></pre>
-</div>
-</div>
-</div>
-<div class="sect3">
-<h4 id="kudu_impala_insert_bulk"><a class="link" 
href="#kudu_impala_insert_bulk">Inserting In Bulk</a></h4>
-<div class="paragraph">
-<p>When inserting in bulk, there are several common choices. Each may have advantages
-and disadvantages, depending on your data and circumstances.</p>
-</div>
-<div class="dlist">
-<dl>
-<dt class="hdlist1">Multiple single <code>INSERT</code> statements</dt>
-<dd>
-<p>This approach has the advantage of being easy to
-understand and implement. However, it is likely to be inefficient because Impala
-has a high query start-up cost compared to Kudu&#8217;s insertion performance. This will
-lead to relatively high latency and poor throughput.</p>
-</dd>
-<dt class="hdlist1">Single <code>INSERT</code> statement with multiple 
<code>VALUES</code></dt>
-<dd>
-<p>If you include more
-than 1024 <code>VALUES</code> statements, Impala batches them into groups of 
1024 (or the value
-of <code>batch_size</code>) before sending the requests to Kudu. This approach 
may perform
-slightly better than multiple sequential <code>INSERT</code> statements by 
amortizing the query start-up
-penalties on the Impala side. To set the batch size for the current Impala
-Shell session, use the following syntax: <code>set batch_size=10000;</code></p>
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
-Increasing the Impala batch size causes Impala to use more memory. You should
-verify the impact on your cluster and tune accordingly.
-</td>
-</tr>
-</table>
-</div>
-</dd>
-<dt class="hdlist1">Batch Insert</dt>
-<dd>
-<p>The approach that usually performs best, from the standpoint of
-both Impala and Kudu, is to import the data using an <code>INSERT INTO ... SELECT</code> statement
-in Impala.</p>
-<div class="olist arabic">
-<ol class="arabic">
-<li>
-<p>If your data is not already in Impala, one strategy is to
-<a 
href="http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_txtfile.html">import
 it from a text file</a>,
-such as a TSV or CSV file.</p>
-</li>
-<li>
-<p><a href="#kudu_impala_create_table">Create the Kudu table</a>, being 
mindful that the columns
-designated as primary keys cannot have null values.</p>
-</li>
-<li>
-<p>Insert values into the Kudu table by querying the table containing the 
original
-data, as in the following example:</p>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">INSERT INTO 
my_kudu_table
-  SELECT * FROM legacy_data_import_table;</code></pre>
-</div>
-</div>
-</li>
-</ol>
-</div>
-</dd>
-<dt class="hdlist1">Ingest using the C++ or Java API</dt>
-<dd>
-<p>In many cases, the appropriate ingest path is to
-use the C++ or Java API to insert directly into Kudu tables. Unlike other Impala tables,
-data inserted into Kudu tables via the API becomes available for query in Impala without
-any <code>INVALIDATE METADATA</code> statements or other statements needed for other
-Impala storage types.</p>
-</dd>
-</dl>
-</div>
-</div>
-<div class="sect3">
-<h4 id="insert_ignore"><a class="link" href="#insert_ignore"><code>INSERT</code> and Primary Key Uniqueness Violations</a></h4>
-<div class="paragraph">
-<p>In most relational databases, if you try to insert a row that has already been inserted, the insertion
-will fail because the primary key would be duplicated.
-Impala, however, does not fail the query. Instead, it generates a warning and continues
-to execute the remainder of the <code>INSERT</code> statement. See <a href="#impala_insertion_caveat">Failures During <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code> Operations</a>.</p>
-</div>
-<div class="paragraph">
-<p>If the inserted rows are meant to replace existing rows, <code>UPSERT</code> may be used instead of <code>INSERT</code>.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">INSERT INTO my_first_table VALUES (99, "sarah");
-UPSERT INTO my_first_table VALUES (99, "zoe");
--- the current value of the row is 'zoe'</code></pre>
-</div>
-</div>
-</div>
-</div>
-<div class="sect2">
-<h3 id="_updating_a_row"><a class="link" href="#_updating_a_row">Updating a Row</a></h3>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">UPDATE my_first_table SET name="bob" WHERE id = 3;</code></pre>
-</div>
-</div>
-<div class="admonitionblock important">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-important" title="Important"></i>
-</td>
-<td class="content">
-The <code>UPDATE</code> statement only works in Impala when the target table is in
-Kudu.
-</td>
-</tr>
-</table>
-</div>
-<div class="sect3">
-<h4 id="_updating_in_bulk"><a class="link" href="#_updating_in_bulk">Updating In Bulk</a></h4>
-<div class="paragraph">
-<p>You can update in bulk using the same approaches outlined in
-<a href="#kudu_impala_insert_bulk">Inserting In Bulk</a>.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">UPDATE my_first_table SET name="bob" WHERE age &gt; 10;</code></pre>
-</div>
-</div>
-</div>
-</div>
-<div class="sect2">
-<h3 id="_deleting_a_row"><a class="link" href="#_deleting_a_row">Deleting a Row</a></h3>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">DELETE FROM my_first_table WHERE id &lt; 3;</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>You can also delete using more complex syntax. A comma in the <code>FROM</code> sub-clause is
-one way that Impala specifies a join query. For more information about Impala joins,
-see <a href="http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_joins.html" class="bare">http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_joins.html</a>.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">DELETE c FROM my_second_table c, stock_symbols s WHERE c.name = s.symbol;</code></pre>
-</div>
-</div>
-<div class="admonitionblock important">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-important" title="Important"></i>
-</td>
-<td class="content">
-The <code>DELETE</code> statement only works in Impala when the target table is in
-Kudu.
-</td>
-</tr>
-</table>
-</div>
-<div class="sect3">
-<h4 id="_deleting_in_bulk"><a class="link" href="#_deleting_in_bulk">Deleting In Bulk</a></h4>
-<div class="paragraph">
-<p>You can delete in bulk using the same approaches outlined in
-<a href="#kudu_impala_insert_bulk">Inserting In Bulk</a>.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">DELETE FROM my_first_table WHERE id &lt; 3;</code></pre>
-</div>
-</div>
-</div>
-</div>
-<div class="sect2">
-<h3 id="impala_insertion_caveat"><a class="link" href="#impala_insertion_caveat">Failures During <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code> Operations</a></h3>
-<div class="paragraph">
-<p><code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code> statements cannot be considered transactional as
-a whole. If one of these operations fails part of the way through, the keys may
-have already been created (in the case of <code>INSERT</code>) or the records may have already
-been modified or removed by another process (in the case of <code>UPDATE</code> or <code>DELETE</code>).
-You should design your application with this in mind.</p>
-</div>
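-<div class="paragraph">
-<p>One way to design for this, offered as a sketch under assumptions rather than a documented recipe, is to make a bulk load safe to re-run.
-The example reuses the <code>my_kudu_table</code> and <code>legacy_data_import_table</code> names from the batch insert example and assumes
-that no other writer changes the same rows between attempts, because <code>UPSERT</code> overwrites whatever is currently stored for a key.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">-- First attempt. Assume it fails part of the way through,
--- leaving some rows already written to my_kudu_table.
-INSERT INTO my_kudu_table
-  SELECT * FROM legacy_data_import_table;
-
--- Retry as an UPSERT. Rows written by the failed attempt are replaced
--- with the same values, and the remaining rows are inserted.
-UPSERT INTO my_kudu_table
-  SELECT * FROM legacy_data_import_table;</code></pre>
-</div>
-</div>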
-</div>
-<div class="sect2">
-<h3 id="_altering_table_properties"><a class="link" href="#_altering_table_properties">Altering Table Properties</a></h3>
-<div class="paragraph">
-<p>You can change Impala&#8217;s metadata relating to a given Kudu table by altering the table&#8217;s
-properties. These properties include the table name, the list of Kudu master addresses,
-and whether the table is managed by Impala (internal) or external.</p>
-</div>
-<div class="listingblock">
-<div class="title">Rename an Impala Mapping Table</div>
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">ALTER TABLE my_table RENAME TO my_new_table;</code></pre>
-</div>
-</div>
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
-Renaming a table using the <code>ALTER TABLE &#8230;&#8203; RENAME</code> statement only renames
-the Impala mapping table, regardless of whether the table is an internal or external
-table. This avoids disruption to other applications that may be accessing the
-underlying Kudu table.
-</td>
-</tr>
-</table>
-</div>
-<div class="paragraph">
-<div class="title">Rename the underlying Kudu table for an internal table</div>
-<p>If a table is an internal table, the underlying Kudu table may be renamed by
-changing the <code>kudu.table_name</code> property:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">ALTER TABLE my_internal_table
-SET TBLPROPERTIES('kudu.table_name' = 'new_name');</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<div class="title">Remapping an external table to a different Kudu table</div>
-<p>If another application has renamed a Kudu table under Impala, it is possible to
-re-map an external table to point to a different Kudu table name.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">ALTER TABLE my_external_table
-SET TBLPROPERTIES('kudu.table_name' = 'some_other_kudu_table');</code></pre>
-</div>
-</div>
-<div class="listingblock">
-<div class="title">Change the Kudu Master Address</div>
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">ALTER TABLE my_table
-SET TBLPROPERTIES('kudu.master_addresses' = 'kudu-new-master.example.com:7051');</code></pre>
-</div>
-</div>
-<div class="listingblock">
-<div class="title">Change an Internally-Managed Table to External</div>
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">ALTER TABLE my_table SET TBLPROPERTIES('EXTERNAL' = 'TRUE');</code></pre>
-</div>
-</div>
-</div>
-<div class="sect2">
-<h3 id="_dropping_a_kudu_table_using_impala"><a class="link" href="#_dropping_a_kudu_table_using_impala">Dropping a Kudu Table Using Impala</a></h3>
-<div class="paragraph">
-<p>If the table was created as an internal table in Impala, using <code>CREATE TABLE</code>, the
-standard <code>DROP TABLE</code> syntax drops the underlying Kudu table and all its data. If
-the table was created as an external table, using <code>CREATE EXTERNAL TABLE</code>, the mapping
-between Impala and Kudu is dropped, but the Kudu table is left intact, with all its
-data.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">DROP TABLE my_first_table;</code></pre>
-</div>
-</div>
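-<div class="paragraph">
-<p>Because the effect of <code>DROP TABLE</code> depends on whether the table is internal or external, it can help to check before dropping.
-A minimal sketch, assuming the <code>Table Type</code> row of the standard Impala <code>DESCRIBE FORMATTED</code> output is what distinguishes the two cases:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">-- Inspect the table metadata before dropping it.
--- A Table Type of MANAGED_TABLE indicates an internal table, so DROP TABLE
--- would also remove the Kudu table and its data. EXTERNAL_TABLE indicates
--- that only the Impala mapping would be dropped.
-DESCRIBE FORMATTED my_first_table;</code></pre>
-</div>
-</div>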
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_what_s_next"><a class="link" href="#_what_s_next">What&#8217;s Next?</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>The examples above have only explored a fraction of what you can do with Impala Shell.</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>Learn about the <a href="http://impala.io">Impala project</a>.</p>
-</li>
-<li>
-<p>Read the <a href="http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala.html">Impala documentation</a>.</p>
-</li>
-<li>
-<p>View the <a href="http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_langref.html">Impala SQL reference</a>.</p>
-</li>
-<li>
-<p>Read about Impala internals or learn how to contribute to Impala on the <a href="https://github.com/cloudera/Impala/wiki">Impala Wiki</a>.</p>
-</li>
-<li>
-<p>Read about the native <a href="installation.html#view_api">Kudu APIs</a>.</p>
-</li>
-</ul>
-</div>
-<div class="sect2">
-<h3 id="_known_issues_and_limitations"><a class="link" href="#_known_issues_and_limitations">Known Issues and Limitations</a></h3>
-<div class="ulist">
-<ul>
-<li>
-<p>Kudu tables with a name containing uppercase or non-ASCII characters must be
-assigned an alternate name when used as an external table in Impala.</p>
-</li>
-<li>
-<p>Kudu tables with a column name containing uppercase or non-ASCII characters
-cannot be used as an external table in Impala. Columns may be renamed in Kudu
-to work around this issue.</p>
-</li>
-<li>
-<p>When creating a Kudu table, the <code>CREATE TABLE</code> statement must include the
-primary key columns before other columns, in primary key order.</p>
-</li>
-<li>
-<p>Impala cannot create Kudu tables with <code>VARCHAR</code> or nested-type columns.</p>
-</li>
-<li>
-<p>Impala cannot update values in primary key columns.</p>
-</li>
-<li>
-<p><code>!=</code> and <code>LIKE</code> predicates are not pushed to Kudu, and
-instead will be evaluated by the Impala scan node. This may decrease performance
-relative to other types of predicates.</p>
-</li>
-<li>
-<p>Updates, inserts, and deletes via Impala are non-transactional. If a query
-fails part of the way through, its partial effects will not be rolled back.</p>
-</li>
-<li>
-<p>The maximum parallelism of a single query is limited to the number of tablets
-in a table. For good analytic performance, aim for 10 or more tablets per host
-for large tables.</p>
-</li>
-</ul>
-</div>
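-<div class="paragraph">
-<p>As an illustration of the <code>LIKE</code> limitation above, a prefix match can sometimes be rewritten as a range comparison.
-This is a sketch, not a guaranteed optimization: it assumes that <code>&gt;=</code> and <code>&lt;</code> comparisons on the column are pushed to Kudu,
-that strings compare byte-wise, and that <code>my_first_table</code> has the <code>name</code> column used in the earlier examples.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">-- Evaluated by the Impala scan node, because LIKE is not pushed to Kudu.
-SELECT * FROM my_first_table WHERE name LIKE 'bo%';
-
--- Matches the same fixed prefix using range comparisons:
--- every string starting with 'bo' sorts at or after 'bo' and before 'bp'.
-SELECT * FROM my_first_table WHERE name &gt;= 'bo' AND name &lt; 'bp';</code></pre>
-</div>
-</div>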
-</div>
-</div>
-</div>
-    </div>
-    <div class="col-md-3">
-
-  <div id="toc" data-spy="affix" data-offset-top="70">
-  <ul>
-
-      <li>
-
-          <a href="index.html">Introducing Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="release_notes.html">Kudu Release Notes</a> 
-      </li> 
-      <li>
-
-          <a href="quickstart.html">Getting Started with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="installation.html">Installation Guide</a> 
-      </li> 
-      <li>
-
-          <a href="configuration.html">Configuring Kudu</a> 
-      </li> 
-      <li>
-<span class="active-toc">Using Impala with Kudu</span>
-            <ul class="sectlevel1">
-<li><a href="#_requirements">Requirements</a></li>
-<li><a href="#_configuration">Configuration</a></li>
-<li><a href="#_using_the_impala_shell">Using the Impala Shell</a>
-<ul class="sectlevel2">
-<li><a href="#_internal_and_external_impala_tables">Internal and External Impala Tables</a></li>
-<li><a href="#_querying_an_existing_kudu_table_in_impala">Querying an Existing Kudu Table In Impala</a></li>
-<li><a href="#kudu_impala_create_table">Creating a New Kudu Table From Impala</a></li>
-<li><a href="#_optimizing_performance_for_evaluating_sql_predicates">Optimizing Performance for Evaluating SQL Predicates</a></li>
-<li><a href="#partitioning_tables">Partitioning Tables</a></li>
-<li><a href="#_inserting_data_into_kudu_tables">Inserting Data Into Kudu Tables</a></li>
-<li><a href="#_updating_a_row">Updating a Row</a></li>
-<li><a href="#_deleting_a_row">Deleting a Row</a></li>
-<li><a href="#impala_insertion_caveat">Failures During <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code> Operations</a></li>
-<li><a href="#_altering_table_properties">Altering Table Properties</a></li>
-<li><a href="#_dropping_a_kudu_table_using_impala">Dropping a Kudu Table Using Impala</a></li>
-</ul>
-</li>
-<li><a href="#_what_s_next">What&#8217;s Next?</a>
-<ul class="sectlevel2">
-<li><a href="#_known_issues_and_limitations">Known Issues and Limitations</a></li>
-</ul>
-</li>
-</ul> 
-      </li> 
-      <li>
-
-          <a href="administration.html">Administering Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="developing.html">Developing Applications with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="schema_design.html">Kudu Schema Design</a> 
-      </li> 
-      <li>
-
-          <a href="security.html">Kudu Security</a> 
-      </li> 
-      <li>
-
-          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
-      </li> 
-      <li>
-
-          <a href="background_tasks.html">Background Maintenance Tasks</a> 
-      </li> 
-      <li>
-
-          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
-      </li> 
-      <li>
-
-          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
-      </li> 
-      <li>
-
-          <a href="known_issues.html">Known Issues and Limitations</a> 
-      </li> 
-      <li>
-
-          <a href="contributing.html">Contributing to Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="export_control.html">Export Control Notice</a> 
-      </li> 
-  </ul>
-  </div>
-    </div>
-  </div>
-</div>
\ No newline at end of file