drill-site git commit: doc updates for Drill 1.13

bridgetb Tue, 13 Mar 2018 18:13:28 -0700

Repository: drill-site
Updated Branches:
  refs/heads/asf-site b3daf9a11 -> c5a1214d4



doc updates for Drill 1.13


Project: http://git-wip-us.apache.org/repos/asf/drill-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill-site/commit/c5a1214d
Tree: http://git-wip-us.apache.org/repos/asf/drill-site/tree/c5a1214d
Diff: http://git-wip-us.apache.org/repos/asf/drill-site/diff/c5a1214d

Branch: refs/heads/asf-site
Commit: c5a1214d4cf36ead0946d1966f87e5fd32e7ceac
Parents: b3daf9a
Author: Bridget Bevens <bbev...@maprtech.com>
Authored: Tue Mar 13 18:13:01 2018 -0700
Committer: Bridget Bevens <bbev...@maprtech.com>
Committed: Tue Mar 13 18:13:01 2018 -0700

----------------------------------------------------------------------
 .../index.html                                  |  32 ++++-
 docs/configuring-drill-memory/index.html        |  49 ++++---
 .../index.html                                  | 133 ++++++++++---------
 docs/start-up-options/index.html                |   6 +-
 feed.xml                                        |   4 +-
 team/index.html                                 |   4 +
 6 files changed, 134 insertions(+), 94 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/docs/configuration-options-introduction/index.html
----------------------------------------------------------------------
diff --git a/docs/configuration-options-introduction/index.html 
b/docs/configuration-options-introduction/index.html
index 7b4e536..7fa513f 100644
--- a/docs/configuration-options-introduction/index.html
+++ b/docs/configuration-options-introduction/index.html
@@ -1153,7 +1153,7 @@
 
     </div>
 
-     Feb 5, 2018
+     Mar 14, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1172,13 +1172,13 @@
 
 <h2 id="system-options">System Options</h2>
 
-<p>The sys.options table lists ptions that you can set at the system or 
session level, as described in the section, <a 
href="/docs/planning-and-execution-options">&quot;Planning and Execution 
Options&quot;</a>.  </p>
+<p>The sys.options table lists options that you can set at the system or 
session level, as described in the section, <a 
href="/docs/planning-and-execution-options">&quot;Planning and Execution 
Options&quot;</a>.  </p>
 
 <table><thead>
 <tr>
-<th><strong>Name</strong></th>
-<th><strong>Default</strong></th>
-<th><strong>Description</strong></th>
+<th>Name</th>
+<th>Default</th>
+<th>Description</th>
 </tr>
 </thead><tbody>
 <tr>
@@ -1187,6 +1187,11 @@
 <td>Available as of Drill 1.10. Sets the   workspace for temporary tables. The 
workspace must be writable, file-based,   and point to a location that already 
exists. This option requires the   following format: .&lt;workspace</td>
 </tr>
 <tr>
+<td>drill.exec.memory.operator.output_batch_size</td>
+<td>16777216   (16 MB)</td>
+<td>Available as of Drill 1.13. Limits the   amount of memory that the 
Flatten, Merge Join, and External Sort operators   allocate to outgoing 
batches.</td>
+</tr>
+<tr>
 <td>drill.exec.storage.implicit.filename.column.label</td>
 <td>filename</td>
 <td>Available as of Drill 1.10. Sets the   implicit column name for the 
filename column.</td>
@@ -1212,6 +1217,16 @@
 <td>In a text file, treat empty fields as NULL   values instead of empty 
string.</td>
 </tr>
 <tr>
+<td>drill.exec.spill.fs</td>
+<td>&quot;file:///&quot;</td>
+<td>Introduced   in Drill 1.11. The default file system on the local machine 
into which the   Sort, Hash Aggregate, and Hash Join operators spill data.</td>
+</tr>
+<tr>
+<td>drill.exec.spill.directories</td>
+<td>[&quot;/tmp/drill/spill&quot;]</td>
+<td>Introduced   in Drill 1.11. The list of directories into which the Sort, 
Hash Aggregate,   and Hash Join operators spill data. The list must be an array 
with   directories separated by a comma, for example 
[&quot;/fs1/drill/spill&quot; ,   &quot;/fs2/drill/spill&quot; , 
&quot;/fs3/drill/spill&quot;].</td>
+</tr>
+<tr>
 <td>drill.exec.storage.file.partition.column.label</td>
 <td>dir</td>
 <td>The column label for directory levels in   results of queries of files in 
a directory. Accepts a string input.</td>
@@ -1239,7 +1254,7 @@
 <tr>
 <td>exec.java.compiler.exp_in_method_size</td>
 <td>50</td>
-<td>Introduced in Drill 1.8. For queries with complex or multiple expressions 
in the query logic, this option   limits the number of expressions allowed in 
each method to prevent Drill from   generating code that exceeds the Java limit 
of 64K bytes. If a method   approaches the 64K limit, the Java compiler returns 
a message stating that   the code is too large to compile. If queries return 
such a message, reduce   the value of this option at the session level. The 
default value for this option is 50. The value is the count of   expressions 
allowed in a method. Expressions are added to a method until they   hit the 
Java 64K limit, when a new inner method is created and called from   the 
existing method.          <strong>Note:</strong> This logic has not   been 
implemented for all operators. If a query uses operators for which the   logic 
is not implemented, reducing the setting for this option may not   resolve the 
error. Setting this option at the system level impacts all   queries 
 and can degrade query performance.</td>
+<td>Introduced in Drill 1.8. For queries with   complex or multiple 
expressions in the query logic, this option limits the   number of expressions 
allowed in each method to prevent Drill from generating   code that exceeds the 
Java limit of 64K bytes. If a method approaches the 64K   limit, the Java 
compiler returns a message stating that the code is too large   to compile. If 
queries return such a message, reduce the value of this option   at the session 
level. The default value for this option is 50. The value is   the count of 
expressions allowed in a method. Expressions are added to a   method until they 
hit the Java 64K limit, when a new inner method is created   and called from 
the existing method. Note: This logic has not been implemented for all 
operators. If   a query uses operators for which the logic is not implemented, 
reducing the   setting for this option may not resolve the error. Setting this 
option at the   system level impacts all queries and can degrade query 
performance.</td>
 </tr>
 <tr>
 <td>exec.java_compiler_janino_maxsize</td>
@@ -1457,6 +1472,11 @@
 <td>Defines the maximum amount of direct memory   allocated to a query for 
planning. When multiple queries run concurrently,   each query is allocated the 
amount of memory set by this parameter.Increase   the value of this parameter 
and rerun the query if partition pruning failed   due to insufficient 
memory.</td>
 </tr>
 <tr>
+<td>planner.memory.percent_per_query</td>
+<td>0.05</td>
+<td>Sets the memory that Drill can allocate to a query on a node as a 
percentage of the total direct memory.</td>
+</tr>
+<tr>
 <td>planner.nestedloopjoin_factor</td>
 <td>100</td>
 <td>A heuristic value for influencing the nested   loop join.</td>

http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/docs/configuring-drill-memory/index.html
----------------------------------------------------------------------
diff --git a/docs/configuring-drill-memory/index.html 
b/docs/configuring-drill-memory/index.html
index 69e54b7..1083aaf 100644
--- a/docs/configuring-drill-memory/index.html
+++ b/docs/configuring-drill-memory/index.html
@@ -1151,42 +1151,30 @@
 
     </div>
 
-     Jan 30, 2018
+     Mar 14, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
     <div class="int_text" align="left">
       
-        <p>You can configure the amount of direct memory allocated to a 
Drillbit for query processing in any Drill cluster, multitenant or not. The 
default memory for a drillbit is 8G, but Drill prefers 16G or more depending on 
the workload. The total amount of direct memory that a drillbit allocates to 
query operations cannot exceed the limit set.</p>
+        <p>Drill uses Java direct memory. You can configure the amount of 
direct memory allocated to a Drillbit for query processing. The default memory 
for a Drillbit is 8G, but Drill prefers 16G or more depending on the workload. 
The total amount of direct memory that a Drillbit allocates to query operations 
cannot exceed the limit set.</p>
 
-<p>Drill uses Java direct memory and performs well when executing operations 
in memory instead of storing the operations on disk. Drill does not write to 
disk unless absolutely necessary, unlike MapReduce where everything is written 
to disk during each phase of a job.</p>
+<p>Drill performs well when executing operations in memory instead of storing 
the operations on disk. Drill does not write to disk unless absolutely 
necessary, unlike MapReduce where everything is written to disk during each 
phase of a job.</p>
 
-<p>The JVM’s heap memory does not limit the amount of direct memory 
available in
-a drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 
4), which should
-suffice because Drill avoids having data sit in heap memory.</p>
+<p>The JVM heap memory does not limit the amount of direct memory available in 
a Drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 
4), which should
+suffice because Drill avoids having data sit in heap memory.  </p>
 
-<p>As of Drill 1.5, Drill uses a new allocator that improves an operator’s 
use of direct memory and tracks the memory use more accurately. Due to this 
change, the sort operator (in queries that ran successfully in previous 
releases) may not have enough memory, resulting in a failed query and out of 
memory error instead of spilling to disk.     </p>
+<p>The following sections describe how to modify the memory allocated to each 
Drillbit and queries:  </p>
 
-<h2 id="drillbit-memory">Drillbit Memory</h2>
+<h2 id="modifying-memory-allocated-to-a-drillbit">Modifying Memory Allocated 
to a Drillbit</h2>
 
-<p>The value set for the <a 
href="/docs/configuration-options-introduction/#system-options"><code>planner.memory.max_query_memory_per_node</code></a>
 system option sets the maximum amount of direct memory allocated to the Sort 
and Hash Aggreate operators in each query on a node. If a query plan contains 
multiple Sort and/or Hash Aggregate operators, they all share this memory. The 
default limit is set to 2147483648 bytes (2GB), which should be increased for 
queries on large data sets. If you encounter memory issues when running queries 
with Sort and/or Hash Aggregate operators, increase the value of this option. 
See <a 
href="https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based
 and Hash-Based Memory Constrained Operators</a> for more information.  </p>
+<p>Modify the memory allocated to each Drillbit in a cluster in the Drillbit 
startup script, 
<code>&lt;drill_installation_directory&gt;/conf/drill-env.sh</code>. You must 
<a href="/docs/starting-drill-in-distributed-mode">restart Drill</a> after you 
modify the script.</p>
 
-<p>If you continue to encounter memory issues after increasing this value, you 
can also reduce the value of the <a 
href="/docs/configuration-options-introduction/"><code>planner.width.max_per_node</code></a>
 option to reduce the level of parallelism per node. However, this may increase 
the amount of time required for a query to complete. </p>
-
-<h3 id="modifying-drillbit-memory">Modifying Drillbit Memory</h3>
-
-<p>You can modify memory for each drillbit node in your cluster. To modify the 
memory for a drillbit, set the DRILL_MAX_DIRECT_MEMORY variable in the drillbit 
startup script, <code>drill-env.sh</code>, located in 
<code>&lt;drill_installation_directory&gt;/conf</code>, as follows:</p>
-<div class="highlight"><pre><code class="language-text" 
data-lang="text">export 
DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-&quot;&lt;value&gt;&quot;}
-</code></pre></div>
 <div class="admonition note">
   <p class="first admonition-title">Note</p>
-  <p class="last">If DRILL_MAX_DIRECT_MEMORY is not set, the limit depends on 
the amount of available system memory.  </p>
+  <p class="last">If DRILL_MAX_DIRECT_MEMORY is not set, the limit depends on 
the amount of available direct memory.  </p>
 </div>
 
-<p>After you edit 
<code>&lt;drill_installation_directory&gt;/conf/drill-env.sh</code>, <a 
href="/docs/starting-drill-in-distributed-mode">restart the drillbit</a> on the 
node.</p>
-
-<h3 id="about-the-drillbit-startup-script">About the Drillbit Startup 
Script</h3>
-
 <p>The <code>drill-env.sh</code> file contains the following options:</p>
 <div class="highlight"><pre><code class="language-text" 
data-lang="text">#export DRILL_HEAP=${DRILL_HEAP:-&quot;4G”}  
 #export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-&quot;8G&quot;}
@@ -1205,7 +1193,24 @@ DRILL_MAX_DIRECT_MEMORY is the Java direct memory limit 
per node.  </p>
 <div class="highlight"><pre><code class="language-text" 
data-lang="text">export DRILL_JAVA_OPTS=&quot;$DRILL_JAVA_OPTS 
-Ddrill.exec.memory.enable_unsafe_bounds_check=true&quot;  
 </code></pre></div>
 <p>For earlier versions of Drill (prior to 1.13), bounds checking is enabled 
by default. To disable bounds checking, set the 
<code>drill.enable_unsafe_memory_access</code> parameter to true, as shown:  
</p>
-<div class="highlight"><pre><code class="language-text" 
data-lang="text">export DRILL_JAVA_OPTS=&quot;$DRILL_JAVA_OPTS 
-Ddrill.enable_unsafe_memory_access=true&quot;
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">export DRILL_JAVA_OPTS=&quot;$DRILL_JAVA_OPTS 
-Ddrill.enable_unsafe_memory_access=true&quot;  
+</code></pre></div>
+<h2 id="modifying-memory-allocated-to-queries">Modifying Memory Allocated to 
Queries</h2>
+
+<p>You can configure the amount of memory that Drill allocates to each query 
as a hard limit or a percentage of the total direct memory. The 
<code>planner.memory.max_query_memory_per_node</code> and 
<code>planner.memory.percent_per_query</code> options set the amount of memory 
that Drill can allocate to a query on a node. Both options are enabled by 
default. Of these two options, Drill picks the setting that provides the most 
memory. For more information about these options, see <a 
href="https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based
 and Hash-Based Memory Constrained Operators</a>.  </p>
+
+<p>If you modify the memory allocated per query and continue to experience 
out-of-memory errors, you can try reducing the value of the <a 
href="/docs/configuration-options-introduction/"><code>planner.width.max_per_node</code></a>
 option. Reducing the value of this option reduces the level of parallelism per 
node. However, this may increase the amount of time required for a query to 
complete.  </p>
+
+<p>You can also modify the 
<code>drill.exec.memory.operator.output_batch_size</code> option, introduced in 
Drill 1.13, which limits the amount of memory that the Flatten, Merge Join, and 
External Sort operators allocate to outgoing batches. Limiting the memory 
allocated to outgoing batches can improve concurrency and prevent queries from 
failing with out-of-memory errors.</p>
+
+<p>The average row size of the outgoing batch (calculated from the incoming 
batch size) determines the number of rows that can fit into the available 
memory for the batch. If your queries fail with memory errors, reduce the value 
of the <code>drill.exec.memory.operator.output_batch_size</code> option to 
reduce the output batch size. </p>
+
+<p>The default value is 16777216 (16 MB). The maximum allowed value is 
536870912 (512 MB). Enter the value in bytes. </p>
+
+<p><strong>Note:</strong> Configuring a batch size less than 1 MB is not 
recommended, as it could lead to performance issues. </p>
+
+<p>Use the ALTER SYSTEM SET command to change the setting, as shown:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">   
ALTER SYSTEM SET `drill.exec.memory.operator.output_batch_size` = &lt;value&gt;;
 </code></pre></div>
     
       

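Reviewer note on the configuring-drill-memory changes above: the new text says that, of planner.memory.max_query_memory_per_node and planner.memory.percent_per_query, Drill picks the setting that provides the most memory, and that the average row size determines how many rows fit in an output batch. The sketch below only illustrates that arithmetic. It is not Drill's implementation: the function names are hypothetical, it assumes the 0.05 default is a fraction of the Drillbit's total direct memory, and it ignores any rounding or row-count caps that Drill itself may apply.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def per_query_memory_limit(max_query_memory_per_node: int,
                           percent_per_query: float,
                           total_direct_memory: int) -> int:
    """Return the larger of the fixed limit and the fraction-based limit, in bytes.

    Assumption: percent_per_query is a fraction (0.05 means 5%) of the
    total direct memory configured for the Drillbit.
    """
    fraction_based = int(percent_per_query * total_direct_memory)
    return max(max_query_memory_per_node, fraction_based)


def rows_per_batch(output_batch_size: int, avg_row_size: int) -> int:
    """Return how many rows of the given average size fit in one output batch.

    Simplified: plain integer division, no rounding or caps.
    """
    if avg_row_size <= 0:
        raise ValueError("avg_row_size must be a positive number of bytes")
    return output_batch_size // avg_row_size


if __name__ == "__main__":
    # Defaults from the doc: 2 GB hard limit, 0.05, 8G of direct memory.
    print(per_query_memory_limit(2 * GIB, 0.05, 8 * GIB))
    # A Drillbit with 64G of direct memory: the fraction-based limit wins.
    print(per_query_memory_limit(2 * GIB, 0.05, 64 * GIB))
    # Default 16 MB output batch with a hypothetical 256-byte average row.
    print(rows_per_batch(16 * MIB, 256))
```

Under these assumptions, the 2 GB setting is the effective limit with the default 8G of direct memory, and the percentage only takes over once 5% of direct memory exceeds 2 GB, that is, above 40G.
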
http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
----------------------------------------------------------------------
diff --git 
a/docs/sort-based-and-hash-based-memory-constrained-operators/index.html 
b/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
index f5743c5..a5df4c1 100644
--- a/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
+++ b/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
@@ -1153,95 +1153,106 @@
 
     </div>
 
-     Aug 18, 2017
+     Mar 14, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
     <div class="int_text" align="left">
       
-        <p>Drill uses hash-based and sort-based operators depending on the 
query characteristics. Hash Aggregate and Hash Join are hash-based operators. 
Sort, Streaming Aggregate, and Merge Join are sort-based operators. Both 
hash-based and sort-based operations consume memory, however the Hash Aggregate 
and Hash Join operators are the fastest and most memory intensive operators. 
</p>
+        <p>Drill uses operators to sort, join, and aggregate data when 
executing queries. Drill uses the Sort operator to sort data. To join or 
aggregate data, Drill can use the Hash Join or Hash Aggregate operators, or 
Drill can sort the data and then use the Merge Join or Streaming Aggregate 
operators. </p>
 
-<p>When planning a query with sort- and hash-based operations, Drill evaluates 
the available memory multiplied by a configurable reduction constant (for 
parallelization purposes) and then limits the operations to the maximum of this 
amount of memory. Drill spills data to disk if the sort and hash aggregate 
operations cannot be performed in memory. Alternatively, you can disable large 
hash operations if they do not fit in memory on your system. When disabled, 
Drill creates alternative plans. You can also modify the minimum hash table 
size, increasing the size for very large aggregations or joins when you have 
large amounts of memory for Drill to use. If you have large data sets, you can 
increase the hash table size to improve performance. </p>
+<p>The Hash operators typically perform better than the Merge Join and 
Streaming Aggregate operators, but they are more memory intensive. The Sort 
operator may use as much or even more memory than the Hash operators. To see 
the difference in memory consumption between the operators, you can 
run a query and view the query profile in the Drill Web Console. Optionally, 
you can disable the Hash operators to force Drill to use the Merge Join and 
Streaming Aggregate operators. </p>
 
-<h2 id="memory-options">Memory Options</h2>
+<p>When a query requires sorting, joining, or aggregation, Drill divides 
the available memory equally among all instances of these memory-intensive 
operators in the query. The number of instances is equivalent to the number of 
these operators in the query plan, each multiplied by its degree of 
parallelism. The degree of parallelism is the number of minor fragments 
required to perform the work for each instance of an operator. When an instance 
of an operator must process more data than it can hold, the operator 
temporarily spills some of the data to a directory on disk to complete its 
work.  </p>
 
-<p>The <code>planner.memory.max_query_memory_per_node</code> option sets the 
maximum amount of direct memory allocated to the Sort and Hash Aggregate 
operators during each query on a node. The default limit is set to 2147483648 
bytes (2GB), which should be increased for queries on large data sets. This 
memory is split between operators. If a query plan contains multiple Sort 
and/or Hash Aggregate operators, the memory is divided between them.</p>
-
-<p>When a query is parallelized, the number of operators is multiplied, which 
reduces the amount of memory given to each instance of the Sort and Hash 
Aggregate operators during a query. If you encounter memory issues when running 
queries with Sort and Hash Aggregate operators, calculate the memory 
requirements for your queries and the amount of available memory on each node. 
Based on the information, increase the value of the 
<code>planner.memory.max_query_memory_per_node</code> option using the ALTER 
SYSTEM|SESSION SET command, as shown:  </p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">ALTER 
SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = 
&lt;new_value&gt;  
-</code></pre></div>
-<p>The <code>planner.memory.enable_memory_estimation</code> option toggles the 
state of memory estimation and re-planning of a query. When enabled, Drill 
conservatively estimates memory requirements and typically excludes 
memory-constrained operators from the query plan, which can negatively impact 
performance. The default setting is false. If you want Drill to use very 
conservative memory estimates, use the ALTER SYSTEM|SESSION SET command to 
change the setting, as shown:  </p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">ALTER 
SYSTEM|SESSION SET `planner.memory.enable_memory_estimation` = true  
-</code></pre></div>
 <h2 id="spill-to-disk">Spill to Disk</h2>
 
-<p>Spilling data to disk prevents queries that use memory-intensive Sort and 
Hash Aggregate operations from failing with out-of-memory errors. Drill 
automatically writes excess data to a temporary directory on disk when queries 
with Sort or Hash Aggregate operations exceed the set memory limit on a Drill 
node. When the operators finish processing the in-memory data, Drill reads the 
spilled data back from disk, and the operators finish processing the data. When 
the operations complete, Drill removes the data from disk.  </p>
+<p>Spilling to disk prevents queries that use memory intensive operations from 
failing with out-of-memory errors. The Spill to Disk feature enables the Sort, 
Hash Aggregate, and Hash Join operators to automatically write excess data (as 
files) to a temporary directory on disk when the memory requirements for the 
operators exceed the set memory limit. Queries run uninterrupted while the 
operators perform the spill operations in the background.</p>
 
-<p>Spilling data to disk enables queries to run uninterrupted while Drill 
performs the spill operations in the background. However, there can be 
performance impact due to the time required to spill data and then read the 
data back from disk.  </p>
+<p>When the Sort, Hash Aggregate, and Hash Join operators finish processing 
the data in memory, they read the spilled data back from disk and then finish 
processing the data. The operators clean up their data (files) from the 
temporary spill location after they finish processing the data. </p>
 
-<div class="admonition note">
-  <p class="first admonition-title">Note</p>
-  <p class="last">Drill 1.11 and later supports spilling to disk for the Hash 
Aggregate operator in addition to the Sort operator. Previous releases of Drill 
only supported spilling to disk for the Sort operator.  </p>
-</div>  
+<p>Ideally, you want to allocate enough memory for Drill to perform all 
operations in memory. When data spills to disk, you will not see any difference 
in terms of how queries run; however, spilling to disk can impact performance 
due to the additional I/O required to write data to disk and read the data 
back. See the Memory Allocation section below for more information. </p>
 
-<h3 id="spill-locations">Spill Locations</h3>
+<p><strong>Note:</strong> Drill 1.13 and later support spilling to disk for 
the Hash Join, Hash Aggregate, and Sort operators. Drill 1.11 and 1.12 support 
spilling to disk for the Hash Aggregate and Sort operators. Releases of Drill 
prior to 1.11 only support spilling to disk for the Sort operator.  </p>
 
-<p>Drill writes data to a temporary work area on disk. The default location of 
the temporary work area is /tmp/drill/spill on the local file system. The 
/tmp/drill/spill directory should suffice for small workloads or examples, 
however it is highly recommended that you redirect the default spill location 
to a location with enough disk space to support spilling for large workloads.  
</p>
+<p><strong>Spill Locations</strong> </p>
 
-<div class="admonition note">
-  <p class="first admonition-title">Note</p>
-  <p class="last">Spilled data may require more space than the table 
referenced in the query that is spilling the data. For example, if a table is 
100 GB per node, the spill directory should have the capacity to hold more than 
100 GB.  </p>
-</div>
- 
+<p>The Sort, Hash Aggregate, and Hash Join operators write data to a temporary 
work area on disk when they cannot process all of the data in memory. The 
default location of the temporary work area is /tmp/drill/spill on the local 
file system. </p>
+
+<p>The /tmp/drill/spill directory should suffice for small workloads or 
examples; however, it is highly recommended that you redirect the default spill 
location to a location with enough disk space to support spilling for large 
workloads.</p>
 
-<p>When you configure the spill location, you can specify a single directory, 
or a list of directories into which the sort and hash aggregate operators both 
spill. Alternatively, you can set specific spill directories for each type of 
operator, however this is not recommended as these options will be deprecated 
in future releases of Drill. For more information, see the Spill to Disk 
Configuration Options section below.  </p>
+<p><strong>Note:</strong> Spilled data may require more space than the table 
referenced in the query that is spilling the data. For example, if a table is 
100 GB per node, the spill directory should have the capacity to hold more than 
100 GB.</p>
 
-<h3 id="spill-to-disk-configuration-options">Spill to Disk Configuration 
Options</h3>
+<p>When you configure the spill location, you can specify a single directory 
or a list of directories into which the Sort, Hash Aggregate, and Hash Join 
operators spill data. For more information, see the Spill to Disk Configuration 
Options section below.  </p>
 
-<p>The options related to spilling reside in the drill-override.conf file on 
each Drill node. An administrator or someone familiar with storage and disks 
should manage these settings.</p>
+<p><strong>Spill to Disk Configuration Options</strong>  </p>
 
-<div class="admonition note">
-  <p class="first admonition-title">Note</p>
-  <p class="last">You can see examples of these configuration options in the 
drill-override-example.conf file located in the <drill_installation>/conf 
directory.  </p>
-</div> 
+<p>The drill-override.conf file, located in the /conf directory, contains 
options that set the spill locations for the Hash and Sort operators. An 
administrator can change the file system and directories into which the 
operators spill data. Refer to the drill-override-example.conf file for 
examples. </p>
 
-<p>The following list describes the configuration options for spilling data to 
disk:  </p>
+<p>The following list describes the spill to disk configuration options:  </p>
 
 <ul>
-<li><p><strong>drill.exe.spill.fs</strong><br>
-Introduced in Drill 1.11. The default file system on the local machine into 
which the Sort and Hash Aggregate operators spill data. This is the recommended 
option to use for spilling. You can configure this option so that data spills 
into a distributed file system, such as hdfs. For example, 
&quot;hdfs:///&quot;. The default setting is &quot;file:///&quot;.  </p></li>
-<li><p><strong>drill.exec.spill.directories</strong><br>
-Introduced in Drill 1.11. The list of directories into which the Sort and Hash 
Aggregate operators spill data. The list must be an array with directories 
separated by a comma, for example [&quot;/fs1/drill/spill&quot; , 
&quot;/fs2/drill/spill&quot; , &quot;/fs3/drill/spill&quot;]. This is the 
recommended option for spilling to multiple directories. The default setting is 
[&quot;/tmp/drill/spill&quot;].  </p></li>
-<li><p><strong>drill.exec.sort.external.spill.fs</strong><br>
-Overrides the default location into which the Sort operator spills data. 
Instead of spilling into the location set by the 
<code>drill.exec.spill.fs</code> option, the Sort operators spill into the 
location specified by this option.<br>
-<strong>Note:</strong> As of Drill 1.11, this option is supported for backward 
compatibility, however in future releases, this option will be deprecated. It 
is highly recommended that you use the <code>drill.exec.spill.fs</code> option 
to set the spill location instead. The default setting is &quot;file:///&quot;. 
 </p></li>
-<li><p><strong>drill.exec.sort.external.spill.directories</strong><br>
-Overrides the location into which the Sort operator spills data. Instead of 
spilling into the location set by the <code>drill.exec.spill.directories</code> 
option, the Sort operators spill into the directories specified by this option. 
The list must be an array with directories separated by a comma, for example 
[&quot;/fs1/drill/spill&quot; , &quot;/fs2/drill/spill&quot; , 
&quot;/fs3/drill/spill&quot;].<br>
-<strong>Note:</strong> As of Drill 1.11, this option is supported for backward 
compatibility, however in future releases, this option will be deprecated. It 
is highly recommended that you use the 
<code>drill.exec.spill.directories</code> option to set the spill location 
instead. The default setting is [&quot;/tmp/drill/spill&quot;].  </p></li>
-<li><p><strong>drill.exec.hashagg.spill.fs</strong><br>
-Overrides the location into which the Hash Aggregate operator spills data. 
Instead of spilling into the location set by the 
<code>drill.exec.spill.fs</code> option, the Hash Aggregate operator spills 
into the location specified by this option. Setting this option to 1 disables 
spilling for the Hash Aggregate operator.<br>
-<strong>Note:</strong> As of Drill 1.11, this option is supported for backward 
compatibility, however in future releases, this option will be deprecated. It 
is highly recommended that you use the <code>drill.exec.spill.fs</code> option 
to set the spill location instead. The default setting is &quot;file:///&quot;. 
 </p></li>
-<li><p><strong>drill.exec.hashagg.spill.directories</strong><br>
-Overrides the location into which the Hash Aggregate operator spills data. 
Instead of spilling into the location set by the 
<code>drill.exec.spill.directories</code> option, the Hash Aggregate operator 
spills to the directories specified by this option. The list must be an array 
with directories separated by a comma, for example 
[&quot;/fs1/drill/spill&quot; , &quot;/fs2/drill/spill&quot; , 
&quot;/fs3/drill/spill&quot;].<br>
-<strong>Note:</strong> As of Drill 1.11, this option is supported for backward 
compatibility, however in future releases, this option will be deprecated. It 
is highly recommended that you use the <code>drill.exec.spill.directories 
option</code> to set the spill location instead.  </p></li>
+<li><strong>drill.exec.spill.fs</strong><br>
+Introduced in Drill 1.11. The default file system on the local machine into 
which the Sort, Hash Aggregate, and Hash Join operators spill data. You can 
configure this option so that data spills into a distributed file system, such 
as HDFS, for example &quot;hdfs:///&quot;. The default setting is 
&quot;file:///&quot;.</li>
+<li><strong>drill.exec.spill.directories</strong><br>
+Introduced in Drill 1.11. The list of directories into which the Sort, Hash 
Aggregate, and Hash Join operators spill data. The list must be an array with 
directories separated by a comma, for example [&quot;/fs1/drill/spill&quot; , 
&quot;/fs2/drill/spill&quot; , &quot;/fs3/drill/spill&quot;]. The default 
setting is [&quot;/tmp/drill/spill&quot;].<br></li>
 </ul>
 
-<h2 id="hash-based-operator-configuration-settings">Hash-Based Operator 
Configuration Settings</h2>
+<p><strong>Note:</strong> The following options were available prior to Drill 
1.11, but have since been deprecated and replaced with the options described 
above:  </p>
+
+<ul>
+<li>drill.exec.sort.external.spill.fs (Replaced by drill.exec.spill.fs)</li>
+<li>drill.exec.sort.external.spill.directories (Replaced by 
drill.exec.spill.directories)</li>
+<li>drill.exec.hashagg.spill.fs (Replaced by drill.exec.spill.fs)<br></li>
+</ul>
+
+<h2 id="memory-allocation">Memory Allocation</h2>
+
+<p>Drill evenly splits the available memory among all instances of the Sort, 
Hash Aggregate, and Hash Join operators. When a query is parallelized, the 
number of operators is multiplied, which reduces the amount of memory given to 
each instance of the operators during a query.  </p>
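<p>The even split described above can be illustrated with a short Python sketch. This is a simplified model, not Drill's actual planner code: the function name and parameters are hypothetical, and it assumes the per-node query memory is divided equally across every instance of the buffering operators.</p>

```python
GB = 1024 ** 3


def memory_per_operator_instance(query_memory_bytes, buffering_operators, parallelization):
    """Evenly split a query's per-node memory across operator instances.

    Simplified, assumed model: each buffering operator (Sort, Hash Aggregate,
    Hash Join) in the plan is instantiated once per unit of parallelization,
    and every instance receives an equal share.
    """
    instances = buffering_operators * parallelization
    if instances <= 0:
        raise ValueError("at least one operator instance is required")
    return query_memory_bytes // instances


# Two buffering operators, no parallelization: 1 GB each out of 2 GB.
serial_share = memory_per_operator_instance(2 * GB, 2, 1)

# The same plan parallelized four ways: eight instances, 256 MB each.
parallel_share = memory_per_operator_instance(2 * GB, 2, 4)
```

<p>Under these assumptions, raising the parallelization from 1 to 4 cuts the share of each operator instance by a factor of four, which is why heavily parallelized queries may begin to spill sooner.</p>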
+
+<p><strong>Memory Allocation Configuration Options</strong>  </p>
+
+<p>The <code>planner.memory.max_query_memory_per_node</code> and 
<code>planner.memory.percent_per_query</code> options set the amount of memory 
that Drill can allocate to a query on a node. Both options are enabled by 
default. Of these two options, Drill picks the setting that provides the most 
memory.  </p>
+
+<ul>
+<li><strong>planner.memory.max_query_memory_per_node</strong><br>
+The <code>planner.memory.max_query_memory_per_node</code> option, set at 2 GB 
by default, is the minimum amount of memory available to Drill per query on a 
node. The default of 2 GB typically allows between two and three concurrent 
queries to run when the JVM is configured to use 8 GB of direct memory 
(default). When the memory requirement for Drill increases, the default of 2 GB 
is constraining. You must increase the amount of memory for queries to 
complete, unless the setting for the <code>planner.memory.percent_per_query</code> option 
allows Drill to use more memory.</li>
+<li><strong>planner.memory.percent_per_query</strong><br>
+Alternatively, the <code>planner.memory.percent_per_query</code> option sets 
the memory as a percentage of the total direct memory. For example, if the 
allocation is set to 10%, and the total direct memory is 128 GB, each query 
gets approximately 13 GB.<br></li>
+</ul>
+
+<p>The percentage is calculated using the following formula:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">   (1 - non-managed allowance)/concurrency
+</code></pre></div>
+<p>The non-managed allowance is the share of system memory that non-managed 
operators, which do not spill to disk, are assumed to use. The default 
allowance is 50% of the total system memory. The concurrency is the number of 
queries that may run concurrently. The default assumption is 10.</p>
+
+<p>Based on the default assumptions, the default value of 5% is calculated as 
follows:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">   (1 - .50)/10 = 0.05  
+</code></pre></div>
+<p>This value is used only when throttling is disabled. Setting the value to 0 
disables the option. You can increase or decrease the value; however, you should 
keep the percentage low enough that the resulting allocation stays well below 
the JVM direct memory, to account for cases where Drill does not manage memory, 
such as for the less memory-intensive operators.  </p>
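<p>As a worked illustration of the two options, the following Python sketch reproduces the arithmetic from the text. The function names are hypothetical and the model is deliberately simple: it assumes only what is stated above, namely that the default percentage comes from (1 - non-managed allowance)/concurrency and that Drill picks whichever option provides more memory.</p>

```python
GB = 1024 ** 3


def default_percent_per_query(non_managed_allowance=0.50, concurrency=10):
    """Fraction of direct memory per query: (1 - allowance) / concurrency."""
    return (1 - non_managed_allowance) / concurrency


def query_memory_limit(direct_memory_bytes,
                       max_query_memory_per_node=2 * GB,
                       percent_per_query=0.05):
    """Per-node memory for one query under the assumed 'pick the larger' rule.

    A percent_per_query of 0 disables the percentage option, so the result
    falls back to max_query_memory_per_node.
    """
    percent_based = int(direct_memory_bytes * percent_per_query)
    return max(max_query_memory_per_node, percent_based)


# Defaults with 8 GB of direct memory: 5% is about 0.4 GB, so the 2 GB setting wins.
small_node = query_memory_limit(8 * GB)

# 10% of 128 GB of direct memory: the percentage wins with roughly 12.8 GB.
large_node = query_memory_limit(128 * GB, percent_per_query=0.10)
```

<p>With the default settings the percentage option only takes effect once direct memory exceeds 40 GB, because 5% of 40 GB equals the 2 GB default of <code>planner.memory.max_query_memory_per_node</code>.</p>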
+
+<p><strong>Increasing the Available Memory</strong>  </p>
+
+<p>You can increase the amount of memory available to Drill using the ALTER 
SYSTEM|SESSION SET commands with the 
<code>planner.memory.max_query_memory_per_node</code> or 
<code>planner.memory.percent_per_query</code> options, as shown:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">   ALTER SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = &lt;new_value&gt;
+   //The default value is 2147483648 bytes (2 GB).
+
+   ALTER SYSTEM|SESSION SET `planner.memory.percent_per_query` = &lt;new_value&gt;
+   //The default value is 0.05.  
+</code></pre></div>
+<h2 id="disabling-the-hash-operators">Disabling the Hash Operators</h2>
+
+<p>You can disable the Hash Aggregate and Hash Join operators. When you 
disable these operators, Drill creates alternative query plans that use the 
Sort operator and the Streaming Aggregate or the Merge Join operator. </p>
 
-<p>Use the ALTER SYSTEM|SESSION SET commands with the options below to disable 
the Hash Aggregate and Hash Join operators, modify the hash table size, or 
disable memory estimation. Typically, you set the options at the session level 
unless you want the setting to persist across all sessions.</p>
+<p>Use the ALTER SYSTEM|SESSION SET commands with the following options to 
disable the Hash Aggregate and Hash Join operators. Typically, you set the 
options at the session level unless you want the setting to persist across all 
sessions. </p>
 
-<p>The following options control the hash-based operators:</p>
+<p>The following options control the hash-based operators:  </p>
 
 <ul>
-<li><p><strong>planner.enable_hashagg</strong><br>
-Enables or disables hash aggregation; otherwise, Drill does a sort-based 
aggregation. This option is enabled by default. The default, and recommended, 
setting is true. 
-The Hash Aggregate operator uses an uncontrolled amount of memory, up to 10 
GB, after which the operator runs out of memory. As of Drill 1.11, the Hash 
Aggregate operator can write to disk. </p></li>
-<li><p><strong>planner.enable_hashjoin</strong><br>
-Enables or disables the memory hungry hash join. Drill assumes that a query 
will have adequate memory to complete and tries to use the fastest operations 
possible to complete the planned inner, left, right, or full outer joins using 
a hash table. The Hash Join operator uses an uncontrolled amount of memory, up 
to 10 GB, after which the operator runs out of memory. Currently, this operator 
does not write to disk. Disabling hash join allows Drill to manage arbitrarily 
large data in a small memory footprint. This option is enabled by default. The 
default setting is true.</p></li>
-<li><p><strong>exec.min_hash_table_size</strong><br>
-Starting size for hash tables. Increase this setting based on the memory 
available to improve performance. The default setting for this option is 65536. 
The setting can range from 0 to 1073741824.</p></li>
-<li><p><strong>exec.max_hash_table_size</strong><br>
-Ending size for hash tables. The default setting for this option is 
1073741824. The setting can range from 0 to 1073741824.</p></li>
+<li><strong>planner.enable_hashagg</strong><br>
+Enables or disables hash aggregation; otherwise, Drill does a sort-based 
aggregation. This option is enabled by default. The default, and recommended, 
setting is true. Prior to Drill 1.11, the Hash Aggregate operator used an 
uncontrolled amount of memory (up to 10 GB), after which the operator ran out 
of memory. As of Drill 1.11, the Hash Aggregate operator can write to disk.</li>
+<li><strong>planner.enable_hashjoin</strong><br>
+Enables or disables hash joins. This option is enabled by default. Drill 
assumes that a query will have adequate memory to complete and tries to use the 
fastest operations possible. Prior to Drill 1.11, the Hash Join operator used an 
uncontrolled amount of memory (up to 10 GB), after which the operator ran out 
of memory. As of Drill 1.13, this operator can write to disk.</li>
 </ul>
 
     

http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/docs/start-up-options/index.html
----------------------------------------------------------------------
diff --git a/docs/start-up-options/index.html b/docs/start-up-options/index.html
index 4115a7a..2f96249 100644
--- a/docs/start-up-options/index.html
+++ b/docs/start-up-options/index.html
@@ -1153,7 +1153,7 @@
 
     </div>
 
-     Aug 17, 2017
+     Mar 14, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1206,9 +1206,9 @@ Defines the persistent storage (PStore) provider. The <a 
href="/docs/persistent-
 <li><p><strong>drill.exec.buffer.size</strong><br>
 Defines the amount of memory available, in terms of record batches, to hold 
data on the downstream side of an operation. Drill pushes data downstream as 
quickly as possible to make data immediately available. This requires Drill to 
use memory to hold the data pending operations. When data on a downstream 
operation is required, that data is immediately available so Drill does not 
have to go over the network to process it. Providing more memory to this option 
increases the speed at which Drill completes a query.  </p></li>
 <li><p><strong>drill.exec.spill.fs</strong><br>
-Introduced in Drill 1.11. The default file system on the local machine into 
which the Sort and Hash Aggregate operators spill data. This is the recommended 
option to use for spilling. You can configure this option so that data spills 
into a distributed file system, such as hdfs. For example, 
&quot;hdfs:///&quot;. The default setting is &quot;file:///&quot;. See <a 
href="/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based 
and Hash-Based Memory Constrained Operators</a> for more information.   
</p></li>
+Introduced in Drill 1.11. The default file system on the local machine into 
which the Sort, Hash Aggregate, and Hash Join operators spill data. This is the 
recommended option to use for spilling. You can configure this option so that 
data spills into a distributed file system, such as hdfs. For example, 
&quot;hdfs:///&quot;. The default setting is &quot;file:///&quot;. See <a 
href="/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based 
and Hash-Based Memory Constrained Operators</a> for more information.   
</p></li>
 <li><p><strong>drill.exec.spill.directories</strong><br>
-Introduced in Drill 1.11. The list of directories into which the Sort and Hash 
Aggregate operators spill data. The list must be an array with directories 
separated by a comma, for example [&quot;/fs1/drill/spill&quot; , 
&quot;/fs2/drill/spill&quot; , &quot;/fs3/drill/spill&quot;]. This is the 
recommended option for spilling to multiple directories. The default setting is 
[&quot;/tmp/drill/spill&quot;]. See <a 
href="/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based 
and Hash-Based Memory Constrained Operators</a> for more information.  </p></li>
+Introduced in Drill 1.11. The list of directories into which the Sort, Hash 
Aggregate, and Hash Join operators spill data. The list must be an array with 
directories separated by a comma, for example [&quot;/fs1/drill/spill&quot; , 
&quot;/fs2/drill/spill&quot; , &quot;/fs3/drill/spill&quot;]. This is the 
recommended option for spilling to multiple directories. The default setting is 
[&quot;/tmp/drill/spill&quot;]. See <a 
href="/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based 
and Hash-Based Memory Constrained Operators</a> for more information.  </p></li>
 <li><p><strong>drill.exec.zk.connect</strong><br>
 Provides Drill with the ZooKeeper quorum to use to connect to data sources. 
Change this setting to point to the ZooKeeper quorum that you want Drill to 
use. You must configure this option on each Drillbit node.  </p></li>
 <li><p><strong>drill.exec.profiles.store.inmemory</strong><br>

http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index eee484b..611b6b4 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 21 Feb 2018 13:43:37 -0800</pubDate>
-    <lastBuildDate>Wed, 21 Feb 2018 13:43:37 -0800</lastBuildDate>
+    <pubDate>Tue, 13 Mar 2018 18:10:26 -0700</pubDate>
+    <lastBuildDate>Tue, 13 Mar 2018 18:10:26 -0700</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
     
       <item>

http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/team/index.html
----------------------------------------------------------------------
diff --git a/team/index.html b/team/index.html
index d51aa97..19cdaeb 100644
--- a/team/index.html
+++ b/team/index.html
@@ -257,6 +257,10 @@
 <td>Kamesh Bhallamudi</td>
 <td>kameshb</td>
 </tr>
+<tr>
+<td>Kunal Khatua</td>
+<td>kunal</td>
+</tr>
 </tbody></table>
 </div>
