drill-site git commit: doc update for DRILL-3867

bridgetb Thu, 10 Aug 2017 15:31:47 -0700

Repository: drill-site
Updated Branches:
  refs/heads/asf-site 34faadd80 -> 0b2565cad



doc update for DRILL-3867


Project: http://git-wip-us.apache.org/repos/asf/drill-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill-site/commit/0b2565ca
Tree: http://git-wip-us.apache.org/repos/asf/drill-site/tree/0b2565ca
Diff: http://git-wip-us.apache.org/repos/asf/drill-site/diff/0b2565ca

Branch: refs/heads/asf-site
Commit: 0b2565cad2a37fe82b3b1db7ed4162526dd5bac8
Parents: 34faadd
Author: Bridget Bevens <bbev...@maprtech.com>
Authored: Thu Aug 10 15:31:14 2017 -0700
Committer: Bridget Bevens <bbev...@maprtech.com>
Committed: Thu Aug 10 15:31:14 2017 -0700

----------------------------------------------------------------------
 .../index.html                                  | 47 ++++++++++++--------
 feed.xml                                        |  4 +-
 2 files changed, 31 insertions(+), 20 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill-site/blob/0b2565ca/docs/optimizing-parquet-metadata-reading/index.html
----------------------------------------------------------------------
diff --git a/docs/optimizing-parquet-metadata-reading/index.html b/docs/optimizing-parquet-metadata-reading/index.html
index 6e38ecd..caa93cd 100644
--- a/docs/optimizing-parquet-metadata-reading/index.html
+++ b/docs/optimizing-parquet-metadata-reading/index.html
@@ -1124,40 +1124,51 @@
 
     </div>
 
-     Feb 8, 2016
+     Aug 10, 2017
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
     <div class="int_text" align="left">
       
-        <p>Parquet metadata caching is an optional feature in Drill 1.2 and later. When you use this feature, Drill generates a metadata cache file. Drill stores the metadata cache file in a directory you specify and its subdirectories. When you run a query on this directory or a subdirectory, Drill reads a single metadata cache file instead of retrieving metadata from multiple Parquet files during the query-planning phase.</p>
+        <p>Parquet metadata caching is a feature that enables Drill to read a single metadata cache file instead of retrieving metadata from multiple Parquet files during the query-planning phase.
+Parquet metadata caching is available for Parquet data in Drill 1.2 and later. To enable Parquet metadata caching, issue the REFRESH TABLE METADATA &lt;path to table&gt; command. When you run this command, Drill generates a metadata cache file.</p>
 
-<p>Parquet metadata caching is useful only with Parquet data, and does not benefit queries on Hive tables, HBase tables, or text files. </p>
+<div class="admonition note">
+  <p class="first admonition-title">Note</p>
+  <p class="last">Parquet metadata caching does not benefit queries on Hive tables, HBase tables, or text files.</p>
+</div>  
+
+<p>Drill stores the metadata cache file in the specified directory and its subdirectories. When you run a query on this directory or its subdirectories, Drill reads the metadata cache file instead of retrieving metadata from multiple Parquet files during the query-planning phase.</p>
+
+<p>In Drill 1.11 and later, Drill stores the paths to the Parquet files as relative paths instead of absolute paths. You can move partitioned Parquet directories from one location in the distributed file system to another without issuing the REFRESH TABLE METADATA command to rebuild the Parquet metadata files; the metadata remains valid in the new location.</p>
+
+<div class="admonition note">
+  <p class="first admonition-title">Note</p>
+  <p class="last">Reverting from Drill 1.11 to a previous version of Drill is not recommended because the previous version incorrectly interprets the Parquet metadata files created by Drill 1.11. If this occurs, remove the Parquet metadata files and run the REFRESH TABLE METADATA command to rebuild the files in the older format.</p>
+</div> 
+ 
 
 <h2 id="when-to-use-parquet-metadata-caching">When to Use Parquet Metadata Caching</h2>
 
-<p>The scenarios in which metadata caching is useful is when the planning time is a significant percentage of the total elapsed time of the query. If the query execution time is the dominant factor, which is typically observed with a large number of files, then metadata caching will have very little impact. To determine that query execution time is the dominant factor, run an EXPLAIN plan on your query of a large number of files, and compare its time to the total time of query execution. Use the comparison to determine whether metadata caching will be useful.</p>
+<p>Metadata caching is useful when planning time is a significant percentage of the total elapsed time of the query. If the query execution time is the dominant factor, which is typically observed with a large number of files, then metadata caching will have very little impact. To determine whether query execution time is the dominant factor, run an EXPLAIN plan on your query of a large number of files and compare the time it takes with the total time of query execution. Use the comparison to determine whether metadata caching will be useful.</p>
 
 <p>When enabled, Drill always uses the Parquet metadata cache during the query-planning phase. To optimize reading Parquet metadata, make sure the metadata cache is up-to-date after making any changes, such as inserts, to the data in the cluster. The next section describes how to update the metadata cache.</p>
 
-<h2 id="how-to-trigger-generation-of-the-parquet-metadata-cache-file">How to Trigger Generation of the Parquet Metadata Cache File</h2>
+<h2 id="generating-the-parquet-metadata-cache-file">Generating the Parquet Metadata Cache File</h2>
 
 <p>The following command generates the Parquet metadata cache file in the <code>&lt;path to table&gt;</code> and its subdirectories.</p>
-
-<p><code>REFRESH TABLE METADATA &lt;path to table&gt;</code></p>
-
+<div class="highlight"><pre><code class="language-text" data-lang="text">   REFRESH TABLE METADATA &lt;path to table&gt;
+</code></pre></div>
 <p>You need to run this command on a directory, nested or flat, only once during the session. Only the first query gathers the metadata unless the Parquet data changes, for example, you delete some data. If you did not make changes to the Parquet data, subsequent queries encounter the up-to-date Parquet metadata files. There is no need for Drill to regenerate the metadata. If there are changes, the metadata needs updating, so Drill dynamically regenerates the Parquet metadata when you issue the next query.</p>
 
-<p>The elapsed time of the first query that triggers regeneration of metadata can be greater than that of subsequent queries that use that metadata. If this increase in the time of the first query is unacceptable, make sure the cache is up-to-date by running the REFRESH TABLE METADATA command.</p>
-
-<h2 id="example-of-generating-parquet-metadata">Example of Generating Parquet Metadata</h2>
-<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:schema=dfs&gt; REFRESH TABLE METADATA t1;
-+-------+----------------------------------------------+
-|  ok   |                   summary                    |
-+-------+----------------------------------------------+
-| true  | Successfully updated metadata for table t1.  |
-+-------+----------------------------------------------+
-1 row selected (0.445 seconds)
+<p>The elapsed time of the first query that triggers regeneration of metadata can be greater than that of subsequent queries that use that metadata. If this increase in the time of the first query is unacceptable, make sure the cache is up-to-date by running the REFRESH TABLE METADATA command, as shown in the following example:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">   0: jdbc:drill:schema=dfs&gt; REFRESH TABLE METADATA t1;
+   +-------+----------------------------------------------+
+   |  ok   |                   summary                    |
+   +-------+----------------------------------------------+
+   | true  | Successfully updated metadata for table t1.  |
+   +-------+----------------------------------------------+
+   1 row selected (0.445 seconds)  
 </code></pre></div>
 <h2 id="how-drill-generates-and-uses-parquet-metadata">How Drill Generates and Uses Parquet Metadata</h2>
 

http://git-wip-us.apache.org/repos/asf/drill-site/blob/0b2565ca/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index a74c616..9b92428 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 09 Aug 2017 16:02:54 -0700</pubDate>
-    <lastBuildDate>Wed, 09 Aug 2017 16:02:54 -0700</lastBuildDate>
+    <pubDate>Thu, 10 Aug 2017 15:27:39 -0700</pubDate>
+    <lastBuildDate>Thu, 10 Aug 2017 15:27:39 -0700</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
     
       <item>

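The relative-path behavior that this commit documents for Drill 1.11 can be illustrated with a short Python sketch. This is not Drill's implementation (Drill is written in Java, and the cache file format is not shown in this diff); the helper names `to_relative` and `resolve` and the directory paths are hypothetical. The sketch only shows why storing each file path relative to the table root keeps the cached entries valid after the whole directory moves.

```python
import posixpath


def to_relative(table_root: str, file_paths: list[str]) -> list[str]:
    """Hypothetical helper: record each Parquet file path relative to the table root."""
    return [posixpath.relpath(path, table_root) for path in file_paths]


def resolve(table_root: str, relative_paths: list[str]) -> list[str]:
    """Hypothetical helper: rebuild full paths against the table's current location."""
    return [posixpath.join(table_root, path) for path in relative_paths]


if __name__ == "__main__":
    # Cache entries written while the (hypothetical) table lived under /data/old/t1
    cached = to_relative("/data/old/t1", [
        "/data/old/t1/2017/part-0.parquet",
        "/data/old/t1/2017/08/part-1.parquet",
    ])
    print(cached)  # ['2017/part-0.parquet', '2017/08/part-1.parquet']

    # After the directory moves to /archive/t1, the same entries still resolve
    print(resolve("/archive/t1", cached))
    # ['/archive/t1/2017/part-0.parquet', '/archive/t1/2017/08/part-1.parquet']
```

An absolute-path entry such as `/data/old/t1/2017/part-0.parquet` would still point at the old location after the move, which is consistent with the documentation's statement that, from Drill 1.11 on, moved directories no longer need REFRESH TABLE METADATA.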