Repository: incubator-trafodion Updated Branches: refs/heads/master 66827b053 -> b6b9f43b6
[TRAFODION-2164] Incremental Update Statistics documentation Project: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/commit/36446cea Tree: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/tree/36446cea Diff: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/diff/36446cea Branch: refs/heads/master Commit: 36446cead6c0961384bbc402f8f0fcc772d7de4c Parents: 1cb7a1a Author: Dave Birdsall <dbirds...@apache.org> Authored: Thu Sep 15 18:42:28 2016 +0000 Committer: Dave Birdsall <dbirds...@apache.org> Committed: Thu Sep 15 18:42:28 2016 +0000 ---------------------------------------------------------------------- docs/shared/revisions.txt | 1 + .../src/asciidoc/_chapters/about.adoc | 15 +--- .../src/asciidoc/_chapters/sql_statements.adoc | 4 + .../src/asciidoc/_chapters/sql_utilities.adoc | 86 +++++++++++++++++++- 4 files changed, 89 insertions(+), 17 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/36446cea/docs/shared/revisions.txt ---------------------------------------------------------------------- diff --git a/docs/shared/revisions.txt b/docs/shared/revisions.txt index 6afa98d..f58a61d 100644 --- a/docs/shared/revisions.txt +++ b/docs/shared/revisions.txt @@ -26,6 +26,7 @@ [cols="2",options="header"] |=== | Version | Date +| 2.1.0 | TBD | 2.0.1 | July 7, 2016 | 2.0.0 | June 6, 2016 | 1.3.0 | January, 2016 http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/36446cea/docs/sql_reference/src/asciidoc/_chapters/about.adoc ---------------------------------------------------------------------- diff --git a/docs/sql_reference/src/asciidoc/_chapters/about.adoc b/docs/sql_reference/src/asciidoc/_chapters/about.adoc index 247e70d..b3b9ef6 100644 --- a/docs/sql_reference/src/asciidoc/_chapters/about.adoc +++ b/docs/sql_reference/src/asciidoc/_chapters/about.adoc @@ -51,20 +51,7 @@ This edition includes updates for these new features: [cols="50%,50%",options="header"] |=== | New Feature | Location in the Manual -| On Line Analytical Process (OLAP) window functions | <<olap_functions,OLAP Functions>> -| Ability to cancel DDL, update statistics, and -additional child query operations in addition to -DML statements | <<control_query_cancel_statement,CONTROL QUERY CANCEL Statement>> -| Authorization required to run the CONTROL QUERY -CANCEL Statement | <<control_query_cancel_statement,CONTROL QUERY CANCEL Statement>> -| Ability to grant privileges on behalf of a role -using the GRANTED BY clause. | <<grant_component_privilege_statement,GRANT COMPONENT PRIVILEGE Statement>> -| Authorization required for all SHOWDDL commands | <<showddl_statement,SHOWDDL Statement>> + -<<SHOWDDL_SCHEMA_Statement,SHOWDDL SCHEMA Statement>> -| Ability to display the DDL syntax of a library object -using the SHOWDDL LIBRARY command | <<showddl_statement,SHOWDDL Statement>> -| Listing of HBase objects using the GET HBASE OBJECTS -command through an SQL interface | <<get_hbase_objects_statement,GET HBASE OBJECTS Statement>> +| Incremental UPDATE STATISTICS | <<update_statistics_statement,UPDATE STATISTICS Statement>> |=== <<< http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/36446cea/docs/sql_reference/src/asciidoc/_chapters/sql_statements.adoc ---------------------------------------------------------------------- diff --git a/docs/sql_reference/src/asciidoc/_chapters/sql_statements.adoc b/docs/sql_reference/src/asciidoc/_chapters/sql_statements.adoc index c0c218b..8e2fbb6 100644 --- a/docs/sql_reference/src/asciidoc/_chapters/sql_statements.adoc +++ b/docs/sql_reference/src/asciidoc/_chapters/sql_statements.adoc @@ -2344,6 +2344,10 @@ creates an HBase table if it does not already exist when the table is created. T + specifies the ANSI logical name of the table. See <<database_object_names,Database Object Names>>. This name must be unique among names of tables and views within its schema. ++ +TIP: Avoid using table names that begin with "TRAF_SAMPLE_", as {project-name} SQL assumes that +such tables are persistent sample tables. See also <<update_statistics_incremental_update_statistics, +Incremental Update Statistics>>. * `SALT USING _num_ PARTITIONS [ON (_column_[, _column_]…)]` + http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/36446cea/docs/sql_reference/src/asciidoc/_chapters/sql_utilities.adoc ---------------------------------------------------------------------- diff --git a/docs/sql_reference/src/asciidoc/_chapters/sql_utilities.adoc b/docs/sql_reference/src/asciidoc/_chapters/sql_utilities.adoc index d386e64..248f994 100644 --- a/docs/sql_reference/src/asciidoc/_chapters/sql_utilities.adoc +++ b/docs/sql_reference/src/asciidoc/_chapters/sql_utilities.adoc @@ -686,11 +686,12 @@ devise optimized access plans. UPDATE STATISTICS is a {project-name} SQL extension. ``` -UPDATE STATISTICS FOR TABLE table [CLEAR | on-clause] +UPDATE STATISTICS FOR TABLE table [CLEAR | on-clause | sample-table-clause ] on-clause is: ON column-group-list CLEAR | ON column-group-list [histogram-option]... + | ON column-group-list INCREMENTAL WHERE predicate column-group-list is: column-list [,column-list]... @@ -716,8 +717,12 @@ histogram-option is: sample-option is: [r ROWS] - | RANDOM percent PERCENT + | RANDOM percent PERCENT [PERSISTENT] | PERIODIC size ROWS EVERY period ROWS + +sample-table-clause is: + CREATE SAMPLE RANDOM percent PERCENT + | REMOVE SAMPLE ``` [[update_statistics_syntax]] @@ -872,7 +877,6 @@ than specified, or if the sample size is greater than the system limit, A row sample is used to read _r_ rows from the table. The value _r_ must be an integer that is greater than zero -<<< **** `RANDOM _percent_ PERCENT` + directs {project-name} SQL to choose rows randomly from the table. The value @@ -881,6 +885,12 @@ addition, only the first four digits to the right of the decimal point are significant. For example, value 0.00001 is considered to be 0.0000, Value 1.23456 is considered to be 1.2345. +***** `PERSISTENT` ++ +directs {project-name} SQL to create a persistent sample table and store the +random sample in it. This table can then be used later for updating statistics +incrementally. + **** `PERIODIC _size_ ROWS EVERY _period_ ROW` + directs {project-name} SQL to choose the first _size_ number of rows from @@ -890,6 +900,28 @@ _size_ <= _period_). The size of the _period_ is defined by the number of rows specified for _period_. The value _period_ must be an integer that is greater than zero (_period_ > 0). +* `INCREMENTAL WHERE _predicate_` ++ +directs {project-name} SQL to update statistics incrementally. That is, instead +of taking a fresh sample of the entire table, {project-name} SQL will use a previously +created persistent sample table. {project-name} SQL will update the persistent sample +by replacing any rows satisfying the _predicate_ with a fresh sample of rows from +the original table satisfying the _predicate_. The sampling rate used is the +_percent_ specified when the persistent sample table was created. Statistics +are then generated from this updated sample. See also +<<update_statistics_incremental_update_statistics, +Incremental Update Statistics>>. + +* `CREATE SAMPLE RANDOM _percent_ PERCENT` ++ +Creates a persistent sample table associated with this table. The sample is +created using a random sample of _percent_ percent of the rows. The table +can then be used for later incremental statistics update. + +* `REMOVE SAMPLE` ++ +Drops the persistent sample table associated with this table. + [[update_statistics_considerations]] === Considerations for UPDATE STATISTICS @@ -1046,6 +1078,35 @@ NECESSARY COLUMN[S] to generate the needed histograms: update statistics for table _table-name_ on necessary columns sample; ``` +[[update_statistics_incremental_update_statistics]] +==== Incremental Update Statistics + +UPDATE STATISTICS processing time can be lengthy for very large tables. +One strategy for reducing the time is to create histograms only for +columns that actually need them (for example, using the ON NECESSARY COLUMNS +column group). Another strategy is to update statistics incrementally. These +strategies can be used together if desired. + +To use the incremental update statistics feature, you must first create +statistics for the table and create a persistent sample table. One way to +do this is to perform a normal update statistics command, adding the +PERSISTENT keyword to the _sample-option_. Another way to do this if you +already have reasonably up-to-date statistics for the table, is to create +a persistent sample table separately using the CREATE SAMPLE option. + +You can then perform update statistics incrementally by using the INCREMENTAL +WHERE _predicate_ syntax in the on-clause. The _predicate_ should be chosen +to describe the set of rows that have changed since the last statistics update +was performed. For example, if your table contains a column with a timestamp +giving the date and time of last update, this is a particularly useful column +to use in the _predicate_. + +If you decide later that you wish to change the _percent_ sampling rate used +for the persistent sample table, you can do so by dropping the persistent +sample table (using REMOVE SAMPLE) and creating a new one (by using the +CREATE SAMPLE option). Using a higher _percent_ results in more accurate +histograms, but at the price of a longer-running operation. + <<< [[update_statistics_examples]] === Examples of UPDATE STATISTICS @@ -1108,4 +1169,23 @@ table ADDRESS: UPDATE STATISTICS FOR TABLE address ON street CLEAR; ``` +* This example generates statistics but also creates a persistent +sample table for use when updating statistics incrementally: ++ +``` +UPDATE STATISTICS FOR TABLE address +ON (street), (city), (state), (zip, type) +SAMPLE RANDOM 5 PERCENT PERSISTENT; +``` +* This example updates statistics incrementally. It assumes that +a persistent sample table has already been created. The predicate +in the WHERE clause describes the set of rows that have changed +since statistics were last updated. Here we assume that rows +with a state of California are the only rows that have changed: ++ +``` +UPDATE STATISTICS FOR TABLE address +ON EXISTING COLUMNS +INCREMENTAL WHERE state = 'CA'; +```