This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 18ce570  [MINOR] Update doc to include inc query on partitions (#1454)
18ce570 is described below

commit 18ce5708e073e80779f6dcc00d388b4cb0cc758a
Author: YanJia-Gary-Li <yanjia.gary...@gmail.com>
AuthorDate: Sat Mar 28 20:28:48 2020 -0700

    [MINOR] Update doc to include inc query on partitions (#1454)
---
 docs/_docs/0.5.2/2_3_querying_data.cn.md | 31 ++++++++++++++++++++++++++++++-
 docs/_docs/0.5.2/2_3_querying_data.md    |  3 ++-
 docs/_docs/2_3_querying_data.cn.md       | 31 ++++++++++++++++++++++++++++++-
 docs/_docs/2_3_querying_data.md          |  3 ++-
 4 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/docs/_docs/0.5.2/2_3_querying_data.cn.md b/docs/_docs/0.5.2/2_3_querying_data.cn.md
index 74afcef..77ad2d7 100644
--- a/docs/_docs/0.5.2/2_3_querying_data.cn.md
+++ b/docs/_docs/0.5.2/2_3_querying_data.cn.md
@@ -25,6 +25,33 @@ language: cn
 and joined with other tables (datasets/dimensions) to [write deltas](/cn/docs/0.5.2-writing_data.html) out to the target Hudi dataset. The incremental view is realized by querying one of the tables above with a special configuration
 that instructs the query planner to fetch only the incremental data from the dataset.
 
+
+## Query engine support matrix
+
+The tables below show whether each query engine supports the Hudi format.
+
+### Read optimized tables
+
+|Query Engine|Real-time View|Incremental Pull|
+|------------|--------|-----------|
+|**Hive**|Y|Y|
+|**Spark SQL**|Y|Y|
+|**Spark Datasource**|Y|Y|
+|**Presto**|Y|N|
+|**Impala**|Y|N|
+
+
+### Real-time tables
+
+|Query Engine|Real-time View|Incremental Pull|Read Optimized Table|
+|------------|--------|-----------|--------------|
+|**Hive**|Y|Y|Y|
+|**Spark SQL**|Y|Y|Y|
+|**Spark Datasource**|N|N|Y|
+|**Presto**|N|N|Y|
+|**Impala**|N|N|Y|
+
+
 Next, we will discuss in detail how to access all three views on each query engine.
 
 ## Hive
@@ -128,7 +155,9 @@ scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'
         DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
      .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(),
        <beginInstantTime>)
-     .load(tablePath); // For incremental view, pass in the root/base path of dataset
+     .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(),
+       "/year=2020/month=*/day=*") // Optional, pull incrementally from the specified partitions only
+     .load(tablePath); // Pass in the root/base path of the dataset
 ```
 
 See the [settings](/cn/docs/0.5.2-configurations.html#spark-datasource) section to view all datasource options.
diff --git a/docs/_docs/0.5.2/2_3_querying_data.md b/docs/_docs/0.5.2/2_3_querying_data.md
index 0c28b12..9d17e72 100644
--- a/docs/_docs/0.5.2/2_3_querying_data.md
+++ b/docs/_docs/0.5.2/2_3_querying_data.md
@@ -55,7 +55,7 @@ Note that `Read Optimized` queries are not applicable for COPY_ON_WRITE tables.
 |**Spark SQL**|Y|Y|Y|
 |**Spark Datasource**|N|N|Y|
 |**Presto**|N|N|Y|
-|**Impala**|N|N|N|
+|**Impala**|N|N|Y|
 
 In the sections below, we will discuss the specific setup to access different query types from different query engines.
@@ -148,6 +148,7 @@ The following snippet shows how to obtain all records changed after `beginInstan
      .format("org.apache.hudi")
      .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
      .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), <beginInstantTime>)
+     .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(), "/year=2020/month=*/day=*") // Optional, use glob pattern if querying certain partitions
      .load(tablePath); // For incremental query, pass in the root/base path of table
 
 hudiIncQueryDF.createOrReplaceTempView("hudi_trips_incremental")
diff --git a/docs/_docs/2_3_querying_data.cn.md b/docs/_docs/2_3_querying_data.cn.md
index b2c4870..1fa91d1 100644
--- a/docs/_docs/2_3_querying_data.cn.md
+++ b/docs/_docs/2_3_querying_data.cn.md
@@ -24,6 +24,33 @@ language: cn
 and joined with other tables (datasets/dimensions) to [write deltas](/cn/docs/writing_data.html) out to the target Hudi dataset. The incremental view is realized by querying one of the tables above with a special configuration
 that instructs the query planner to fetch only the incremental data from the dataset.
 
+
+## Query engine support matrix
+
+The tables below show whether each query engine supports the Hudi format.
+
+### Read optimized tables
+
+|Query Engine|Real-time View|Incremental Pull|
+|------------|--------|-----------|
+|**Hive**|Y|Y|
+|**Spark SQL**|Y|Y|
+|**Spark Datasource**|Y|Y|
+|**Presto**|Y|N|
+|**Impala**|Y|N|
+
+
+### Real-time tables
+
+|Query Engine|Real-time View|Incremental Pull|Read Optimized Table|
+|------------|--------|-----------|--------------|
+|**Hive**|Y|Y|Y|
+|**Spark SQL**|Y|Y|Y|
+|**Spark Datasource**|N|N|Y|
+|**Presto**|N|N|Y|
+|**Impala**|N|N|Y|
+
+
 Next, we will discuss in detail how to access all three views on each query engine.
 
 ## Hive
@@ -127,7 +154,9 @@ scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'
         DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
      .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(),
        <beginInstantTime>)
-     .load(tablePath); // For incremental view, pass in the root/base path of dataset
+     .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(),
+       "/year=2020/month=*/day=*") // Optional, pull incrementally from the specified partitions only
+     .load(tablePath); // Pass in the root/base path of the dataset
 ```
 
 See the [settings](/cn/docs/configurations.html#spark-datasource) section to view all datasource options.
diff --git a/docs/_docs/2_3_querying_data.md b/docs/_docs/2_3_querying_data.md
index 875b7f0..3e6a436 100644
--- a/docs/_docs/2_3_querying_data.md
+++ b/docs/_docs/2_3_querying_data.md
@@ -54,7 +54,7 @@ Note that `Read Optimized` queries are not applicable for COPY_ON_WRITE tables.
 |**Spark SQL**|Y|Y|Y|
 |**Spark Datasource**|N|N|Y|
 |**Presto**|N|N|Y|
-|**Impala**|N|N|N|
+|**Impala**|N|N|Y|
 
 In the sections below, we will discuss the specific setup to access different query types from different query engines.
@@ -147,6 +147,7 @@ The following snippet shows how to obtain all records changed after `beginInstan
      .format("org.apache.hudi")
      .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
      .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), <beginInstantTime>)
+     .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(), "/year=2020/month=*/day=*") // Optional, use glob pattern if querying certain partitions
      .load(tablePath); // For incremental query, pass in the root/base path of table
 
 hudiIncQueryDF.createOrReplaceTempView("hudi_trips_incremental")
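
For readers of this thread, below is a minimal, self-contained sketch of the partition-scoped incremental pull these doc changes describe. It assumes Hudi 0.5.x with the Scala DataSource API, where the option keys are vals on `DataSourceReadOptions` (so they are referenced without parentheses in Scala, unlike the Java-style snippets in the docs); the table path, begin instant, and year/month/day partition layout are hypothetical placeholders.

```scala
// Sketch of an incremental query restricted to certain partitions.
// Assumes hudi-spark 0.5.x on the classpath; tablePath and beginInstantTime
// below are placeholders, not values from the docs.
import org.apache.hudi.DataSourceReadOptions
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hudi-incremental-pull")
  .getOrCreate()

val tablePath = "file:///tmp/hudi_trips"  // hypothetical root/base path of the table
val beginInstantTime = "20200328000000"   // hypothetical commit instant; reads changes after it

val hudiIncQueryDF = spark.read
  .format("org.apache.hudi")
  // Switch the datasource into incremental query mode
  .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, beginInstantTime)
  // Optional: only scan partitions matching this glob under the base path
  .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "/year=2020/month=*/day=*")
  .load(tablePath) // for incremental queries, pass the root/base path of the table

hudiIncQueryDF.createOrReplaceTempView("hudi_trips_incremental")
spark.sql("select count(*) from hudi_trips_incremental").show()
```

Without the glob option the incremental query scans every partition for changed records, so scoping it to known-affected partitions is purely a pruning optimization; the begin-instant filter behaves the same either way.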