This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 8504a50  Update the old hudi version to 0.5.2 (#1407)
8504a50 is described below

commit 8504a50a966defa30212cf9df09e10528997294c
Author: vinoyang <yanghua1...@gmail.com>
AuthorDate: Sun Mar 15 18:56:38 2020 +0800

    Update the old hudi version to 0.5.2 (#1407)
---
 docs/_docs/0.5.2/1_1_quick_start_guide.cn.md | 4 ++--
 docs/_docs/0.5.2/1_1_quick_start_guide.md    | 4 ++--
 docs/_docs/0.5.2/2_2_writing_data.md         | 2 +-
 docs/_docs/0.5.2/2_3_querying_data.md        | 4 ++--
 docs/_docs/0.5.2/2_6_deployment.md           | 4 ++--
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/_docs/0.5.2/1_1_quick_start_guide.cn.md b/docs/_docs/0.5.2/1_1_quick_start_guide.cn.md
index a8f4d49..c56774b 100644
--- a/docs/_docs/0.5.2/1_1_quick_start_guide.cn.md
+++ b/docs/_docs/0.5.2/1_1_quick_start_guide.cn.md
@@ -15,7 +15,7 @@ Hudi适用于Spark-2.x版本。您可以按照[此处](https://spark.apache.org/
 在提取的目录中,使用spark-shell运行Hudi:
 
 ```scala
-bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.2-incubating --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 
 设置表名、基本路径和数据生成器来为本指南生成记录。
@@ -153,7 +153,7 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hu
 
 您也可以通过[自己构建hudi](https://github.com/apache/incubator-hudi#building-apache-hudi-from-source)来快速开始,
 并在spark-shell命令中使用`--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar`,
-而不是`--packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating`
+而不是`--packages org.apache.hudi:hudi-spark-bundle:0.5.2-incubating`
 
 这里我们使用Spark演示了Hudi的功能。但是,Hudi可以支持多种存储类型/视图,并且可以从Hive,Spark,Presto等查询引擎中查询Hudi数据集。
diff --git a/docs/_docs/0.5.2/1_1_quick_start_guide.md b/docs/_docs/0.5.2/1_1_quick_start_guide.md
index d7bd0ff..ab4e37c 100644
--- a/docs/_docs/0.5.2/1_1_quick_start_guide.md
+++ b/docs/_docs/0.5.2/1_1_quick_start_guide.md
@@ -18,7 +18,7 @@ From the extracted directory run spark-shell with Hudi as:
 
 ```scala
 spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
-  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
+  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
@@ -209,7 +209,7 @@ Note: Only `Append` mode is supported for delete operation.
 
 You can also do the quickstart by [building hudi yourself](https://github.com/apache/incubator-hudi#building-apache-hudi-from-source),
 and using `--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` in the spark-shell command above
-instead of `--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating`. Hudi also supports scala 2.12. Refer [build with scala 2.12](https://github.com/apache/incubator-hudi#build-with-scala-212)
+instead of `--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating`. Hudi also supports scala 2.12. Refer [build with scala 2.12](https://github.com/apache/incubator-hudi#build-with-scala-212)
 for more info. Also, we used Spark here to show case the capabilities of Hudi. However, Hudi can support multiple table types/query types and
diff --git a/docs/_docs/0.5.2/2_2_writing_data.md b/docs/_docs/0.5.2/2_2_writing_data.md
index 3dc85d0..10b6189 100644
--- a/docs/_docs/0.5.2/2_2_writing_data.md
+++ b/docs/_docs/0.5.2/2_2_writing_data.md
@@ -205,7 +205,7 @@ cd hudi-hive
 ./run_sync_tool.sh --jdbc-url jdbc:hive2:\/\/hiveserver:10000 --user hive --pass hive --partitioned-by partition --base-path <basePath> --database default --table <tableName>
 ```
 
-Starting with Hudi 0.5.1 version read optimized version of merge-on-read tables are suffixed '_ro' by default. For backwards compatibility with older Hudi versions,
+Starting with Hudi 0.5.2 version read optimized version of merge-on-read tables are suffixed '_ro' by default. For backwards compatibility with older Hudi versions,
 an optional HiveSyncConfig - `--skip-ro-suffix`, has been provided to turn off '_ro' suffixing if desired. Explore other hive sync options using the following command:
 
 ```java
diff --git a/docs/_docs/0.5.2/2_3_querying_data.md b/docs/_docs/0.5.2/2_3_querying_data.md
index 242c84a..0c28b12 100644
--- a/docs/_docs/0.5.2/2_3_querying_data.md
+++ b/docs/_docs/0.5.2/2_3_querying_data.md
@@ -118,7 +118,7 @@ both parquet and avro data, this default setting needs to be turned off using se
 This will force Spark to fallback to using the Hive Serde to read the data (planning/executions is still Spark).
 
 ```java
-$ spark-shell --driver-class-path /etc/hive/conf --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 --conf spark.sql.hive.convertMetastoreParquet=false --num-executors 10 --driver-memory 7g --executor-memory 2g --master yarn-client
+$ spark-shell --driver-class-path /etc/hive/conf --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.4 --conf spark.sql.hive.convertMetastoreParquet=false --num-executors 10 --driver-memory 7g --executor-memory 2g --master yarn-client
 
 scala> sqlContext.sql("select count(*) from hudi_trips_mor_rt where datestr = '2016-10-02'").show()
 scala> sqlContext.sql("select count(*) from hudi_trips_mor_rt where datestr = '2016-10-02'").show()
@@ -135,7 +135,7 @@ spark.sparkContext.hadoopConfiguration.setClass("mapreduce.input.pathFilter.clas
 The Spark Datasource API is a popular way of authoring Spark ETL pipelines. Hudi COPY_ON_WRITE tables can be queried via Spark datasource similar to how standard
 datasources work (e.g: `spark.read.parquet`). Both snapshot querying and incremental querying are supported here.
 Typically spark jobs require adding `--jars <path to jar>/hudi-spark-bundle_2.11-<hudi version>.jar` to classpath of drivers
-and executors. Alternatively, hudi-spark-bundle can also fetched via the `--packages` options (e.g: `--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating`).
+and executors. Alternatively, hudi-spark-bundle can also fetched via the `--packages` options (e.g: `--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating`).
 
 ### Incremental query {#spark-incr-query}
diff --git a/docs/_docs/0.5.2/2_6_deployment.md b/docs/_docs/0.5.2/2_6_deployment.md
index c5cac7b..171dda9 100644
--- a/docs/_docs/0.5.2/2_6_deployment.md
+++ b/docs/_docs/0.5.2/2_6_deployment.md
@@ -37,7 +37,7 @@ With Merge_On_Read Table, Hudi ingestion needs to also take care of compacting d
 Here is an example invocation for reading from kafka topic in a single-run mode and writing to Merge On Read table type in a yarn cluster.
 
 ```java
-[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
+[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
@@ -85,7 +85,7 @@ Here is an example invocation for reading from kafka topic in a single-run mode
 Here is an example invocation for reading from kafka topic in a continuous mode and writing to Merge On Read table type in a yarn cluster.
 
 ```java
-[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
+[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
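For context, a release bump like this one is a mechanical find-and-replace of the artifact version across the docs. A minimal sketch of how stale references could be flagged after such a bump (the regex and helper are illustrative only, not part of this commit or the Hudi codebase):

```python
import re

# Hypothetical helper: flag doc lines that still reference a pre-0.5.2
# Hudi artifact version (0.5.0-incubating or 0.5.1-incubating).
STALE = re.compile(r"0\.5\.[01]-incubating")

def stale_lines(text: str) -> list:
    """Return 1-based line numbers that still carry an old version string."""
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if STALE.search(line)]

sample = (
    "--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating\n"
    "--packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating\n"
)
print(stale_lines(sample))  # only line 2 still needs updating
```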