Repository: spark
Updated Branches:
  refs/heads/master d80063278 -> 35f7f5ce8

[DOCS][MINOR] Fix a few broken links and typos, and, nit, use HTTPS more consistently

## What changes were proposed in this pull request?

Fix a few broken links and typos, and, nit, use HTTPS more consistently esp. on scripts and Apache links

## How was this patch tested?

Doc build

Closes #22172 from srowen/DocTypo.

Authored-by: Sean Owen <sean.o...@databricks.com>
Signed-off-by: hyukjinkwon <gurwls...@apache.org>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/35f7f5ce
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/35f7f5ce
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/35f7f5ce

Branch: refs/heads/master
Commit: 35f7f5ce83984d8afe0b7955942baa04f2bef74f
Parents: d800632
Author: Sean Owen <sean.o...@databricks.com>
Authored: Wed Aug 22 01:02:17 2018 +0800
Committer: hyukjinkwon <gurwls...@apache.org>
Committed: Wed Aug 22 01:02:17 2018 +0800

----------------------------------------------------------------------
 docs/README.md                                 |  4 ++--
 docs/_layouts/404.html                         |  2 +-
 docs/_layouts/global.html                      |  6 +++---
 docs/building-spark.md                         |  8 ++++----
 docs/contributing-to-spark.md                  |  2 +-
 docs/index.md                                  | 16 ++++++++--------
 docs/ml-migration-guides.md                    |  2 +-
 docs/quick-start.md                            |  2 +-
 docs/rdd-programming-guide.md                  |  4 ++--
 docs/running-on-mesos.md                       |  2 +-
 docs/running-on-yarn.md                        |  2 +-
 docs/security.md                               |  6 +++---
 docs/sparkr.md                                 |  2 +-
 docs/sql-programming-guide.md                  |  6 +++---
 docs/streaming-kinesis-integration.md          |  2 +-
 docs/streaming-programming-guide.md            |  5 ++---
 docs/structured-streaming-programming-guide.md |  2 +-
 17 files changed, 36 insertions(+), 37 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/README.md
----------------------------------------------------------------------
diff --git a/docs/README.md b/docs/README.md
index dbea4d6..7da543d 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -2,7 +2,7 @@ Welcome to the Spark documentation! This readme will walk you through navigating and building the Spark documentation, which is included here with the Spark source code. You can also find documentation specific to release versions of -Spark at http://spark.apache.org/documentation.html. +Spark at https://spark.apache.org/documentation.html. Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the documentation yourself. Why build it yourself? So that you have the docs that correspond to @@ -79,7 +79,7 @@ jekyll plugin to run `build/sbt unidoc` before building the site so if you haven may take some time as it generates all of the scaladoc and javadoc using [Unidoc](https://github.com/sbt/sbt-unidoc). The jekyll plugin also generates the PySpark docs using [Sphinx](http://sphinx-doc.org/), SparkR docs using [roxygen2](https://cran.r-project.org/web/packages/roxygen2/index.html) and SQL docs -using [MkDocs](http://www.mkdocs.org/). +using [MkDocs](https://www.mkdocs.org/). NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1 jekyll build`. 
In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/_layouts/404.html ---------------------------------------------------------------------- diff --git a/docs/_layouts/404.html b/docs/_layouts/404.html index 0446544..78f98b9 100755 --- a/docs/_layouts/404.html +++ b/docs/_layouts/404.html @@ -151,7 +151,7 @@ <script> var GOOG_FIXURL_LANG = (navigator.language || '').slice(0,2),GOOG_FIXURL_SITE = location.host; </script> - <script src="http://linkhelp.clients.google.com/tbproxy/lh/wm/fixurl.js"></script> + <script src="https://linkhelp.clients.google.com/tbproxy/lh/wm/fixurl.js"></script> </div> </body> </html> http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/_layouts/global.html ---------------------------------------------------------------------- diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html index e5af5ae..88d549c 100755 --- a/docs/_layouts/global.html +++ b/docs/_layouts/global.html @@ -50,7 +50,7 @@ </head> <body> <!--[if lt IE 7]> - <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p> + <p class="chromeframe">You are using an outdated browser. 
<a href="https://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p> <![endif]--> <!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html --> @@ -114,8 +114,8 @@ <li><a href="hardware-provisioning.html">Hardware Provisioning</a></li> <li class="divider"></li> <li><a href="building-spark.html">Building Spark</a></li> - <li><a href="http://spark.apache.org/contributing.html">Contributing to Spark</a></li> - <li><a href="http://spark.apache.org/third-party-projects.html">Third Party Projects</a></li> + <li><a href="https://spark.apache.org/contributing.html">Contributing to Spark</a></li> + <li><a href="https://spark.apache.org/third-party-projects.html">Third Party Projects</a></li> </ul> </li> </ul> http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/building-spark.md ---------------------------------------------------------------------- diff --git a/docs/building-spark.md b/docs/building-spark.md index affd7df..d3dfd49 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -45,7 +45,7 @@ Other build examples can be found below. ## Building a Runnable Distribution To create a Spark distribution like those distributed by the -[Spark Downloads](http://spark.apache.org/downloads.html) page, and that is laid out so as +[Spark Downloads](https://spark.apache.org/downloads.html) page, and that is laid out so as to be runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured with Maven profile settings and so on like the direct Maven build. Example: @@ -164,7 +164,7 @@ prompt. Developers who compile Spark frequently may want to speed up compilation; e.g., by using Zinc (for developers who build with Maven) or by avoiding re-compilation of the assembly JAR (for developers who build with SBT). 
For more information about how to do this, refer to the -[Useful Developer Tools page](http://spark.apache.org/developer-tools.html#reducing-build-times). +[Useful Developer Tools page](https://spark.apache.org/developer-tools.html#reducing-build-times). ## Encrypted Filesystems @@ -182,7 +182,7 @@ to the `sharedSettings` val. See also [this PR](https://github.com/apache/spark/ ## IntelliJ IDEA or Eclipse For help in setting up IntelliJ IDEA or Eclipse for Spark development, and troubleshooting, refer to the -[Useful Developer Tools page](http://spark.apache.org/developer-tools.html). +[Useful Developer Tools page](https://spark.apache.org/developer-tools.html). # Running Tests @@ -203,7 +203,7 @@ The following is an example of a command to run the tests: ## Running Individual Tests For information about how to run individual tests, refer to the -[Useful Developer Tools page](http://spark.apache.org/developer-tools.html#running-individual-tests). +[Useful Developer Tools page](https://spark.apache.org/developer-tools.html#running-individual-tests). ## PySpark pip installable http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/contributing-to-spark.md ---------------------------------------------------------------------- diff --git a/docs/contributing-to-spark.md b/docs/contributing-to-spark.md index 9252545..ede5584 100644 --- a/docs/contributing-to-spark.md +++ b/docs/contributing-to-spark.md @@ -5,4 +5,4 @@ title: Contributing to Spark The Spark team welcomes all forms of contributions, including bug reports, documentation or patches. For the newest information on how to contribute to the project, please read the -[Contributing to Spark guide](http://spark.apache.org/contributing.html). +[Contributing to Spark guide](https://spark.apache.org/contributing.html). 
http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/index.md ---------------------------------------------------------------------- diff --git a/docs/index.md b/docs/index.md index 2f00941..40f628b 100644 --- a/docs/index.md +++ b/docs/index.md @@ -12,7 +12,7 @@ It also supports a rich set of higher-level tools including [Spark SQL](sql-prog # Downloading -Get Spark from the [downloads page](http://spark.apache.org/downloads.html) of the project website. This documentation is for Spark version {{site.SPARK_VERSION}}. Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions. +Get Spark from the [downloads page](https://spark.apache.org/downloads.html) of the project website. This documentation is for Spark version {{site.SPARK_VERSION}}. Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions. Users can also download a "Hadoop free" binary and run Spark with any Hadoop version [by augmenting Spark's classpath](hadoop-provided.html). Scala and Java users can include Spark in their projects using its Maven coordinates and in the future Python users can also install Spark from PyPI. 
@@ -111,7 +111,7 @@ options for deployment: * [Amazon EC2](https://github.com/amplab/spark-ec2): scripts that let you launch a cluster on EC2 in about 5 minutes * [Standalone Deploy Mode](spark-standalone.html): launch a standalone cluster quickly without a third-party cluster manager * [Mesos](running-on-mesos.html): deploy a private cluster using - [Apache Mesos](http://mesos.apache.org) + [Apache Mesos](https://mesos.apache.org) * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN) * [Kubernetes](running-on-kubernetes.html): deploy Spark on top of Kubernetes @@ -127,20 +127,20 @@ options for deployment: * [Cloud Infrastructures](cloud-integration.html) * [OpenStack Swift](storage-openstack-swift.html) * [Building Spark](building-spark.html): build Spark using the Maven system -* [Contributing to Spark](http://spark.apache.org/contributing.html) -* [Third Party Projects](http://spark.apache.org/third-party-projects.html): related third party Spark projects +* [Contributing to Spark](https://spark.apache.org/contributing.html) +* [Third Party Projects](https://spark.apache.org/third-party-projects.html): related third party Spark projects **External Resources:** -* [Spark Homepage](http://spark.apache.org) -* [Spark Community](http://spark.apache.org/community.html) resources, including local meetups +* [Spark Homepage](https://spark.apache.org) +* [Spark Community](https://spark.apache.org/community.html) resources, including local meetups * [StackOverflow tag `apache-spark`](http://stackoverflow.com/questions/tagged/apache-spark) -* [Mailing Lists](http://spark.apache.org/mailing-lists.html): ask questions about Spark here +* [Mailing Lists](https://spark.apache.org/mailing-lists.html): ask questions about Spark here * [AMP Camps](http://ampcamp.berkeley.edu/): a series of training camps at UC Berkeley that featured talks and exercises about Spark, Spark Streaming, Mesos, and more. 
[Videos](http://ampcamp.berkeley.edu/6/), [slides](http://ampcamp.berkeley.edu/6/) and [exercises](http://ampcamp.berkeley.edu/6/exercises/) are available online for free. -* [Code Examples](http://spark.apache.org/examples.html): more are also available in the `examples` subfolder of Spark ([Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples), +* [Code Examples](https://spark.apache.org/examples.html): more are also available in the `examples` subfolder of Spark ([Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples), [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples), [Python]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/python), [R]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/r)) http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/ml-migration-guides.md ---------------------------------------------------------------------- diff --git a/docs/ml-migration-guides.md b/docs/ml-migration-guides.md index e473641..2047065 100644 --- a/docs/ml-migration-guides.md +++ b/docs/ml-migration-guides.md @@ -289,7 +289,7 @@ In the `spark.mllib` package, there were several breaking changes. The first ch In the `spark.ml` package, the main API changes are from Spark SQL. We list the most important changes here: -* The old [SchemaRDD](http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.sql.SchemaRDD) has been replaced with [DataFrame](api/scala/index.html#org.apache.spark.sql.DataFrame) with a somewhat modified API. All algorithms in `spark.ml` which used to use SchemaRDD now use DataFrame. +* The old [SchemaRDD](https://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.sql.SchemaRDD) has been replaced with [DataFrame](api/scala/index.html#org.apache.spark.sql.DataFrame) with a somewhat modified API. All algorithms in `spark.ml` which used to use SchemaRDD now use DataFrame. 
* In Spark 1.2, we used implicit conversions from `RDD`s of `LabeledPoint` into `SchemaRDD`s by calling `import sqlContext._` where `sqlContext` was an instance of `SQLContext`. These implicits have been moved, so we now call `import sqlContext.implicits._`. * Java APIs for SQL have also changed accordingly. Please see the examples above and the [Spark SQL Programming Guide](sql-programming-guide.html) for details. http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/quick-start.md ---------------------------------------------------------------------- diff --git a/docs/quick-start.md b/docs/quick-start.md index f1a2096..ef7af6c 100644 --- a/docs/quick-start.md +++ b/docs/quick-start.md @@ -12,7 +12,7 @@ interactive shell (in Python or Scala), then show how to write applications in Java, Scala, and Python. To follow along with this guide, first, download a packaged release of Spark from the -[Spark website](http://spark.apache.org/downloads.html). Since we won't be using HDFS, +[Spark website](https://spark.apache.org/downloads.html). Since we won't be using HDFS, you can download a package for any version of Hadoop. Note that, before Spark 2.0, the main programming interface of Spark was the Resilient Distributed Dataset (RDD). After Spark 2.0, RDDs are replaced by Dataset, which is strongly-typed like an RDD, but with richer optimizations under the hood. The RDD interface is still supported, and you can get a more detailed reference at the [RDD programming guide](rdd-programming-guide.html). However, we highly recommend you to switch to use Dataset, which has better performance than RDD. See the [SQL programming guide](sql-programming-guide.html) to get more information about Dataset. 
http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/rdd-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md index b642409..d95b757 100644 --- a/docs/rdd-programming-guide.md +++ b/docs/rdd-programming-guide.md @@ -106,7 +106,7 @@ You can also use `bin/pyspark` to launch an interactive Python shell. If you wish to access HDFS data, you need to use a build of PySpark linking to your version of HDFS. -[Prebuilt packages](http://spark.apache.org/downloads.html) are also available on the Spark homepage +[Prebuilt packages](https://spark.apache.org/downloads.html) are also available on the Spark homepage for common HDFS versions. Finally, you need to import some Spark classes into your program. Add the following line: @@ -1569,7 +1569,7 @@ as Spark does not support two contexts running concurrently in the same program. # Where to Go from Here -You can see some [example Spark programs](http://spark.apache.org/examples.html) on the Spark website. +You can see some [example Spark programs](https://spark.apache.org/examples.html) on the Spark website. 
In addition, Spark includes several samples in the `examples` directory ([Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples), [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples), http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/running-on-mesos.md ---------------------------------------------------------------------- diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md index 3e76d47..b473e65 100644 --- a/docs/running-on-mesos.md +++ b/docs/running-on-mesos.md @@ -672,7 +672,7 @@ See the [configuration page](configuration.html) for information on Spark config <td><code>spark.mesos.dispatcher.historyServer.url</code></td> <td><code>(none)</code></td> <td> - Set the URL of the <a href="http://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact">history + Set the URL of the <a href="monitoring.html#viewing-after-the-fact">history server</a>. The dispatcher will then link each driver to its entry in the history server. </td> http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/running-on-yarn.md ---------------------------------------------------------------------- diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md index 1c1f40c..e3d67c3 100644 --- a/docs/running-on-yarn.md +++ b/docs/running-on-yarn.md @@ -61,7 +61,7 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S # Preparations Running Spark on YARN requires a binary distribution of Spark which is built with YARN support. -Binary distributions can be downloaded from the [downloads page](http://spark.apache.org/downloads.html) of the project website. +Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website. To build Spark yourself, refer to [Building Spark](building-spark.html). 
To make Spark runtime jars accessible from YARN side, you can specify `spark.yarn.archive` or `spark.yarn.jars`. For details please refer to [Spark Properties](running-on-yarn.html#spark-properties). If neither `spark.yarn.archive` nor `spark.yarn.jars` is specified, Spark will create a zip file with all jars under `$SPARK_HOME/jars` and upload it to the distributed cache. http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/security.md ---------------------------------------------------------------------- diff --git a/docs/security.md b/docs/security.md index c8eec73..7fb3e17 100644 --- a/docs/security.md +++ b/docs/security.md @@ -49,7 +49,7 @@ respectively by default) are restricted to hosts that are trusted to submit jobs Spark supports AES-based encryption for RPC connections. For encryption to be enabled, RPC authentication must also be enabled and properly configured. AES encryption uses the -[Apache Commons Crypto](http://commons.apache.org/proper/commons-crypto/) library, and Spark's +[Apache Commons Crypto](https://commons.apache.org/proper/commons-crypto/) library, and Spark's configuration system allows access to that library's configuration for advanced users. There is also support for SASL-based encryption, although it should be considered deprecated. It @@ -169,7 +169,7 @@ The following settings cover enabling encryption for data written to disk: ## Authentication and Authorization -Enabling authentication for the Web UIs is done using [javax servlet filters](http://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html). +Enabling authentication for the Web UIs is done using [javax servlet filters](https://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html). You will need a filter that implements the authentication method you want to deploy. Spark does not provide any built-in authentication filters. 
@@ -492,7 +492,7 @@ distributed with the application using the `--files` command line argument (or t configuration should just reference the file name with no absolute path. Distributing local key stores this way may require the files to be staged in HDFS (or other similar -distributed file system used by the cluster), so it's recommended that the undelying file system be +distributed file system used by the cluster), so it's recommended that the underlying file system be configured with security in mind (e.g. by enabling authentication and wire encryption). ### Standalone mode http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/sparkr.md ---------------------------------------------------------------------- diff --git a/docs/sparkr.md b/docs/sparkr.md index 84e9b4a..b4248e8 100644 --- a/docs/sparkr.md +++ b/docs/sparkr.md @@ -128,7 +128,7 @@ head(df) SparkR supports operating on a variety of data sources through the `SparkDataFrame` interface. This section describes the general methods for loading and saving data using Data Sources. You can check the Spark SQL programming guide for more [specific options](sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources. The general method for creating SparkDataFrames from data sources is `read.df`. This method takes in the path for the file to load and the type of data source, and the currently active SparkSession will be used automatically. -SparkR supports reading JSON, CSV and Parquet files natively, and through packages available from sources like [Third Party Projects](http://spark.apache.org/third-party-projects.html), you can find data source connectors for popular file formats like Avro. 
These packages can either be added by +SparkR supports reading JSON, CSV and Parquet files natively, and through packages available from sources like [Third Party Projects](https://spark.apache.org/third-party-projects.html), you can find data source connectors for popular file formats like Avro. These packages can either be added by specifying `--packages` with `spark-submit` or `sparkR` commands, or if initializing SparkSession with `sparkPackages` parameter when in an interactive R shell or from RStudio. <div data-lang="r" markdown="1"> http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/sql-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index d9ebc3c..8e308d5 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1796,7 +1796,7 @@ strings, e.g. integer indices. See [pandas.DataFrame](https://pandas.pydata.org/ on how to label columns when constructing a `pandas.DataFrame`. Note that all data for a group will be loaded into memory before the function is applied. This can -lead to out of memory exceptons, especially if the group sizes are skewed. The configuration for +lead to out of memory exceptions, especially if the group sizes are skewed. The configuration for [maxRecordsPerBatch](#setting-arrow-batch-size) is not applied on groups and it is up to the user to ensure that the grouped data will fit into the available memory. @@ -1876,7 +1876,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see ## Upgrading From Spark SQL 2.3 to 2.4 - - Since Spark 2.4, Spark will evaluate the set operations referenced in a query by following a precedence rule as per the SQL standard. If the order is not specified by parentheses, set operations are performed from left to right with the exception that all INTERSECT operations are performed before any UNION, EXCEPT or MINUS operations. 
The old behaviour of giving equal precedence to all the set operations are preserved under a newly added configuaration `spark.sql.legacy.setopsPrecedence.enabled` with a default value of `false`. When this property is set to `true`, spark will evaluate the set operators from left to right as they appear in the query given no explicit ordering is enforced by usage of parenthesis. + - Since Spark 2.4, Spark will evaluate the set operations referenced in a query by following a precedence rule as per the SQL standard. If the order is not specified by parentheses, set operations are performed from left to right with the exception that all INTERSECT operations are performed before any UNION, EXCEPT or MINUS operations. The old behaviour of giving equal precedence to all the set operations are preserved under a newly added configuration `spark.sql.legacy.setopsPrecedence.enabled` with a default value of `false`. When this property is set to `true`, spark will evaluate the set operators from left to right as they appear in the query given no explicit ordering is enforced by usage of parenthesis. - Since Spark 2.4, Spark will display table description column Last Access value as UNKNOWN when the value was Jan 01 1970. - Since Spark 2.4, Spark maximizes the usage of a vectorized ORC reader for ORC files by default. To do that, `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` change their default values to `native` and `true` respectively. - In PySpark, when Arrow optimization is enabled, previously `toPandas` just failed when Arrow optimization is unable to be used whereas `createDataFrame` from Pandas DataFrame allowed the fallback to non-optimization. Now, both `toPandas` and `createDataFrame` from Pandas DataFrame allow the fallback by default, which can be switched off by `spark.sql.execution.arrow.fallback.enabled`. 
@@ -2162,7 +2162,7 @@ See the API docs for `SQLContext.read` ( <a href="api/python/pyspark.sql.html#pyspark.sql.SQLContext.read">Python</a> ) and `DataFrame.write` ( <a href="api/scala/index.html#org.apache.spark.sql.DataFrame@write:DataFrameWriter">Scala</a>, - <a href="api/java/org/apache/spark/sql/DataFrame.html#write()">Java</a>, + <a href="api/java/org/apache/spark/sql/Dataset.html#write()">Java</a>, <a href="api/python/pyspark.sql.html#pyspark.sql.DataFrame.write">Python</a> ) more information. http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/streaming-kinesis-integration.md ---------------------------------------------------------------------- diff --git a/docs/streaming-kinesis-integration.md b/docs/streaming-kinesis-integration.md index 678b064..6a52e8a 100644 --- a/docs/streaming-kinesis-integration.md +++ b/docs/streaming-kinesis-integration.md @@ -196,7 +196,7 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or m #### Running the Example To run the example, -- Download a Spark binary from the [download site](http://spark.apache.org/downloads.html). +- Download a Spark binary from the [download site](https://spark.apache.org/downloads.html). - Set up Kinesis stream (see earlier section) within AWS. Note the name of the Kinesis stream and the endpoint URL corresponding to the region where the stream was created. 
http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/streaming-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index 118b053..0ca0f2a 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -915,8 +915,7 @@ JavaPairDStream<String, Integer> runningCounts = pairs.updateStateByKey(updateFu The update function will be called for each word, with `newValues` having a sequence of 1's (from the `(word, 1)` pairs) and the `runningCount` having the previous count. For the complete Java code, take a look at the example -[JavaStatefulNetworkWordCount.java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/streaming -/JavaStatefulNetworkWordCount.java). +[JavaStatefulNetworkWordCount.java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/streaming/JavaStatefulNetworkWordCount.java). </div> <div data-lang="python" markdown="1"> @@ -2470,7 +2469,7 @@ additional effort may be necessary to achieve exactly-once semantics. 
 There are - [Kafka Integration Guide](streaming-kafka-integration.html) - [Kinesis Integration Guide](streaming-kinesis-integration.html) - [Custom Receiver Guide](streaming-custom-receivers.html) -* Third-party DStream data sources can be found in [Third Party Projects](http://spark.apache.org/third-party-projects.html) +* Third-party DStream data sources can be found in [Third Party Projects](https://spark.apache.org/third-party-projects.html) * API documentation - Scala docs * [StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) and http://git-wip-us.apache.org/repos/asf/spark/blob/35f7f5ce/docs/structured-streaming-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index b832f71..355a6cc 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -17,7 +17,7 @@ In this guide, we are going to walk you through the programming model and the AP # Quick Example Let’s say you want to maintain a running word count of text data received from a data server listening on a TCP socket. Let’s see how you can express this using Structured Streaming. You can see the full code in [Scala]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCount.scala)/[Java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java)/[Python]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/python/sql/streaming/structured_network_wordcount.py)/[R]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/r/streaming/structured_network_wordcount.R). 
-And if you [download Spark](http://spark.apache.org/downloads.html), you can directly [run the example](index.html#running-the-examples-and-shell). In any case, let’s walk through the example step-by-step and understand how it works. First, we have to import the necessary classes and create a local SparkSession, the starting point of all functionalities related to Spark. +And if you [download Spark](https://spark.apache.org/downloads.html), you can directly [run the example](index.html#running-the-examples-and-shell). In any case, let’s walk through the example step-by-step and understand how it works. First, we have to import the necessary classes and create a local SparkSession, the starting point of all functionalities related to Spark. <div class="codetabs"> <div data-lang="scala" markdown="1">