GitHub user zsxwing opened a pull request:
https://github.com/apache/spark/pull/18179
[SPARK-20894][SS] Resolve the checkpoint location in driver and use the
resolved path in state store (branch-2.2)
## What changes were proposed in this pull request?
Backport #18149 to 2.2.
## How was this patch tested?
Jenkins.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zsxwing/spark SPARK-20894-2.2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18179.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18179
----
commit a611c4776f9b195fcba6e23338d134071f28c87e
Author: Shixiong Zhu <[email protected]>
Date: 2017-06-01T00:24:37Z
[SPARK-20894][SS] Resolve the checkpoint location in driver and use the
resolved path in state store
When the user runs a Structured Streaming query in a cluster and the driver
uses the local file system, StateStore running in executors will throw a
file-not-found exception. However, the current error message does not make
the cause obvious.
This PR makes StreamExecution resolve the path in the driver and use the full
path, including the scheme part (such as `hdfs:/`, `file:/`), in StateStore.
If the above error then happens, StateStore will throw an error containing
this full path, which starts with `file:/`, making the cause obvious: the
checkpoint location is on the local file system.
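To illustrate what resolving a path to its full, scheme-qualified form means, here is a minimal Python sketch. It is not Spark's or Hadoop's implementation; the `qualify` helper, its parameters, and the example addresses are hypothetical and only mimic the idea of attaching the driver's default file system to a scheme-less checkpoint path.

```python
import posixpath
from urllib.parse import urlsplit


def qualify(path: str, default_fs: str, working_dir: str = "/") -> str:
    """Return `path` with an explicit scheme (hypothetical helper).

    A path that already carries a scheme is returned unchanged; otherwise it
    is made absolute and prefixed with the scheme (and authority, if any) of
    `default_fs`, which stands in for the driver's default file system.
    """
    if urlsplit(path).scheme:
        return path  # already qualified, e.g. hdfs://nn/ckpt

    # Make the path absolute relative to the assumed working directory.
    absolute = posixpath.normpath(posixpath.join(working_dir, path))

    fs = urlsplit(default_fs)
    if fs.netloc:
        return f"{fs.scheme}://{fs.netloc}{absolute}"
    return f"{fs.scheme}:{absolute}"


# Driver configured with the local file system: the result starts with
# `file:/`, which is what makes the misconfiguration visible in an error.
print(qualify("/tmp/checkpoint", "file:///"))

# Driver configured with HDFS (address is an assumption for illustration).
print(qualify("/tmp/checkpoint", "hdfs://namenode:8020"))
```

Because the qualified string is computed once and then passed along, any later file-not-found error reports `file:/tmp/checkpoint` rather than the ambiguous `/tmp/checkpoint`.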
One potential minor issue is that, after this change, the user cannot use
different default file system settings in the driver and executors (e.g., a
public HDFS address in the driver and a private HDFS address in executors).
However, since batch queries also have this issue (see
https://github.com/apache/spark/blob/4bb6a53ebd06de3de97139a2dbc7c85fc3aa3e66/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L402),
this change doesn't make things worse.
Tested with the newly added test.
Author: Shixiong Zhu <[email protected]>
Closes #18149 from zsxwing/SPARK-20894.
----