GitHub user zsxwing opened a pull request:
https://github.com/apache/spark/pull/18149
[SPARK-20894][SS]Resolve the checkpoint location in driver and use the
resolved path in state store
## What changes were proposed in this pull request?
When a user runs a Structured Streaming query in a cluster and the driver
uses the local file system, StateStore running in executors throws a
file-not-found exception. However, the current error message does not make
the cause obvious.
This PR makes StreamExecution resolve the checkpoint path in the driver and
use the full path, including the scheme part (such as `hdfs:/` or `file:/`),
in StateStore. If the above error then occurs, StateStore throws an error
containing this full path, which starts with `file:/`, making the cause
obvious: the checkpoint location is on the local file system.
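As a rough illustration of what "resolving" a path to include its scheme means, here is a minimal Python sketch. It is not the code in this PR (which works with Hadoop paths inside Spark); the function name `qualify_checkpoint_location` and the `default_fs` parameter are hypothetical, and the sketch assumes absolute, POSIX-style paths only.

```python
from urllib.parse import urlsplit


def qualify_checkpoint_location(location: str, default_fs: str) -> str:
    """Hypothetical helper: attach the default file system's scheme
    (and authority, if any) to a checkpoint location that lacks one."""
    if urlsplit(location).scheme:
        # Already fully qualified, e.g. "hdfs://nn:8020/cp"; leave it alone.
        return location
    if not location.startswith("/"):
        # Assumption of this sketch: relative paths are not handled.
        raise ValueError("this sketch only handles absolute paths")
    fs = urlsplit(default_fs)
    if fs.netloc:
        # Default file system has an authority, e.g. "hdfs://namenode:8020".
        return f"{fs.scheme}://{fs.netloc}{location}"
    # No authority, e.g. "file:///"; yields the short "file:/..." form.
    return f"{fs.scheme}:{location}"


if __name__ == "__main__":
    # Driver whose default file system is local: the scheme makes that visible.
    print(qualify_checkpoint_location("/tmp/checkpoint", "file:///"))
    # Driver whose default file system is HDFS.
    print(qualify_checkpoint_location("/tmp/checkpoint", "hdfs://namenode:8020"))
```

With a local default file system, the qualified string starts with `file:/`, so any error message that includes it shows at a glance that the checkpoint location is not on shared storage.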
One potential minor issue is that, after this change, the user can no longer
use different default file system settings in the driver and executors (e.g.,
a public HDFS address in the driver and a private HDFS address in executors).
However, since batch queries already have this limitation (see
https://github.com/apache/spark/blob/4bb6a53ebd06de3de97139a2dbc7c85fc3aa3e66/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L402),
this change doesn't make things worse.
## How was this patch tested?
The newly added test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zsxwing/spark SPARK-20894
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18149.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18149
----
commit 133f0dd3eefd4f665a1a82f302505e58fd6f3a4e
Author: Shixiong Zhu <[email protected]>
Date: 2017-05-30T22:13:03Z
Resolve the checkpoint location in driver and use the resolved path in
state store
----