GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/18149

    [SPARK-20894][SS]Resolve the checkpoint location in driver and use the 
resolved path in state store

    ## What changes were proposed in this pull request?
    
    When a user runs a Structured Streaming query on a cluster and the 
checkpoint location resolves to the driver's local file system, StateStore 
running in executors throws a file-not-found exception. However, the current 
error does not make the cause obvious.
    
    This PR makes StreamExecution resolve the checkpoint path in the driver and 
use the full path, including the scheme part (such as `hdfs:/` or `file:/`), in 
StateStore.
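To illustrate what resolving a path to include its scheme means, here is a minimal sketch in plain Python. It is not the code from this PR (which works through Hadoop's file system API in Scala); the function name `qualify`, the `default_fs` parameter, and the example addresses are hypothetical, and the sketch assumes an absolute POSIX-style path.

```python
from urllib.parse import urlsplit


def qualify(path: str, default_fs: str) -> str:
    """Return `path` with an explicit scheme.

    If `path` already carries a scheme it is returned unchanged; otherwise the
    scheme and authority are borrowed from `default_fs` (a hypothetical
    stand-in for the driver's default file system setting).
    """
    if urlsplit(path).scheme:
        return path  # already fully qualified
    fs = urlsplit(default_fs)
    return f"{fs.scheme}://{fs.netloc}{path}"


# A scheme-less path picks up whatever the default file system is.
local = qualify("/tmp/checkpoint", "file:///")
remote = qualify("/tmp/checkpoint", "hdfs://namenode:8020")
print(local)
print(remote)
```

With the scheme attached, a path that lives on the local file system is visibly different from one on HDFS, which is the property the description relies on.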
    
    If the above error then happens, StateStore throws an error containing this 
full path, which starts with `file:/`. That makes the cause obvious: the 
checkpoint location is on the local file system.
    
    One potential minor issue is that, after this change, the user cannot use 
different default file system settings in the driver and executors (e.g., a 
public HDFS address in the driver and a private HDFS address in executors). 
However, since batch queries already have this issue (see 
https://github.com/apache/spark/blob/4bb6a53ebd06de3de97139a2dbc7c85fc3aa3e66/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L402),
 this change doesn't make things worse.
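The trade-off above can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark code: the helper `resolve`, the two default file system addresses, and the checkpoint path are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical settings: the driver and executors see different defaults.
DRIVER_DEFAULT_FS = "hdfs://public-nn:8020"
EXECUTOR_DEFAULT_FS = "hdfs://private-nn:8020"


def resolve(path: str, default_fs: str) -> str:
    """Attach the scheme and authority of `default_fs` to a scheme-less path."""
    if urlsplit(path).scheme:
        return path  # already qualified, so the local default is ignored
    fs = urlsplit(default_fs)
    return f"{fs.scheme}://{fs.netloc}{path}"


raw = "/checkpoints/query1"

# If the raw path reaches the executors, each side applies its own default.
executor_view_raw = resolve(raw, EXECUTOR_DEFAULT_FS)

# If the driver resolves first, executors receive an already qualified path.
shipped = resolve(raw, DRIVER_DEFAULT_FS)
executor_view_resolved = resolve(shipped, EXECUTOR_DEFAULT_FS)

print(executor_view_raw)
print(executor_view_resolved)
```

In the second case the executor-side default no longer has any effect, which is why differing driver and executor defaults stop working once the path is resolved in the driver.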
    
    ## How was this patch tested?
    
    The newly added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-20894

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18149.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18149
    
----
commit 133f0dd3eefd4f665a1a82f302505e58fd6f3a4e
Author: Shixiong Zhu <[email protected]>
Date:   2017-05-30T22:13:03Z

    Resolve the checkpoint location in driver and use the resolved path in 
state store

----

