GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/18179

    [SPARK-20894][SS] Resolve the checkpoint location in driver and use the resolved path in state store (branch-2.2)

    ## What changes were proposed in this pull request?
    
    Backport #18149 to 2.2.
    
    ## How was this patch tested?
    
    Jenkins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-20894-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18179
    
----
commit a611c4776f9b195fcba6e23338d134071f28c87e
Author: Shixiong Zhu <shixi...@databricks.com>
Date:   2017-06-01T00:24:37Z

    [SPARK-20894][SS] Resolve the checkpoint location in driver and use the resolved path in state store
    
    When a user runs a Structured Streaming query on a cluster and the driver
    uses the local file system for the checkpoint location, StateStore instances
    running on executors throw a file-not-found exception. However, the current
    error message does not make the cause obvious.
    
    This PR makes StreamExecution resolve the path in the driver and use the
    full path, including the scheme part (such as `hdfs:/` or `file:/`), in
    StateStore.
    
    Then, if the above error happens, StateStore throws an error containing this
    full path, which starts with `file:/`. That makes the cause obvious: the
    checkpoint location is on the local file system.
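
    As a rough illustration of the idea (not the code in this PR), the sketch
    below qualifies a checkpoint path once, on the driver side, and hands the
    resulting string to the executors. The helper name `qualify`, the
    `default_fs` parameter, and the example addresses are hypothetical, and
    only absolute POSIX-style paths are assumed.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Return `path` with an explicit scheme.

    Hypothetical helper: if `path` already has a scheme it is returned
    unchanged; otherwise the scheme (and authority, if any) of `default_fs`
    is prepended. Assumes `path` is an absolute POSIX-style path.
    """
    if urlparse(path).scheme:
        # Already qualified, e.g. "hdfs://namenode:8020/checkpoints/q1".
        return path
    base = urlparse(default_fs)
    if base.netloc:
        # Default file system with an authority, e.g. "hdfs://namenode:8020".
        return f"{base.scheme}://{base.netloc}{path}"
    # Default file system without an authority, e.g. "file:///".
    return f"{base.scheme}:{path}"


# Driver whose default file system is the local one (assumed setting).
resolved_on_driver = qualify("/tmp/checkpoint", "file:///")

# Executors receive the already-qualified string, so their own default
# file system no longer affects where they look for the checkpoint.
seen_on_executor = qualify(resolved_on_driver, "hdfs://namenode:8020")
```

    With these assumed inputs, both values are `file:/tmp/checkpoint`, so an
    error raised on an executor would carry a path that visibly points at the
    local file system.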
    
    One potential minor issue is that, after this change, the user cannot use
    different default file system settings in the driver and executors (e.g., a
    public HDFS address in the driver and a private HDFS address in executors).
    However, since batch queries already have this issue (see
    https://github.com/apache/spark/blob/4bb6a53ebd06de3de97139a2dbc7c85fc3aa3e66/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L402),
    this change doesn't make things worse.
    
    The newly added test.
    
    Author: Shixiong Zhu <shixi...@databricks.com>
    
    Closes #18149 from zsxwing/SPARK-20894.

----


