GitHub user pierre-borckmans commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10392#discussion_r48103410
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2072,14 +2072,15 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
         // Otherwise, the driver may attempt to reconstruct the checkpointed RDD from
         // its own local file system, which is incorrect because the checkpoint files
         // are actually on the executor machines.
    -    if (!isLocal && Utils.nonLocalPaths(directory).isEmpty) {
    +    val path = new Path(directory, UUID.randomUUID().toString)
    +    val fs = path.getFileSystem(hadoopConfiguration)
    +    val isDirLocal = fs.isInstanceOf[LocalFileSystem]
    +    if (!isLocal && Utils.nonLocalPaths(directory).isEmpty && !isDirLocal) {
    --- End diff --
    
    Indeed, it should be ```if (!isLocal && Utils.nonLocalPaths(directory).isEmpty && isDirLocal) {```.
    
    Checking LocalFileSystem is brittle indeed.
    
    The point is the following. We had a use case where we were deploying on a YARN cluster with Kerberos. We could neither access Parquet files on HDFS nor set the checkpoint dir using a full URI (`hdfs://host:port/path/xxx`). Instead, we had to use the path only (`path/xxx`). The `FileSystem` that Spark used for both the checkpoint dir and the Parquet file loading would then correctly access HDFS, relying on the system Hadoop configuration.
    
    In this use case the warning is misleading: the Spark master is not local and the path has no scheme, so it looks local, but it actually resolves to HDFS and is therefore not local.
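    
    To make the distinction concrete, here is a minimal sketch, not the code in this PR. It assumes a scheme-less path is resolved against a default filesystem URI, as Hadoop does with `fs.defaultFS`. The object `CheckpointPathSketch`, both of its methods, and the `hdfs://namenode:8020` value are hypothetical names used only for illustration, and only `java.net.URI` is used so that the snippet runs without Hadoop on the classpath.

```scala
import java.net.URI

// Hypothetical helper, not part of Spark or Hadoop.
object CheckpointPathSketch {

  // Scheme the directory would effectively use: its own scheme if it has one,
  // otherwise the scheme of the default filesystem (assumed to mirror fs.defaultFS).
  def effectiveScheme(dir: String, defaultFs: URI): String =
    Option(new URI(dir).getScheme).getOrElse(defaultFs.getScheme)

  // A directory is treated as local only if it effectively resolves to file://.
  def resolvesToLocal(dir: String, defaultFs: URI): Boolean =
    effectiveScheme(dir, defaultFs) == "file"

  def main(args: Array[String]): Unit = {
    // Assumed cluster default; a real deployment would read it from the Hadoop configuration.
    val clusterDefault = new URI("hdfs://namenode:8020")

    // Scheme-less path: looks local, but resolves to HDFS on this cluster.
    println(resolvesToLocal("path/xxx", clusterDefault))         // false
    // Explicit file:// URI: local regardless of the cluster default.
    println(resolvesToLocal("file:///tmp/ckpt", clusterDefault)) // true
  }
}
```

    With Hadoop available, a comparable check could probably compare `fs.getUri.getScheme` against `"file"` instead of testing `isInstanceOf[LocalFileSystem]`, but I have not tried that in this patch.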
    
    I agree with you that changing the warning message could suffice, but I'm not sure how to rephrase it for this use case.



