Github user pierre-borckmans commented on a diff in the pull request:
https://github.com/apache/spark/pull/10392#discussion_r48103410
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2072,14 +2072,15 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    // Otherwise, the driver may attempt to reconstruct the checkpointed RDD from
    // its own local file system, which is incorrect because the checkpoint files
    // are actually on the executor machines.
-    if (!isLocal && Utils.nonLocalPaths(directory).isEmpty) {
+    val path = new Path(directory, UUID.randomUUID().toString)
+    val fs = path.getFileSystem(hadoopConfiguration)
+    val isDirLocal = fs.isInstanceOf[LocalFileSystem]
+    if (!isLocal && Utils.nonLocalPaths(directory).isEmpty && !isDirLocal) {
--- End diff ---
Indeed, it should be ```if (!isLocal && Utils.nonLocalPaths(directory).isEmpty && isDirLocal) {```.
Checking for `LocalFileSystem` is indeed brittle.
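As a minimal sketch of one way the check can break (using plain Hadoop classes; the `/tmp/checkpoints` path is hypothetical): the default local filesystem passes the `isInstanceOf[LocalFileSystem]` test, but `RawLocalFileSystem`, which also reads the local disk, does not, since it extends `FileSystem` directly rather than `LocalFileSystem`:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{LocalFileSystem, Path, RawLocalFileSystem}

val conf = new Configuration() // fs.defaultFS defaults to file:///
val fs = new Path("/tmp/checkpoints").getFileSystem(conf)
println(fs.isInstanceOf[LocalFileSystem]) // true: the checksummed local fs

// A raw (checksum-free) local filesystem is just as local,
// yet it fails the same instanceOf check:
println(new RawLocalFileSystem().isInstanceOf[LocalFileSystem]) // false
```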
The point is the following: we had a use case where we were deploying on a YARN cluster with Kerberos. We could not access Parquet files on HDFS, nor set the checkpoint dir using a full URI (`hdfs://host:port/path/xxx`). Instead, we had to use the path only, `path/xxx`, and the `FileSystem` used by Spark both for the checkpoint dir and for Parquet file loading would then correctly access HDFS, relying on the system Hadoop configuration.
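For reference, a minimal sketch of that resolution (the `hdfs://namenode:8020` address is a hypothetical stand-in for the cluster's configured default filesystem): the path carries no scheme, so a scheme-based check like `Utils.nonLocalPaths` sees nothing non-local, yet Hadoop resolves it against `fs.defaultFS` and hands back an HDFS filesystem:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://namenode:8020") // hypothetical cluster address

// "path/xxx" has no scheme, so a scheme-based check sees it as local...
val fs = new Path("path/xxx").getFileSystem(conf)

// ...but Hadoop resolves it against fs.defaultFS, so the filesystem is HDFS:
println(fs.getUri) // hdfs://namenode:8020
```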
With this use case, the warning is misleading: the Spark master is not local, but the path we are using is not actually local either (it resolves to HDFS), even though it has no scheme.
I agree with you that changing the warning message could suffice, but I'm not sure how to rephrase it to cover this use case.