GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/10392#discussion_r48133110
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2072,14 +2072,15 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
// Otherwise, the driver may attempt to reconstruct the checkpointed RDD from
// its own local file system, which is incorrect because the checkpoint files
// are actually on the executor machines.
- if (!isLocal && Utils.nonLocalPaths(directory).isEmpty) {
+ val path = new Path(directory, UUID.randomUUID().toString)
+ val fs = path.getFileSystem(hadoopConfiguration)
+ val isDirLocal = fs.isInstanceOf[LocalFileSystem]
+ if (!isLocal && Utils.nonLocalPaths(directory).isEmpty && !isDirLocal) {
--- End diff --
Is the "host:port" part the issue? If so, write `hdfs:///path/xxx` instead.
My suggestion is to change the warning message to something like `s"If
Spark is not running in local mode, then the checkpoint directory must not be
on the local filesystem. Directory '$directory' appears to be on the local
filesystem."` Your case would still trigger the warning, but I believe it is
avoidable (?) with the right `hdfs` URI, so the warning is still useful.
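
To make the suggestion concrete, here is a minimal sketch of a scheme-based
check paired with the proposed message. It is not Spark's actual code:
`CheckpointDirCheck`, `looksLocal`, and `warningFor` are hypothetical names,
and `looksLocal` is a simplified stand-in for the `Utils.nonLocalPaths(directory).isEmpty`
test in the diff. It assumes that a directory with no scheme, or with a `file`
scheme, is treated as local.

```scala
import java.net.URI

object CheckpointDirCheck {
  // Hypothetical helper (an assumption, not Spark's Utils.nonLocalPaths):
  // treat a directory as local when its URI has no scheme or a "file" scheme.
  def looksLocal(directory: String): Boolean = {
    val scheme = new URI(directory).getScheme
    scheme == null || scheme == "file"
  }

  // Returns the proposed warning text when Spark is not in local mode and the
  // checkpoint directory appears to be on the local filesystem.
  def warningFor(directory: String, isLocalMode: Boolean): Option[String] =
    if (!isLocalMode && looksLocal(directory)) {
      Some(
        "If Spark is not running in local mode, then the checkpoint directory must not be " +
          s"on the local filesystem. Directory '$directory' appears to be on the local filesystem.")
    } else {
      None
    }

  def main(args: Array[String]): Unit = {
    // Illustrative paths only; the hdfs URI form is the one suggested above.
    Seq("/tmp/checkpoints", "file:///tmp/checkpoints", "hdfs:///path/xxx").foreach { dir =>
      println(s"$dir -> ${warningFor(dir, isLocalMode = false)}")
    }
  }
}
```

Under these assumptions, a path written without a scheme is flagged even if
the cluster's default filesystem is HDFS, while an explicit `hdfs:///path/xxx`
URI is not. That is why spelling out the scheme should avoid the warning.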