GitHub user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/9490#discussion_r44608884
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -604,10 +609,33 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
}
}
- buildInternalScan(requiredColumns, filters, inputStatuses, broadcastedConf)
+ if (!inputExists) {
+ throw new IOException("Input paths do not exist, input paths="
+ + inputPaths.mkString("[", ",", "]"))
+ } else {
+ if (inputStatuses.isEmpty && readFromHDFS) {
+ logWarning("Input paths are empty, input paths=" + inputPaths.mkString("[", ",", "]"))
+ sqlContext.sparkContext.emptyRDD[InternalRow]
+ } else {
+ buildInternalScan(requiredColumns, filters, inputStatuses, broadcastedConf)
+ }
+ }
}
/**
+ * Most of time, HadoopFsRelation should check the inputPaths, but for some cases it is not,
+ * e.g. JsonRelation may read from RDD[String]
+ */
+ def inputExists: Boolean = fileStatusCache.inputExists
+
+ /**
+ * Most of time, HadoopFsRelation should read from hdfs, but some cases it is not,
+ * e.g. JsonRelation may read from RDD[String]
+ * @return
+ */
+ def readFromHDFS: Boolean = true
--- End diff --
@yhuai @liancheng This issue also exists in TextRelation, not only in
JsonRelation, so it would be better to check the input paths in
HadoopFsRelation itself. However, JsonRelation can also accept an RDD[String]
as input, so I think the RDD[String] case should be split out into a new
relation. Here is what I propose:
* Check the input paths in HadoopFsRelation, but don't change its interface
* Move the JsonRelation code path that accepts RDD[String] into a new JsonRDDRelation
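
A rough sketch of the shape of that split, just to illustrate the idea. It does not use the real Spark classes: `Relation`, `FsRelation`, `JsonFsRelation` and this `JsonRDDRelation` are hypothetical stand-ins, `Seq[String]` stands in for `RDD[String]`, local files stand in for HDFS, and treating "none of the paths exist" as the error condition is an assumption.

```scala
import java.io.IOException
import java.nio.file.{Files, Paths}
import scala.io.Source

// Hypothetical stand-in for a relation; not Spark's BaseRelation.
trait Relation {
  def scan(): Seq[String]
}

// Path-based relation: the existence check lives here once, so subclasses
// need no inputExists / readFromHDFS hooks in their interface.
abstract class FsRelation(inputPaths: Seq[String]) extends Relation {
  // Subclasses only say how to read paths that are known to exist.
  protected def readFiles(existing: Seq[String]): Seq[String]

  final def scan(): Seq[String] = {
    val existing = inputPaths.filter(p => Files.exists(Paths.get(p)))
    // Assumption: fail only when none of the input paths exist.
    if (existing.isEmpty) {
      throw new IOException(
        "Input paths do not exist, input paths=" + inputPaths.mkString("[", ",", "]"))
    }
    readFiles(existing)
  }
}

// File-backed JSON relation: one record per line (assumes plain files).
class JsonFsRelation(inputPaths: Seq[String]) extends FsRelation(inputPaths) {
  protected def readFiles(existing: Seq[String]): Seq[String] =
    existing.flatMap { p =>
      val src = Source.fromFile(p)
      try src.getLines().toList finally src.close()
    }
}

// In-memory JSON relation: it has no paths, so there is nothing to check.
class JsonRDDRelation(records: Seq[String]) extends Relation {
  def scan(): Seq[String] = records
}
```

With this layout the in-memory case never goes through the path check, and any other file-based relation (TextRelation, for example) would get the check for free by extending the path-based base class.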
What do you think?