Github user windpiger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16910#discussion_r103134443
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -114,22 +114,30 @@ class HadoopTableReader(
val tablePath = hiveTable.getPath
val inputPathStr = applyFilterIfNeeded(tablePath, filterOpt)
- // logDebug("Table input: %s".format(tablePath))
- val ifc = hiveTable.getInputFormatClass
- .asInstanceOf[java.lang.Class[InputFormat[Writable, Writable]]]
- val hadoopRDD = createHadoopRdd(tableDesc, inputPathStr, ifc)
-
- val attrsWithIndex = attributes.zipWithIndex
- val mutableRow = new SpecificInternalRow(attributes.map(_.dataType))
-
- val deserializedHadoopRDD = hadoopRDD.mapPartitions { iter =>
- val hconf = broadcastedHadoopConf.value.value
- val deserializer = deserializerClass.newInstance()
- deserializer.initialize(hconf, tableDesc.getProperties)
- HadoopTableReader.fillObject(iter, deserializer, attrsWithIndex, mutableRow, deserializer)
- }
+ val locationPath = new Path(inputPathStr)
+ val fs = locationPath.getFileSystem(broadcastedHadoopConf.value.value)
+
+ // if the table location does not exist, return an empty RDD
+ if (!fs.exists(locationPath)) {
--- End diff --
Good catch, thanks!
I tested this in Hive: for a table created with `stored by` (e.g. backed by
HBase), a table path is created under the warehouse path at creation time,
but no data files exist there even after we insert into the table, and
selecting from the table still works after we delete the table path.
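
For anyone following along, here is a minimal sketch of the guard under
discussion, assuming the surrounding `HadoopTableReader` scope
(`inputPathStr`, `broadcastedHadoopConf`, and `sparkSession` come from that
class); the empty-RDD branch is inferred from the diff's own comment and may
not match the PR verbatim:

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow

// Resolve the table location against the broadcast Hadoop configuration.
val locationPath = new Path(inputPathStr)
val fs = locationPath.getFileSystem(broadcastedHadoopConf.value.value)

val rows: RDD[InternalRow] =
  if (!fs.exists(locationPath)) {
    // Nothing on disk to scan (e.g. a `stored by` table whose rows live
    // in HBase), so short-circuit with an empty RDD instead of letting
    // the HadoopRDD fail on a missing input path.
    sparkSession.sparkContext.emptyRDD[InternalRow]
  } else {
    // ... the existing HadoopRDD + deserializer path shown in the
    // removed lines above; elided here.
    ???
  }
```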