Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55117518
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -465,214 +379,168 @@ abstract class OutputWriter {
 }

 /**
- * ::Experimental::
- * A [[BaseRelation]] that provides much of the common code required for relations that store their
- * data to an HDFS compatible filesystem.
- *
- * For the read path, similar to [[PrunedFilteredScan]], it can eliminate unneeded columns and
- * filter using selected predicates before producing an RDD containing all matching tuples as
- * [[Row]] objects. In addition, when reading from Hive style partitioned tables stored in file
- * systems, it's able to discover partitioning information from the paths of input directories, and
- * perform partition pruning before starting to read the data. Subclasses of [[HadoopFsRelation]]
- * must override one of the four `buildScan` methods to implement the read path.
- *
- * For the write path, it provides the ability to write to both non-partitioned and partitioned
- * tables. Directory layout of the partitioned tables is compatible with Hive.
- *
- * @constructor This constructor is for internal uses only. The [[PartitionSpec]] argument is for
- *              implementing metastore table conversion.
- *
- * @param maybePartitionSpec A [[HadoopFsRelation]] can be created with an optional
- *        [[PartitionSpec]], so that partition discovery can be skipped.
- *
- * @since 1.4.0
+ * Acts as a container for all of the metadata required to read from a datasource. All discovery,
+ * resolution and merging logic for schemas and partitions has been removed.
+ *
+ * @param location A [[FileCatalog]] that can enumerate the locations of all the files that
+ *                 comprise this relation.
+ * @param partitionSchema The schema of the columns (if any) that are used to partition the relation
+ * @param dataSchema The schema of any remaining columns. Note that if any partition columns are
+ *                   present in the actual data files as well, they are removed.
+ * @param bucketSpec Describes the bucketing (hash-partitioning of the files by some column values).
+ * @param fileFormat A file format that can be used to read and write the data in files.
+ * @param options Configuration used when reading / writing data.
  */
-@Experimental
-abstract class HadoopFsRelation private[sql](
-    maybePartitionSpec: Option[PartitionSpec],
-    parameters: Map[String, String])
-  extends BaseRelation with FileRelation with Logging {
+case class HadoopFsRelation(
+    sqlContext: SQLContext,
+    location: FileCatalog,
+    partitionSchema: StructType,
+    dataSchema: StructType,
+    bucketSpec: Option[BucketSpec],
+    fileFormat: FileFormat,
+    options: Map[String, String]) extends BaseRelation with FileRelation {

-  override def toString: String = getClass.getSimpleName
+  /**
+   *
--- End diff --
Fill with some comments or remove this?
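For reference, construction under the new signature amounts to passing pre-resolved metadata straight through. A rough sketch only, assuming an existing `sqlContext`, plus hypothetical `catalog` and `parquetFormat` stand-ins for a [[FileCatalog]] and a [[FileFormat]]; the schemas are made up:

```scala
import org.apache.spark.sql.types.{DoubleType, LongType, StringType, StructType}

// Sketch: `sqlContext`, `catalog` (a FileCatalog), and `parquetFormat`
// (a FileFormat) are assumed to exist already; schemas are illustrative.
val relation = HadoopFsRelation(
  sqlContext,
  location = catalog,                       // enumerates the files backing the relation
  partitionSchema = new StructType().add("date", StringType),
  dataSchema = new StructType()             // partition columns are not repeated here
    .add("id", LongType)
    .add("value", DoubleType),
  bucketSpec = None,                        // no hash-bucketing by column values
  fileFormat = parquetFormat,
  options = Map("compression" -> "snappy")) // illustrative read/write options
```

Since all discovery, resolution, and merging happens before this point, the case class stays a plain metadata holder.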