Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55117518
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -465,214 +379,168 @@ abstract class OutputWriter {
 }

 /**
- * ::Experimental::
- * A [[BaseRelation]] that provides much of the common code required for relations that store their
- * data to an HDFS compatible filesystem.
- *
- * For the read path, similar to [[PrunedFilteredScan]], it can eliminate unneeded columns and
- * filter using selected predicates before producing an RDD containing all matching tuples as
- * [[Row]] objects. In addition, when reading from Hive style partitioned tables stored in file
- * systems, it's able to discover partitioning information from the paths of input directories, and
- * perform partition pruning before starting to read the data. Subclasses of [[HadoopFsRelation]]
- * must override one of the four `buildScan` methods to implement the read path.
- *
- * For the write path, it provides the ability to write to both non-partitioned and partitioned
- * tables. Directory layout of the partitioned tables is compatible with Hive.
- *
- * @constructor This constructor is for internal uses only. The [[PartitionSpec]] argument is for
- *              implementing metastore table conversion.
- *
- * @param maybePartitionSpec A [[HadoopFsRelation]] can be created with an optional
- *        [[PartitionSpec]], so that partition discovery can be skipped.
- *
- * @since 1.4.0
+ * Acts as a container for all of the metadata required to read from a datasource. All discovery,
+ * resolution and merging logic for schemas and partitions has been removed.
+ *
+ * @param location A [[FileCatalog]] that can enumerate the locations of all the files that
+ *                 comprise this relation.
+ * @param partitionSchema The schema of the columns (if any) that are used to partition the relation
+ * @param dataSchema The schema of any remaining columns. Note that if any partition columns are
+ *                   present in the actual data files as well, they are removed.
+ * @param bucketSpec Describes the bucketing (hash-partitioning of the files by some column values).
+ * @param fileFormat A file format that can be used to read and write the data in files.
+ * @param options Configuration used when reading / writing data.
  */
-@Experimental
-abstract class HadoopFsRelation private[sql](
-    maybePartitionSpec: Option[PartitionSpec],
-    parameters: Map[String, String])
-  extends BaseRelation with FileRelation with Logging {
+case class HadoopFsRelation(
+    sqlContext: SQLContext,
+    location: FileCatalog,
+    partitionSchema: StructType,
+    dataSchema: StructType,
+    bucketSpec: Option[BucketSpec],
+    fileFormat: FileFormat,
+    options: Map[String, String]) extends BaseRelation with FileRelation {

-  override def toString: String = getClass.getSimpleName
+  /**
+   *
--- End diff --
Fill with some comments or remove this?
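For reference, construction under the new signature amounts to passing pre-resolved metadata straight through. A rough sketch only, assuming an existing `sqlContext`, plus hypothetical `catalog` and `parquetFormat` stand-ins for a [[FileCatalog]] and a [[FileFormat]]; the schemas are made up:

```scala
import org.apache.spark.sql.types.{DoubleType, LongType, StringType, StructType}

// Sketch: `sqlContext`, `catalog` (a FileCatalog), and `parquetFormat`
// (a FileFormat) are assumed to exist already; schemas are illustrative.
val relation = HadoopFsRelation(
  sqlContext,
  location = catalog,                       // enumerates the files backing the relation
  partitionSchema = new StructType().add("date", StringType),
  dataSchema = new StructType()             // partition columns are not repeated here
    .add("id", LongType)
    .add("value", DoubleType),
  bucketSpec = None,                        // no hash-bucketing by column values
  fileFormat = parquetFormat,
  options = Map("compression" -> "snappy")) // illustrative read/write options
```

Since all discovery, resolution, and merging happens before this point, the case class stays a plain metadata holder.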