[GitHub] spark pull request: [SPARK-7713] [SQL] Use shared broadcast hadoop...

yhuai Tue, 19 May 2015 20:28:53 -0700

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6252#discussion_r30669119
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -584,6 +588,34 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
       }
     
       /**
    +   * For a non-partitioned relation, this method builds an `RDD[Row]` containing all rows within
    +   * this relation. For partitioned relations, this method is called for each selected partition,
    +   * and builds an `RDD[Row]` containing all rows within that single partition.
    +   *
    +   * Note: This interface is subject to change in future.
    +   *
    +   * @param requiredColumns Required columns.
    +   * @param filters Candidate filters to be pushed down. The actual filter should be the conjunction
    +   *        of all `filters`.  The pushed down filters are currently purely an optimization as they
    +   *        will all be evaluated again. This means it is safe to use them with methods that produce
    +   *        false positives such as filtering partitions based on a bloom filter.
    +   * @param inputFiles For a non-partitioned relation, it contains paths of all data files in the
    +   *        relation. For a partitioned relation, it contains paths of all data files in a single
    +   *        selected partition.
    +   * @param broadcastedConf A shared broadcast Hadoop Configuration, which can be used to reduce the
    +   *                        overhead of broadcasting the Configuration for every Hadoop RDD.
    +   *
    +   * @since 1.4.0
    +   */
    +  private[sql] def buildScan(
    +      requiredColumns: Array[String],
    +      filters: Array[Filter],
    +      inputFiles: Array[FileStatus],
    +      broadcastedConf: Broadcast[SerializableWritable[Configuration]]): RDD[Row] = {
    --- End diff --
    
    oh, this one is the default implementation. Parquet overrides it.
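
For readers unfamiliar with the pattern, here is a rough sketch of a base class that supplies a default for the wider `buildScan` overload while one subclass overrides it to use the shared configuration. Every name below (`SharedConf`, `FileRelation`, `PlainRelation`, `ConfAwareRelation`) is hypothetical. The delegating body of the default is an assumption, since the diff above is cut off before the method body. Plain collections stand in for `RDD[Row]` and `Broadcast`.

```scala
// Hypothetical stand-ins; none of these names come from the Spark source.
final case class SharedConf(settings: Map[String, String])

abstract class FileRelation {
  // Narrow overload that every relation must implement.
  def buildScan(columns: Seq[String], files: Seq[String]): Seq[String]

  // Default implementation of the wider overload: it ignores the shared
  // configuration and delegates. (Assumed behavior, not shown in the diff.)
  def buildScan(columns: Seq[String], files: Seq[String], conf: SharedConf): Seq[String] =
    buildScan(columns, files)
}

// Relies on the default: the shared configuration is never consulted.
class PlainRelation extends FileRelation {
  def buildScan(columns: Seq[String], files: Seq[String]): Seq[String] =
    files.map(f => s"$f:${columns.mkString(",")}")
}

// Overrides the wider overload so that the shared configuration is used.
class ConfAwareRelation extends FileRelation {
  def buildScan(columns: Seq[String], files: Seq[String]): Seq[String] =
    buildScan(columns, files, SharedConf(Map.empty))

  override def buildScan(columns: Seq[String], files: Seq[String], conf: SharedConf): Seq[String] = {
    val codec = conf.settings.getOrElse("codec", "none")
    files.map(f => s"$f:${columns.mkString(",")}:$codec")
  }
}
```

A caller that always invokes the three-argument overload gets the default for `PlainRelation` and the configuration-aware path for `ConfAwareRelation`, without needing to know which one it holds.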



