Github user alexbaretta commented on a diff in the pull request:
https://github.com/apache/spark/pull/3882#discussion_r22428318
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -269,6 +269,43 @@ class SQLContext(@transient val sparkContext: SparkContext)
path, ScalaReflection.attributesFor[A], allowExisting, conf, this))
}
+
+ /**
+ * :: Experimental ::
+ * Creates an empty parquet file with the provided schema. The parquet file thus created
+ * can be registered as a table, which can then be used as the target of future
+ * `insertInto` operations.
+ *
+ * {{{
+ * val sqlContext = new SQLContext(...)
+ * import sqlContext._
+ *
+ * val schema = StructType(List(StructField("name", StringType), StructField("age", IntegerType)))
+ * createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
+ * sql("INSERT INTO people SELECT 'michael', 29")
+ * }}}
+ *
+ * @param schema StructType describing the records to be stored in the Parquet file.
+ * @param path The path where the directory containing parquet metadata should be created.
+ * Data inserted into this table will also be stored at this location.
+ * @param allowExisting When false, an exception will be thrown if this directory already exists.
+ * @param conf A Hadoop configuration object that can be used to specify options to the parquet
+ * output format.
+ *
+ * @group userf
+ */
+ @Experimental
+ def createParquetFile(
--- End diff --
Andrew,
OK, but keep in mind that my patch overloads an existing method. If you think createParquetFile should be renamed to createEmptyParquetFile, you should probably file a separate JIRA.
Also, arguably "creating a file" implies that it is empty.
Alex
On Jan 2, 2015 5:11 PM, "Andrew Ash" <[email protected]> wrote:
> In sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
> <https://github.com/apache/spark/pull/3882#discussion-diff-22428199>:
>
> > + * val schema = StructType(List(StructField("name", StringType), StructField("age", IntegerType)))
> > + * createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
> > + * sql("INSERT INTO people SELECT 'michael', 29")
> > + * }}}
> > + *
> > + * @param schema StructType describing the records to be stored in the Parquet file.
> > + * @param path The path where the directory containing parquet metadata should be created.
> > + * Data inserted into this table will also be stored at this location.
> > + * @param allowExisting When false, an exception will be thrown if this directory already exists.
> > + * @param conf A Hadoop configuration object that can be used to specify options to the parquet
> > + * output format.
> > + *
> > + * @group userf
> > + */
> > + @Experimental
> > + def createParquetFile(
>
> I kind of think createEmptyParquetFile would be a better name for this
> method, since most Parquet files have data I'd think
>
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/3882/files#r22428199>.
>
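
To make the workflow under discussion concrete, here is a minimal end-to-end sketch. It assumes the schema-based createParquetFile overload proposed in this patch and the Spark 1.2-era SchemaRDD API (registerTempTable, and the data type aliases exported from org.apache.spark.sql); the application name, the path/to/people.parquet location, and the sample row are illustrative placeholders, not anything from the patch itself.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql._

    object CreateParquetFileExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("create-parquet-file"))
        val sqlContext = new SQLContext(sc)
        import sqlContext._

        // Schema of the records the (initially empty) Parquet file will hold.
        val schema = StructType(List(
          StructField("name", StringType, true),
          StructField("age", IntegerType, true)))

        // Create the empty Parquet file and register it as a temporary table
        // so it can serve as the target of later inserts.
        createParquetFile(schema, "path/to/people.parquet").registerTempTable("people")

        // Rows inserted through SQL are written to the same Parquet location.
        sql("INSERT INTO people SELECT 'michael', 29")

        sc.stop()
      }
    }

A SchemaRDD with a matching schema could then presumably be appended via insertInto("people"), which is the insertInto use case the doc comment calls out.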