Github user alexbaretta commented on a diff in the pull request:
https://github.com/apache/spark/pull/3882#discussion_r22428318
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -269,6 +269,43 @@ class SQLContext(@transient val sparkContext: SparkContext)
path, ScalaReflection.attributesFor[A], allowExisting, conf, this))
}
+
+ /**
+ * :: Experimental ::
+ * Creates an empty parquet file with the provided schema. The parquet file thus created
+ * can be registered as a table, which can then be used as the target of future
+ * `insertInto` operations.
+ *
+ * {{{
+ * val sqlContext = new SQLContext(...)
+ * import sqlContext._
+ *
+ * val schema = StructType(List(StructField("name", StringType), StructField("age", IntegerType)))
+ * createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
+ * sql("INSERT INTO people SELECT 'michael', 29")
+ * }}}
+ *
+ * @param schema StructType describing the records to be stored in the Parquet file.
+ * @param path The path where the directory containing parquet metadata should be created.
+ * Data inserted into this table will also be stored at this location.
+ * @param allowExisting When false, an exception will be thrown if this directory already exists.
+ * @param conf A Hadoop configuration object that can be used to specify options to the parquet
+ * output format.
+ *
+ * @group userf
+ */
+ @Experimental
+ def createParquetFile(
--- End diff --
Andrew,
OK, but keep in mind that my patch overloads an existing method. If you think createParquetFile should be renamed to createEmptyParquetFile, you should probably file a separate JIRA.
Also, arguably "creating a file" implies that it is empty.
Alex
On Jan 2, 2015 5:11 PM, "Andrew Ash" <[email protected]> wrote:
> In sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
> <https://github.com/apache/spark/pull/3882#discussion-diff-22428199>:
>
> > + * val schema = StructType(List(StructField("name", StringType), StructField("age", IntegerType)))
> > + * createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
> > + * sql("INSERT INTO people SELECT 'michael', 29")
> > + * }}}
> > + *
> > + * @param schema StructType describing the records to be stored in the Parquet file.
> > + * @param path The path where the directory containing parquet metadata should be created.
> > + * Data inserted into this table will also be stored at this location.
> > + * @param allowExisting When false, an exception will be thrown if this directory already exists.
> > + * @param conf A Hadoop configuration object that can be used to specify options to the parquet
> > + * output format.
> > + *
> > + * @group userf
> > + */
> > + @Experimental
> > + def createParquetFile(
>
> I kind of think createEmptyParquetFile would be a better name for this
> method, since most Parquet files have data I'd think
>
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/3882/files#r22428199>.
>
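
To make the workflow under discussion concrete, here is a minimal end-to-end sketch. It assumes the schema-based createParquetFile overload proposed in this patch and the Spark 1.2-era SchemaRDD API (registerTempTable, and the data type aliases exported from org.apache.spark.sql); the application name, the path/to/people.parquet location, and the sample row are illustrative placeholders, not anything from the patch itself.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql._

    object CreateParquetFileExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("create-parquet-file"))
        val sqlContext = new SQLContext(sc)
        import sqlContext._

        // Schema of the records the (initially empty) Parquet file will hold.
        val schema = StructType(List(
          StructField("name", StringType, true),
          StructField("age", IntegerType, true)))

        // Create the empty Parquet file and register it as a temporary table
        // so it can serve as the target of later inserts.
        createParquetFile(schema, "path/to/people.parquet").registerTempTable("people")

        // Rows inserted through SQL are written to the same Parquet location.
        sql("INSERT INTO people SELECT 'michael', 29")

        sc.stop()
      }
    }

A SchemaRDD with a matching schema could then presumably be appended via insertInto("people"), which is the insertInto use case the doc comment calls out.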