[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3957#issuecomment-69476143 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3957#issuecomment-69476187 The implementation here looks reasonable. @alexbaretta can you elaborate on what your use case is? We are doing a clean-up of the API and I was actually wondering if we should remove these methods that just create parquet metadata files. Also please change the PR title to `[SPARK-5061][SQL] SQLContext: overload createParquetFile` /cc @rxin
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3957#issuecomment-69476253
[Test build #25365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25365/consoleFull) for PR 3957 at commit [`bea3f33`](https://github.com/apache/spark/commit/bea3f3306d45342c2acdd8a53ac6eccac61bef04).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user alexbaretta closed the pull request at: https://github.com/apache/spark/pull/3882
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user alexbaretta commented on the pull request: https://github.com/apache/spark/pull/3882#issuecomment-69256448 In retrospect amending my commit might not have been the right thing to do... Any feedback on how to properly amend a PR would be appreciated.
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
GitHub user alexbaretta opened a pull request: https://github.com/apache/spark/pull/3957
[SPARK-5061][Alex Baretta] SQLContext: overload createParquetFile
Overload taking a StructType instead of TypeTag
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/alexbaretta/spark createParquetFile
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/3957.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #3957
commit bea3f3306d45342c2acdd8a53ac6eccac61bef04
Author: Alex Baretta a...@planalechmy.com
Date: 2014-12-27T02:29:29Z
    [Alex Baretta] SQLContext: overload createParquetFile
    Overload taking a StructType instead of TypeTag
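The difference between the two overloads can be illustrated with a small, self-contained sketch. It does not use Spark: `DataType`, `StructField`, `StructType`, `ParquetTable`, `SchemaFor` and `Person` below are hypothetical stand-ins defined only for this example, and the implicit `SchemaFor[A]` merely plays the role that the `TypeTag` plays in the existing method. The point it tries to show is that one variant needs the record type at compile time, while the proposed variant accepts a schema value that can be assembled at run time.

```scala
object ParquetOverloadSketch {
  // Hypothetical stand-ins for Spark SQL's schema types (not the real classes).
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  final case class StructField(name: String, dataType: DataType)
  final case class StructType(fields: Seq[StructField])

  // Hypothetical handle for the empty table; the real method returns a SchemaRDD.
  final case class ParquetTable(path: String, schema: StructType)

  // Stand-in for the compile-time evidence that the TypeTag provides in Spark.
  trait SchemaFor[A] { def schema: StructType }

  final case class Person(name: String, age: Int)
  object Person {
    implicit val schemaFor: SchemaFor[Person] = new SchemaFor[Person] {
      val schema: StructType =
        StructType(Seq(StructField("name", StringType), StructField("age", IntegerType)))
    }
  }

  // Existing style: the schema is derived from the type parameter A.
  def createParquetFile[A](path: String)(implicit ev: SchemaFor[A]): ParquetTable =
    createParquetFile(ev.schema, path)

  // Proposed style: the caller passes the schema as an ordinary value.
  def createParquetFile(schema: StructType, path: String): ParquetTable =
    ParquetTable(path, schema)

  def main(args: Array[String]): Unit = {
    // Schema known at compile time through the Person type.
    val fromType = createParquetFile[Person]("people.parquet")

    // Schema assembled at run time, for example from column names read elsewhere.
    val columns = Seq("name" -> StringType, "age" -> IntegerType)
    val runtimeSchema = StructType(columns.map { case (n, t) => StructField(n, t) })
    val fromSchema = createParquetFile(runtimeSchema, "people.parquet")

    // Both routes describe the same table.
    assert(fromType == fromSchema)
  }
}
```

In this sketch the type-based variant simply delegates to the schema-based one, which is one plausible way to keep the two overloads consistent; the actual patch may be structured differently.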
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3957#issuecomment-69256963 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3957#issuecomment-69260948 cc @marmbrus Btw @alexbaretta, when/if this PR gets merged into the codebase, your full name (not just your GitHub username) will be used as the author, so there's no need to put your name in the PR title.
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user alexbaretta commented on the pull request: https://github.com/apache/spark/pull/3957#issuecomment-69262680 @nchammas Thanks for the tip. I am obviously a newbie here, so I greatly value feedback.
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user alexbaretta commented on a diff in the pull request: https://github.com/apache/spark/pull/3882#discussion_r22428318
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -269,6 +269,43 @@ class SQLContext(@transient val sparkContext: SparkContext)
         path, ScalaReflection.attributesFor[A], allowExisting, conf, this))
   }
+
+  /**
+   * :: Experimental ::
+   * Creates an empty parquet file with the provided schema. The parquet file thus created
+   * can be registered as a table, which can then be used as the target of future
+   * `insertInto` operations.
+   *
+   * {{{
+   *   val sqlContext = new SQLContext(...)
+   *   import sqlContext._
+   *
+   *   val schema = StructType(List(StructField("name", StringType),StructField("age", IntegerType)))
+   *   createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
+   *   sql("INSERT INTO people SELECT 'michael', 29")
+   * }}}
+   *
+   * @param schema StructType describing the records to be stored in the Parquet file.
+   * @param path The path where the directory containing parquet metadata should be created.
+   *             Data inserted into this table will also be stored at this location.
+   * @param allowExisting When false, an exception will be thrown if this directory already exists.
+   * @param conf A Hadoop configuration object that can be used to specify options to the parquet
+   *             output format.
+   *
+   * @group userf
+   */
+  @Experimental
+  def createParquetFile(
--- End diff --
Andrew, OK, but keep in mind that my patch overloads an existing method. If you think createParquetFile should be renamed to createEmptyParquetFile you should probably file a separate JIRA. Also, arguably creating a file implies that it is empty.
Alex
On Jan 2, 2015 5:11 PM, Andrew Ash notificati...@github.com wrote:
In sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala https://github.com/apache/spark/pull/3882#discussion-diff-22428199:
+   *   val schema = StructType(List(StructField("name", StringType),StructField("age", IntegerType)))
+   *   createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
+   *   sql("INSERT INTO people SELECT 'michael', 29")
+   * }}}
+   *
+   * @param schema StructType describing the records to be stored in the Parquet file.
+   * @param path The path where the directory containing parquet metadata should be created.
+   *             Data inserted into this table will also be stored at this location.
+   * @param allowExisting When false, an exception will be thrown if this directory already exists.
+   * @param conf A Hadoop configuration object that can be used to specify options to the parquet
+   *             output format.
+   *
+   * @group userf
+   */
+  @Experimental
+  def createParquetFile(
I kind of think createEmptyParquetFile would be a better name for this method, since most Parquet files have data I'd think
Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3882/files#r22428199.
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3882#issuecomment-68577765
[Test build #25000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25000/consoleFull) for PR 3882 at commit [`f6e40b5`](https://github.com/apache/spark/commit/f6e40b50c4aca9372c51d1337d559fc9cf50108d).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3882#issuecomment-68577767 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25000/ Test FAILed.
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
GitHub user alexbaretta opened a pull request: https://github.com/apache/spark/pull/3882
[SPARK-5061][Alex Baretta] SQLContext: overload createParquetFile
Overload of createParquetFile taking a StructType instead of a TypeTag
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/alexbaretta/spark createParquetFile
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/3882.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #3882
commit f6e40b50c4aca9372c51d1337d559fc9cf50108d
Author: Alex Baretta a...@planalechmy.com
Date: 2014-12-27T02:29:29Z
    [Alex Baretta] SQLContext: overload createParquetFile
    Overload taking a StructType instead of TypeTag
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3882#issuecomment-68567852 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user ash211 commented on a diff in the pull request: https://github.com/apache/spark/pull/3882#discussion_r22428199
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -269,6 +269,43 @@ class SQLContext(@transient val sparkContext: SparkContext)
         path, ScalaReflection.attributesFor[A], allowExisting, conf, this))
   }
+
+  /**
+   * :: Experimental ::
+   * Creates an empty parquet file with the provided schema. The parquet file thus created
+   * can be registered as a table, which can then be used as the target of future
+   * `insertInto` operations.
+   *
+   * {{{
+   *   val sqlContext = new SQLContext(...)
+   *   import sqlContext._
+   *
+   *   val schema = StructType(List(StructField("name", StringType),StructField("age", IntegerType)))
+   *   createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
+   *   sql("INSERT INTO people SELECT 'michael', 29")
+   * }}}
+   *
+   * @param schema StructType describing the records to be stored in the Parquet file.
+   * @param path The path where the directory containing parquet metadata should be created.
+   *             Data inserted into this table will also be stored at this location.
+   * @param allowExisting When false, an exception will be thrown if this directory already exists.
+   * @param conf A Hadoop configuration object that can be used to specify options to the parquet
+   *             output format.
+   *
+   * @group userf
+   */
+  @Experimental
+  def createParquetFile(
--- End diff --
I kind of think createEmptyParquetFile would be a better name for this method, since most Parquet files have data I'd think
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3882#issuecomment-68577739
[Test build #25000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25000/consoleFull) for PR 3882 at commit [`f6e40b5`](https://github.com/apache/spark/commit/f6e40b50c4aca9372c51d1337d559fc9cf50108d).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/3882#issuecomment-68577529 Jenkins this is ok to test