[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-10 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3957#issuecomment-69476143
  
ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-10 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3957#issuecomment-69476187
  
The implementation here looks reasonable.  @alexbaretta can you elaborate 
on what your use case is?  We are doing a clean-up of the API and I was 
actually wondering if we should remove these methods that just create parquet 
metadata files.

Also please change the PR title to `[SPARK-5061][SQL] SQLContext: overload 
createParquetFile`

/cc @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3957#issuecomment-69476253
  
  [Test build #25365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25365/consoleFull) for PR 3957 at commit [`bea3f33`](https://github.com/apache/spark/commit/bea3f3306d45342c2acdd8a53ac6eccac61bef04).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-08 Thread alexbaretta
Github user alexbaretta closed the pull request at:

https://github.com/apache/spark/pull/3882


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-08 Thread alexbaretta
Github user alexbaretta commented on the pull request:

https://github.com/apache/spark/pull/3882#issuecomment-69256448
  
In retrospect, amending my commit might not have been the right thing to 
do... Any feedback on how to properly amend a PR would be appreciated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-08 Thread alexbaretta
GitHub user alexbaretta opened a pull request:

https://github.com/apache/spark/pull/3957

[SPARK-5061][Alex Baretta] SQLContext: overload createParquetFile

Overload taking a StructType instead of TypeTag
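The shape of this overload can be illustrated with a small, dependency-free Scala sketch. It does not use Spark: `StructType`, `StructField`, `StringType` and `IntegerType` below are hypothetical stand-ins for Spark SQL's schema classes, `EmptyTable` and `Catalog` are invented for illustration, and a hand-written type class (`SchemaOf`) takes the place of the `TypeTag` reflection that the existing `createParquetFile[A]` relies on. The point it tries to show is only the delegation pattern: the type-driven variant obtains a schema and hands off to a variant that accepts the schema as an ordinary runtime value.

```scala
// Hypothetical stand-ins for Spark SQL's schema classes (NOT the real Spark API).
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: List[StructField])

// Hypothetical type class playing the role of TypeTag-based schema derivation.
trait SchemaOf[A] {
  def schema: StructType
}

// Hypothetical result: where the empty table lives and which schema it carries.
final case class EmptyTable(path: String, schema: StructType)

object Catalog {
  // Type-driven overload: the schema is looked up from the element type A
  // at compile time, then the call is delegated to the StructType overload.
  def createParquetFile[A](path: String)(implicit ev: SchemaOf[A]): EmptyTable =
    createParquetFile(ev.schema, path)

  // Schema-driven overload: the schema is an ordinary value, so it can be
  // assembled at run time without defining a case class first.
  def createParquetFile(schema: StructType, path: String): EmptyTable = {
    require(schema.fields.nonEmpty, "schema must contain at least one field")
    EmptyTable(path, schema)
  }
}

final case class Person(name: String, age: Int)

object Person {
  // Hand-written schema for Person; the existing Spark method instead derives
  // attributes via ScalaReflection.attributesFor[A].
  implicit val personSchema: SchemaOf[Person] = new SchemaOf[Person] {
    val schema: StructType =
      StructType(List(StructField("name", StringType), StructField("age", IntegerType)))
  }
}
```

Because only the type-driven alternative takes a type parameter, a call such as `Catalog.createParquetFile[Person]("path/to/people.parquet")` should resolve to it, while passing a `StructType` value as the first argument selects the other overload.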


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alexbaretta/spark createParquetFile

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3957.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3957


commit bea3f3306d45342c2acdd8a53ac6eccac61bef04
Author: Alex Baretta a...@planalechmy.com
Date:   2014-12-27T02:29:29Z

[Alex Baretta] SQLContext: overload createParquetFile

Overload taking a StructType instead of TypeTag




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3957#issuecomment-69256963
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-08 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/3957#issuecomment-69260948
  
cc @marmbrus

Btw @alexbaretta, when/if this PR gets merged into the codebase, your full 
name (not just your GitHub username) will be used as the author, so there's no 
need to put your name in the PR title.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-08 Thread alexbaretta
Github user alexbaretta commented on the pull request:

https://github.com/apache/spark/pull/3957#issuecomment-69262680
  
@nchammas Thanks for the tip. I am obviously a newbie here, so I greatly 
value feedback.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-02 Thread alexbaretta
Github user alexbaretta commented on a diff in the pull request:

https://github.com/apache/spark/pull/3882#discussion_r22428318
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -269,6 +269,43 @@ class SQLContext(@transient val sparkContext: SparkContext)
 path, ScalaReflection.attributesFor[A], allowExisting, conf, this))
   }
 
+
+  /**
+   * :: Experimental ::
+   * Creates an empty parquet file with the provided schema. The parquet file thus created
+   * can be registered as a table, which can then be used as the target of future
+   * `insertInto` operations.
+   *
+   * {{{
+   *   val sqlContext = new SQLContext(...)
+   *   import sqlContext._
+   *
+   *   val schema = StructType(List(StructField("name", StringType),StructField("age", IntegerType)))
+   *   createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
+   *   sql("INSERT INTO people SELECT 'michael', 29")
+   * }}}
+   *
+   * @param schema StructType describing the records to be stored in the Parquet file.
+   * @param path The path where the directory containing parquet metadata should be created.
+   * Data inserted into this table will also be stored at this location.
+   * @param allowExisting When false, an exception will be thrown if this directory already exists.
+   * @param conf A Hadoop configuration object that can be used to specify options to the parquet
+   * output format.
+   *
+   * @group userf
+   */
+  @Experimental
+  def createParquetFile(
--- End diff --

Andrew,

OK, but keep in mind that my patch overloads an existing method. If you
think createParquetFile should be renamed to createEmptyParquetFile, you
should probably file a separate JIRA.

Also, arguably, creating a file implies that it is empty.

Alex
On Jan 2, 2015 5:11 PM, Andrew Ash notificati...@github.com wrote:

 In sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
 https://github.com/apache/spark/pull/3882#discussion-diff-22428199:

  +   *   val schema = StructType(List(StructField("name", StringType),StructField("age", IntegerType)))
  +   *   createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
  +   *   sql("INSERT INTO people SELECT 'michael', 29")
  +   * }}}
  +   *
  +   * @param schema StructType describing the records to be stored in the Parquet file.
  +   * @param path The path where the directory containing parquet metadata should be created.
  +   * Data inserted into this table will also be stored at this location.
  +   * @param allowExisting When false, an exception will be thrown if this directory already exists.
  +   * @param conf A Hadoop configuration object that can be used to specify options to the parquet
  +   * output format.
  +   *
  +   * @group userf
  +   */
  +  @Experimental
  +  def createParquetFile(

 I kind of think createEmptyParquetFile would be a better name for this
 method, since most Parquet files have data I'd think

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/3882/files#r22428199.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3882#issuecomment-68577765
  
  [Test build #25000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25000/consoleFull) for PR 3882 at commit [`f6e40b5`](https://github.com/apache/spark/commit/f6e40b50c4aca9372c51d1337d559fc9cf50108d).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3882#issuecomment-68577767
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25000/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-02 Thread alexbaretta
GitHub user alexbaretta opened a pull request:

https://github.com/apache/spark/pull/3882

[SPARK-5061][Alex Baretta] SQLContext: overload createParquetFile

Overload of createParquetFile taking a StructType instead of a TypeTag

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alexbaretta/spark createParquetFile

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3882.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3882


commit f6e40b50c4aca9372c51d1337d559fc9cf50108d
Author: Alex Baretta a...@planalechmy.com
Date:   2014-12-27T02:29:29Z

[Alex Baretta] SQLContext: overload createParquetFile

Overload taking a StructType instead of TypeTag




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3882#issuecomment-68567852
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-02 Thread ash211
Github user ash211 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3882#discussion_r22428199
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -269,6 +269,43 @@ class SQLContext(@transient val sparkContext: SparkContext)
 path, ScalaReflection.attributesFor[A], allowExisting, conf, this))
   }
 
+
+  /**
+   * :: Experimental ::
+   * Creates an empty parquet file with the provided schema. The parquet file thus created
+   * can be registered as a table, which can then be used as the target of future
+   * `insertInto` operations.
+   *
+   * {{{
+   *   val sqlContext = new SQLContext(...)
+   *   import sqlContext._
+   *
+   *   val schema = StructType(List(StructField("name", StringType),StructField("age", IntegerType)))
+   *   createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
+   *   sql("INSERT INTO people SELECT 'michael', 29")
+   * }}}
+   *
+   * @param schema StructType describing the records to be stored in the Parquet file.
+   * @param path The path where the directory containing parquet metadata should be created.
+   * Data inserted into this table will also be stored at this location.
+   * @param allowExisting When false, an exception will be thrown if this directory already exists.
+   * @param conf A Hadoop configuration object that can be used to specify options to the parquet
+   * output format.
+   *
+   * @group userf
+   */
+  @Experimental
+  def createParquetFile(
--- End diff --

I kind of think createEmptyParquetFile would be a better name for this 
method, since most Parquet files have data I'd think


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3882#issuecomment-68577739
  
  [Test build #25000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25000/consoleFull) for PR 3882 at commit [`f6e40b5`](https://github.com/apache/spark/commit/f6e40b50c4aca9372c51d1337d559fc9cf50108d).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5061][Alex Baretta] SQLContext: overloa...

2015-01-02 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/3882#issuecomment-68577529
  
Jenkins this is ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org