[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-09 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73625791 Types need to exist, but names don't. They can just be random column names like _1, _2, _3. In Scala, if you import sqlContext.implicits._, then any RDD[Product]
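
As a rough illustration of the point above (types must be known, but names can fall back to positional placeholders such as _1, _2, _3), here is a minimal pure-Python sketch. It is not Spark's implementation and does not call the Spark API: default_column_names and apply_names are hypothetical helpers invented for this example, and plain tuples stand in for the rows of an RDD.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def default_column_names(width: int) -> List[str]:
    """Return positional placeholder names: _1, _2, ..., _<width>."""
    return [f"_{i}" for i in range(1, width + 1)]


def apply_names(
    rows: Sequence[Tuple], names: Optional[Sequence[str]] = None
) -> List[Dict[str, object]]:
    """Pair every tuple's values with column names (hypothetical helper).

    If no names are given, fall back to the positional placeholders.
    """
    if not rows:
        return []
    width = len(rows[0])
    if names is None:
        names = default_column_names(width)
    if len(names) != width:
        raise ValueError(f"expected {width} names, got {len(names)}")
    named_rows = []
    for row in rows:
        # Every row must have the same number of fields as the first one.
        if len(row) != width:
            raise ValueError(f"row {row!r} does not have {width} fields")
        named_rows.append(dict(zip(names, row)))
    return named_rows


rows = [(1, "alice"), (2, "bob")]
unnamed = apply_names(rows)                # keys are _1 and _2
named = apply_names(rows, ["id", "name"])  # keys are id and name
```

The sketch only shows the naming step; in the discussion above, the column types would still have to come from the data or from a schema.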

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-09 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73631311 I just talked to @davies offline. He is going to submit a PR that adds createDataFrame with named columns. I think we can roll this into that one and close this PR. Would

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-09 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73621476 @dwmclary thanks for submitting this. I think this is similar to the toDataFrame method that supports renaming, isn't it?

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-09 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73621532 In particular, I'm talking about

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-09 Thread dwmclary
Github user dwmclary commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73623236 Reynold, It is similar, but I think the distinction here is that toDataFrame appears to require that old names (and a schema) exist. Or, at least

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-09 Thread dwmclary
Github user dwmclary commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73626452 Ah, yes, I see that now. Python doesn't seem to have a toDataFrame, so maybe the logical thing to do here is to just do a new PR with a Python implementation

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-09 Thread dwmclary
Github user dwmclary commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73626542 Or, I guess I can just do it in this PR if you don't mind it changing a bunch.

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-09 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73627424 Adding toDataFrame to Python DataFrame is a great idea. You can do it in this PR if you want (make sure you update the title). Also - you might want to do it on

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-09 Thread dwmclary
Github user dwmclary commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73632890 Sounds like a plan -- I'll do it on top of #4479. Thought: I've added a getReservedWords private method to SQLContext.scala. I feel like leaving that there

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-08 Thread dwmclary
Github user dwmclary commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73325875 Updated to keep reserved words in the JVM.

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-06 Thread dwmclary
GitHub user dwmclary opened a pull request: https://github.com/apache/spark/pull/4421 Spark-2789: Apply names to RDD to create DataFrame This seemed like a reasonably useful function to add to SparkSQL. However, unlike the [JIRA](https://issues.apache.org/jira/browse/SPARK-2789),

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4421#issuecomment-73201347 Can one of the admins verify this patch?

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4421#discussion_r24247234 --- Diff: python/pyspark/sql.py --- @@ -1469,6 +1470,44 @@ def applySchema(self, rdd, schema): df =

[GitHub] spark pull request: Spark-2789: Apply names to RDD to create DataF...

2015-02-06 Thread dwmclary
Github user dwmclary commented on a diff in the pull request: https://github.com/apache/spark/pull/4421#discussion_r24253601 --- Diff: python/pyspark/sql.py --- @@ -1469,6 +1470,44 @@ def applySchema(self, rdd, schema): df =