Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73625791
Types need to exist, but names don't. They can just be random column names
like _1, _2, _3.
In Scala, if you import sqlContext.implicits._, then any RDD[Product]
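The point above about positional defaults can be sketched in plain Python (this is an illustrative helper, not the Spark API): when rows are bare tuples and no names are supplied, Spark falls back to positional column names `_1`, `_2`, …, analogous to Scala's `Product` accessors.

```python
def default_column_names(row):
    """Generate Spark-style positional names (_1, _2, ...) for an unnamed tuple row."""
    return ["_%d" % (i + 1) for i in range(len(row))]

rows = [(1, "a", 3.0), (2, "b", 4.0)]
print(default_column_names(rows[0]))  # ['_1', '_2', '_3']
```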
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73631311
I just talked to @davies offline. He is going to submit a PR that adds
createDataFrame with named columns. I think we can roll this into that one and
close this PR. Would
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73621476
@dwmclary thanks for submitting this. I think this is similar to the
toDataFrame method that supports renaming, isn't it?
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73621532
In particular, I'm talking about
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73623236
Reynold,
It is similar, but I think the distinction here is that toDataFrame
appears to require that old names (and a schema) exist. Or, at least
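The distinction dwmclary draws can be illustrated in plain Python (hypothetical helpers, not the Spark API): applying names needs only the row arity, while renaming presupposes records that already carry names from a prior schema.

```python
def apply_names(rows, names):
    """Zip fresh column names onto unnamed tuple rows; no prior schema needed."""
    if any(len(r) != len(names) for r in rows):
        raise ValueError("name count must match row arity")
    return [dict(zip(names, r)) for r in rows]

def rename_columns(records, mapping):
    """Rename columns on records that already have names (i.e. an existing schema)."""
    return [{mapping.get(k, k): v for k, v in rec.items()} for rec in records]

rows = [(1, "a"), (2, "b")]
named = apply_names(rows, ["id", "label"])       # names applied from scratch
renamed = rename_columns(named, {"label": "tag"})  # rename requires old names
```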
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73626452
Ah, yes, I see that now.
Python doesn't seem to have a toDataFrame, so maybe the logical thing to do
here is to just do a new PR with a Python implementation
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73626542
Or, I guess I can just do it in this PR if you don't mind it changing a
bunch.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73627424
Adding toDataFrame to Python DataFrame is a great idea. You can do it in
this PR if you want (make sure you update the title).
Also - you might want to do it on
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73632890
Sounds like a plan -- I'll do it on top of #4479.
Thought: I've added a getReservedWords private method to SQLContext.scala.
I feel like leaving that there
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73325875
Updated to keep reserved words in the JVM.
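A reserved-word guard of the kind the `getReservedWords` helper provides could look like the following plain-Python sketch. The word list here is a small illustrative subset, not Spark SQL's actual list, which this update keeps on the JVM side.

```python
# Illustrative subset only; the real list lives in SQLContext on the JVM.
RESERVED_WORDS = frozenset({"select", "from", "where", "group", "order", "table"})

def check_column_names(names):
    """Reject user-supplied column names that collide with SQL reserved words."""
    clashes = [n for n in names if n.lower() in RESERVED_WORDS]
    if clashes:
        raise ValueError("reserved words used as column names: %s" % clashes)
    return names
```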
GitHub user dwmclary opened a pull request:
https://github.com/apache/spark/pull/4421
Spark-2789: Apply names to RDD to create DataFrame
This seemed like a reasonably useful function to add to SparkSQL. However,
unlike the [JIRA](https://issues.apache.org/jira/browse/SPARK-2789),
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73201347
Can one of the admins verify this patch?
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4421#discussion_r24247234
--- Diff: python/pyspark/sql.py ---
@@ -1469,6 +1470,44 @@ def applySchema(self, rdd, schema):
df =
Github user dwmclary commented on a diff in the pull request:
https://github.com/apache/spark/pull/4421#discussion_r24253601
--- Diff: python/pyspark/sql.py ---
@@ -1469,6 +1470,44 @@ def applySchema(self, rdd, schema):
df =