GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/13771
[SPARK-13748][PYSPARK][DOC] Add the description for explicitly setting None for a named argument for ROW
## What changes were proposed in this pull request?
It seems a key can simply be omitted from a dict to represent a value that is `None` (missing), as below:
```python
spark.createDataFrame([{"x": 1}, {"y": 2}]).show()
```
```
+----+----+
|   x|   y|
+----+----+
|   1|null|
|null|   2|
+----+----+
```
However, the same does not seem to work for `Row`, as below:
```python
from pyspark.sql import Row
spark.createDataFrame([Row(x=1), Row(y=2)]).show()
```
```
16/06/19 16:25:56 ERROR Executor: Exception in task 6.0 in stage 66.0 (TID 316)
java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 2 fields are required while 1 values are provided.
	at org.apache.spark.sql.execution.python.EvaluatePython$.fromJava(EvaluatePython.scala:147)
	at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656)
	at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:247)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780)
```
This behaviour seems correct, but it may confuse users, as the report in this JIRA shows.
This PR adds an explanation to the `Row` class documentation that a missing value for a named argument should be set explicitly to `None`.
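The workaround named in the title, setting a missing field explicitly to `None`, can be sketched in plain Python. The helper `pad_missing_fields` below is hypothetical (it is not part of PySpark or of this patch): it takes dicts like the ones in the first example and fills in `None` for every field a record omits, so that each result could be unpacked into `Row(**kwargs)` with the same set of named fields.

```python
from typing import Any, Dict, List


def pad_missing_fields(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return copies of ``records`` in which every
    field name seen in any record is present, with missing values set
    explicitly to None (the value a Row would need for that field)."""
    # Collect field names in first-seen order across all records.
    fields: List[str] = []
    for record in records:
        for name in record:
            if name not in fields:
                fields.append(name)
    # Build a new dict per record, using None for any field it omits.
    return [{name: record.get(name) for name in fields} for record in records]


padded = pad_missing_fields([{"x": 1}, {"y": 2}])
print(padded)  # [{'x': 1, 'y': None}, {'x': None, 'y': 2}]
```

Assuming PySpark is available, `spark.createDataFrame([Row(**kw) for kw in padded])` should then behave like the dict example, since it is equivalent to writing `Row(x=1, y=None)` and `Row(x=None, y=2)` by hand.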
## How was this patch tested?
N/A
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-13748
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13771.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13771
----
commit b9705dd5b0e29a57599eb0cd39cda45b3e7c4ac5
Author: hyukjinkwon <[email protected]>
Date: 2016-06-19T07:17:15Z
Add the description for explictly setting None for a named argument
----