GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/13771

    [SPARK-13748][PYSPARK][DOC] Add the description for explicitly setting None for a named argument for Row

    ## What changes were proposed in this pull request?
    
    
    When a DataFrame is created from dicts, it seems a key can simply be omitted to represent a value that is `None` or missing, as below:
    
    ```python
    spark.createDataFrame([{"x": 1}, {"y": 2}]).show()
    ```
    
    ```
    +----+----+
    |   x|   y|
    +----+----+
    |   1|null|
    |null|   2|
    +----+----+
    ```
    
    However, this does not seem to be allowed for `Row`, as below:
    
    ```python
    spark.createDataFrame([Row(x=1), Row(y=2)]).show()
    ```
    
    ```scala
    16/06/19 16:25:56 ERROR Executor: Exception in task 6.0 in stage 66.0 (TID 316)
    java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 2 fields are required while 1 values are provided.
        at org.apache.spark.sql.execution.python.EvaluatePython$.fromJava(EvaluatePython.scala:147)
        at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656)
        at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:247)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780)
    ```
    
    The behaviour seems right, but it might confuse users, as the report in this JIRA shows.
    
    This PR adds an explanation of this to the documentation of the `Row` class.
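
As a rough illustration of what "explicitly setting None" could look like in practice, here is a minimal sketch. The helper `with_explicit_none` is hypothetical (it is not part of PySpark or of this patch); it only uses plain Python to give every record the same set of keys, filling the missing ones with `None`. The commented-out lines at the end assume an existing `spark` session.

```python
def with_explicit_none(records):
    """Return copies of `records` in which every dict has the same keys.

    Keys that are missing from a record are set to None explicitly.
    (Hypothetical helper for illustration only.)
    """
    # Union of all keys, sorted so every record lists its fields in the same order.
    all_keys = sorted({key for record in records for key in record})
    # dict.get returns None for a key the record does not have.
    return [{key: record.get(key) for key in all_keys} for record in records]


records = with_explicit_none([{"x": 1}, {"y": 2}])
print(records)  # [{'x': 1, 'y': None}, {'x': None, 'y': 2}]

# With PySpark available, each record could then become a Row whose
# fields are all named explicitly (assumes an existing `spark` session):
#   from pyspark.sql import Row
#   spark.createDataFrame([Row(**r) for r in records]).show()
```

The point of the sketch is only that each `Row` ends up with the same named arguments, with `None` written out where a value is missing, rather than leaving the argument off.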
    
    ## How was this patch tested?
    
    N/A


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-13748

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13771.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13771
    
----
commit b9705dd5b0e29a57599eb0cd39cda45b3e7c4ac5
Author: hyukjinkwon <[email protected]>
Date:   2016-06-19T07:17:15Z

    Add the description for explictly setting None for a named argument

----

