Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22140
@gatorsmile it seemed like a straightforward bug to me. Rows with extra
values lead to incorrect output and exceptions when used in `DataFrames`, so it
did not seem like there was any possibility this would break existing code. For
example:
```
In [1]: MyRow = Row('a','b')
In [2]: print(MyRow(1,2,3))
Row(a=1, b=2)
In [3]: spark.createDataFrame([MyRow(1,2,3)])
Out[3]: DataFrame[a: bigint, b: bigint]
In [4]: spark.createDataFrame([MyRow(1,2,3)]).show()
18/09/08 21:55:48 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 2 fields are required while 3 values are provided.
In [5]: spark.createDataFrame([MyRow(1,2,3)], schema="x: int, y: int").show()
ValueError: Length of object (3) does not match with length of fields (2)
```
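For context, here is a minimal sketch of the kind of length check involved; the `make_row` factory below is purely illustrative, not the actual PR diff or the real `pyspark.sql.Row` implementation. The idea is that a factory built from N field names rejects calls with a different number of values, so the mismatch fails fast at construction instead of surfacing later inside `createDataFrame`:
```
# Illustrative sketch only: validate the value count when a Row-like
# factory is called, rather than silently accepting extra values.
def make_row(*fields):
    def factory(*values):
        if len(values) != len(fields):
            raise ValueError(
                "expected %d values for fields %s, got %d"
                % (len(fields), fields, len(values)))
        return dict(zip(fields, values))  # stand-in for a real Row
    return factory

MyRow = make_row('a', 'b')
MyRow(1, 2)      # ok
MyRow(1, 2, 3)   # raises ValueError at creation time
```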
Maybe I was too hasty in backporting and this needed some discussion. Do
you know of a use case that this change would break?