Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
After looking into this, it seems like the behavior of the `Row` class is
as follows:
If a `Row` is made from kwargs, then the order of the fields cannot be
relied upon, and data must always be accessed by field name, like a dict.
In this case, the order of the fields in the supplied schema doesn't matter,
but the schema's field names must be a subset of the fields in each row.
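
As a rough illustration of the by-name case, here is a plain-Python sketch. It models a kwargs `Row` as a dict and is not Spark's actual implementation; `values_by_name` and the sample data are hypothetical names made up for this example.

```python
# Plain-Python model of by-name matching; this is NOT Spark's implementation.
def values_by_name(row_fields, schema_names):
    """Return the row's values in schema order, looking each one up by name."""
    missing = [name for name in schema_names if name not in row_fields]
    if missing:
        raise KeyError(f"schema fields not present in row: {missing}")
    return tuple(row_fields[name] for name in schema_names)


# A row built from keyword arguments, modeled here as a dict (assumption).
kwargs_row = {"value": 1, "key": "a", "extra": True}

# The schema lists the fields in a different order and uses only a subset.
by_name = values_by_name(kwargs_row, ["key", "value"])
```

Because every lookup goes through the field name, reordering the schema only changes the order of the result; a schema field that the row lacks is the one case that fails.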
If a `Row` is made from a generated custom class, like `TestRow =
Row("key", "value")` followed by `row = TestRow('a', 1)`, then the position of
each element is what matters, and data is accessed by position in the tuple.
The supplied schema must match the types of the rows exactly, but the field
names are not important and can be changed.
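
The positional case can be sketched the same way in plain Python. Again, this is a simplified model and not Spark's code; `check_positional` and the sample values are hypothetical, and Python types stand in for schema types.

```python
# Plain-Python model of positional matching; this is NOT Spark's implementation.
def check_positional(row_values, schema_types):
    """Check that each value matches the schema type at the same position."""
    if len(row_values) != len(schema_types):
        return False
    return all(isinstance(v, t) for v, t in zip(row_values, schema_types))


# A row created from a generated class, modeled here as a tuple (assumption).
positional_row = ("a", 1)

# Types are compared position by position, so swapping them breaks the match.
types_ok = check_positional(positional_row, [str, int])
types_swapped = check_positional(positional_row, [int, str])

# Field names are only labels for positions, so they can be renamed freely.
renamed = dict(zip(["k", "v"], positional_row))
```

Here the names never take part in the match; only the number and order of the types do.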