Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
@cloud-fan, I have one question about null fields. Should we write zero into the corresponding field at the position where ```setNullAt()``` is called, as ```UnsafeRow``` [does](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L194)? If we skip the zeroing, it affects two properties (a sketch of the ```UnsafeRow``` behavior follows this list):
1. [row
equality](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L197)
2. undetermined values may be included in the array returned by ```UnsafeArrayData.to<PrimitiveType>Array()```
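For reference, the linked ```UnsafeRow#setNullAt``` does roughly the following (paraphrased from the linked source; the explanatory comments are mine):

```java
// UnsafeRow.setNullAt (roughly): marking a field null also zeroes its
// 8-byte word, so two logically equal rows stay equal byte-wise.
public void setNullAt(int i) {
  assertIndexIsValid(i);
  BitSetMethods.set(baseObject, baseOffset, i);
  // Clearing the word keeps byte-wise equals()/hashCode() correct and
  // prevents stale bytes from leaking into copied primitive arrays.
  Platform.putLong(baseObject, getFieldOffset(i), 0);
}
```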
In my current implementation, the width of each element depends on the element type (4 bytes for Int, 8 bytes for Double, etc.), while ```UnsafeRow``` always uses 8 bytes per field. Thus, it is hard to take the same approach that ```UnsafeRow``` [does](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L194). Since we want to make the data conversion between primitive arrays and unsafe arrays fast, we have to keep the type-based element width (e.g. 4 bytes for Int, 8 bytes for Double).
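If we did clear the slot despite the variable widths, a type-width-aware version could look something like the sketch below. This is a minimal sketch only: ```elementSize```, ```getElementOffset()```, and the null-bitset location are my assumptions here, not the actual ```UnsafeArrayData``` layout.

```java
// Hypothetical sketch: zero out a null element whose width depends on
// the element type (1/2/4/8 bytes). `elementSize` and
// `getElementOffset()` are assumed helpers, not the real API.
public void setNullAt(int ordinal) {
  assertIndexIsValid(ordinal);
  BitSetMethods.set(baseObject, baseOffset, ordinal);  // assumed bitset location
  long offset = getElementOffset(ordinal);
  switch (elementSize) {
    case 1: Platform.putByte(baseObject, offset, (byte) 0); break;
    case 2: Platform.putShort(baseObject, offset, (short) 0); break;
    case 4: Platform.putInt(baseObject, offset, 0); break;
    case 8: Platform.putLong(baseObject, offset, 0L); break;
  }
}
```

The cost would be one extra branch per nulled element, but it would restore both properties above.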
What do you think? Should we keep the above two properties by clearing each nulled field?