Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5984#discussion_r29909028
  
    --- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ---
    @@ -223,6 +227,16 @@ public void setString(int ordinal, String value) {
       }
     
       @Override
    +  public void setDate(int ordinal, Date value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void setTimestamp(int ordinal, Timestamp value) {
    --- End diff --
    
    Although `Timestamp` stores data in the variable-length field, the width of 
that variable-length data won't change once it's stored or updated, so I think 
we can actually support in-place updates for timestamp fields.  One tricky case 
will be to handle cases where the timestamp column is initially null then we 
choose to set a new value.  In order to be able to safely hash rows by hashing 
their bytes, we need to ensure that identical rows are bitwise identical. To 
achieve this, the current `setNull()` method zeros out the field in the 
fixed-width section.  However, I don't think that we can safely do that for 
variable-length fields that support updates (such as Timestamp) because then 
we'll no longer be able to determine where that field's variable-length data 
can be found (since we'll be zeroing out the pointer).  If we could do a 
type-specific `setNull` method, then we could change that to only update the 
null bitmap and to zero out the variable-length data, but leave the fixed-
 length offset unchanged.
    
    @rxin, do you have any ideas for how we should handle this?  Refactoring 
the internal row interfaces / codegen to call type-specific `setNull()` methods 
might be pretty hard.  @marmbrus, is there any API contract constraining what 
types of columns `setNull()` will be called for?  I'm wondering if we could do 
something like only call `setNull()` for primitives, but call something 
`setTimestamp(null)` or `setString(null)` for the non-primitive types.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to