Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6959#discussion_r33198118
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java ---
    @@ -171,8 +122,64 @@ private void setNotNullAt(int i) {
       }
     
       @Override
    -  public void update(int ordinal, Object value) {
    -    throw new UnsupportedOperationException();
    +  public void update(int i, Object value) {
    +    if (value == null) {
    +      if (!isNullAt(i)) {
    +        // remove the old value from pool
    +        long idx = getLong(i);
    +        if (idx <= 0) {
    +          // this is the index of old value in pool, remove it
    +          pool.replace((int)-idx, null);
    +        } else {
    +          // there will be some garbage left (UTF8String or byte[])
    --- End diff --
    
    I guess one consequence of this is that equality comparisons of modified
UnsafeRows won't work as expected if we just compare the bytes: creating an
UnsafeRow with a null value for a particular string / byte column produces
different bytes than creating the row with a non-null value for that column
and then setting it to null afterwards, since the old payload is left behind
as garbage.
    
    This wasn't a problem before because we didn't allow updates to
string-valued columns, but it might be a problem here for hashing /
comparison of rows.
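
    To make the concern concrete, here's a minimal standalone sketch (not the
real UnsafeRow — a hypothetical simplified row layout with a null bit, a
length byte, and a fixed payload region) showing how "created null" and
"set then nulled" rows can be logically equal but byte-for-byte different:

    ```java
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    // Hypothetical layout: [1-byte null bit][1-byte length][8-byte payload].
    // Illustrates the stale-bytes problem described above, not Spark's format.
    public class StaleBytesDemo {
        static final int ROW_SIZE = 1 + 1 + 8;

        // Row created directly with a null column: payload stays zeroed.
        static byte[] rowWithNull() {
            byte[] row = new byte[ROW_SIZE];
            row[0] = 1; // null bit set
            return row;
        }

        // Row created with a value, then updated to null: only the null bit
        // flips; the old length and payload bytes are left as garbage.
        static byte[] rowSetThenNulled(String value) {
            byte[] row = new byte[ROW_SIZE];
            byte[] payload = value.getBytes(StandardCharsets.UTF_8);
            row[1] = (byte) payload.length;
            System.arraycopy(payload, 0, row, 2, payload.length);
            row[0] = 1; // update(i, null): set null bit, leave bytes behind
            return row;
        }

        static boolean isNull(byte[] row) {
            return row[0] == 1;
        }

        public static void main(String[] args) {
            byte[] a = rowWithNull();
            byte[] b = rowSetThenNulled("abc");
            // Logically both rows hold a single null column...
            System.out.println(isNull(a) && isNull(b)); // true
            // ...but a byte-for-byte comparison (or a hash of the bytes)
            // disagrees because of the leftover payload in b.
            System.out.println(Arrays.equals(a, b)); // false
        }
    }
    ```

    So any hash or equality check built on the raw bytes would need to either
zero out the old region on update or ignore garbage regions.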

