GitHub user superbobry opened a pull request:

    https://github.com/apache/spark/pull/19369

    [SPARK-22147][CORE] Removed redundant allocations from BlockId

    ## What changes were proposed in this pull request?
    
    Prior to this commit, BlockId.hashCode and BlockId.equals were defined
    in terms of BlockId.name. This allowed the subclasses to be concise and
    enforced BlockId.name as the single unique identifier for a block. All
    subclasses override BlockId.name with an expression that allocates a
    StringBuilder and ultimately a String. This is suboptimal because it
    puts unnecessary GC pressure on the driver; see
    BlockManagerMasterEndpoint.
    
    The commit removes the definition of hashCode and equals from the base
    class. No other change is necessary since all subclasses are in fact
    case classes and therefore have auto-generated hashCode and equals. No
    change of behaviour is expected.
    
    Sidenote: you might be wondering why the subclasses used the base
    implementation rather than the auto-generated one. This behaviour is
    documented in the spec: a case class does not generate hashCode or
    equals when a base class other than AnyRef already provides a concrete
    definition. See this SO answer for details:
    https://stackoverflow.com/a/44990210/262432.
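    
    A minimal sketch of the difference, under stated assumptions: the
    class names (OldBlockId, NewBlockId, OldRDDBlockId, NewRDDBlockId) and
    the nameCalls counter are hypothetical and exist only for
    illustration. They are simplified stand-ins for the real BlockId
    hierarchy, not the actual Spark code. The counter records how often
    name is evaluated, that is, how often a String is built.
    
    ```scala
    object BlockIdSketch {
      // Hypothetical counter: incremented every time `name` builds a String.
      var nameCalls = 0
    
      // "Before" shape: the base class defines hashCode/equals via `name`.
      sealed abstract class OldBlockId {
        def name: String
        override def hashCode: Int = name.hashCode
        override def equals(other: Any): Boolean = other match {
          case o: OldBlockId => getClass == o.getClass && name.equals(o.name)
          case _ => false
        }
      }
    
      // The compiler does NOT generate hashCode/equals here, because a base
      // class other than AnyRef already provides concrete definitions.
      case class OldRDDBlockId(rddId: Int, splitIndex: Int) extends OldBlockId {
        override def name: String = {
          nameCalls += 1
          "rdd_" + rddId + "_" + splitIndex
        }
      }
    
      // "After" shape: the base class no longer defines hashCode/equals.
      sealed abstract class NewBlockId {
        def name: String
      }
    
      // The compiler generates field-based hashCode/equals; `name` is unused.
      case class NewRDDBlockId(rddId: Int, splitIndex: Int) extends NewBlockId {
        override def name: String = {
          nameCalls += 1
          "rdd_" + rddId + "_" + splitIndex
        }
      }
    
      def main(args: Array[String]): Unit = {
        nameCalls = 0
        val oldEqual = OldRDDBlockId(1, 2) == OldRDDBlockId(1, 2)
        println(s"old: equal=$oldEqual, name evaluated $nameCalls time(s)")
    
        nameCalls = 0
        val newEqual = NewRDDBlockId(1, 2) == NewRDDBlockId(1, 2)
        println(s"new: equal=$newEqual, name evaluated $nameCalls time(s)")
      }
    }
    ```
    
    In this sketch both comparisons report equality, but only the old
    shape evaluates name (once per operand), which is where the String
    allocations come from.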
    
    ## How was this patch tested?
    
    BlockIdSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/criteo-forks/spark blockid-equals-hashcode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19369.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19369
    
----
commit 3b1cb50dd1481ee4edbe767916c187709cee2de8
Author: Sergei Lebedev <[email protected]>
Date:   2017-09-27T16:13:45Z

    [SPARK-22147][CORE] Removed redundant allocations from BlockId
    

----

