GitHub user superbobry opened a pull request:
https://github.com/apache/spark/pull/19369
[SPARK-22147][CORE] Removed redundant allocations from BlockId
## What changes were proposed in this pull request?
Prior to this commit, BlockId.hashCode and BlockId.equals were defined
in terms of BlockId.name. This allowed the subclasses to be concise and
enforced BlockId.name as the single unique identifier for a block. All
subclasses override BlockId.name with an expression that allocates a
StringBuilder and ultimately a String. This is suboptimal since every
hashCode or equals call induces unnecessary GC pressure on the driver,
see BlockManagerMasterEndpoint.
The commit removes the definition of hashCode and equals from the base
class. No other change is necessary since all subclasses are in fact
case classes and therefore have auto-generated hashCode and equals. No
change of behaviour is expected.
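The difference can be sketched as follows. This is a simplified illustration, not the actual Spark source: the `Old*` and `New*` names are hypothetical stand-ins for the class hierarchy before and after the change, and only one subclass is shown.

```scala
// Before: the base class routes hashCode and equals through `name`.
sealed abstract class OldBlockId {
  def name: String
  // Each call evaluates `name`, which builds a fresh String.
  override def hashCode: Int = name.hashCode
  override def equals(other: Any): Boolean = other match {
    case o: OldBlockId => getClass == o.getClass && name.equals(o.name)
    case _ => false
  }
}

case class OldRDDBlockId(rddId: Int, splitIndex: Int) extends OldBlockId {
  // String concatenation allocates a StringBuilder and then a String.
  override def name: String = "rdd_" + rddId + "_" + splitIndex
}

// After: the base class no longer defines hashCode or equals, so the
// compiler generates field-based versions for every case class.
sealed abstract class NewBlockId {
  def name: String
}

case class NewRDDBlockId(rddId: Int, splitIndex: Int) extends NewBlockId {
  override def name: String = "rdd_" + rddId + "_" + splitIndex
}
```

In the second variant, comparing or hashing two ids only reads the `rddId` and `splitIndex` fields, and `name` is evaluated only when a caller asks for it explicitly.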
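Sidenote: you might be wondering why the subclasses used the base
implementation and not the auto-generated one. Apparently, this
behaviour is documented in the spec: a case class does not get an
auto-generated method if a base class other than AnyRef already
provides a concrete definition of it. See this SO answer for details
https://stackoverflow.com/a/44990210/262432.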
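The rule from the sidenote can be reproduced in isolation. The class names below are hypothetical and exist only for this illustration.

```scala
// A base class with a concrete hashCode, but no equals override.
abstract class FixedHash {
  override def hashCode: Int = 42
}

// The compiler skips generating hashCode here, because a base class
// other than AnyRef already defines it. equals is still generated,
// since FixedHash does not override it.
case class InheritsHash(x: Int) extends FixedHash

// No user-defined base class: both hashCode and equals are generated.
case class PlainCase(x: Int)
```

With these definitions, `InheritsHash(1).hashCode` and `InheritsHash(2).hashCode` both return 42, while `InheritsHash(1) == InheritsHash(2)` is still false because the generated `equals` compares fields.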
## How was this patch tested?
BlockIdSuite
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/criteo-forks/spark blockid-equals-hashcode
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19369.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19369
----
commit 3b1cb50dd1481ee4edbe767916c187709cee2de8
Author: Sergei Lebedev <[email protected]>
Date: 2017-09-27T16:13:45Z
[SPARK-22147][CORE] Removed redundant allocations from BlockId
----