Github user a-roberts commented on the pull request:
https://github.com/apache/spark/pull/12327#issuecomment-212991789
@cloud-fan I've had a closer look at this and think a more robust method
would be to use weak references to identify when an object is out of scope.
With IBM Java we see a 29% reduction between cache size 1 and cache size 2,
but with OpenJDK we see a 4% increase, which suggests we can't rely on the
sizes being similar across JDK vendors. I'm now thinking this is a test case
issue rather than a problem in the ClosureCleaner or the IBM Java code.
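To show roughly what I mean by the weak-reference approach, here's a minimal
standalone sketch (not against the actual test; `WeakRefCheck`,
`becomesUnreachable` and the dummy payload are names I've made up). The idea is
to assert that the object we expect to be cleaned is actually collectible,
rather than comparing sizes that differ between JVMs:

```scala
import java.lang.ref.WeakReference

object WeakRefCheck {
  // Poll for collection: request GC a few times and check whether the referent
  // has been cleared. Collection isn't guaranteed by the spec, hence the retries.
  def becomesUnreachable(ref: WeakReference[AnyRef], attempts: Int = 10): Boolean = {
    var i = 0
    while (ref.get() != null && i < attempts) {
      System.gc()
      Thread.sleep(100)
      i += 1
    }
    ref.get() == null
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the REPL line object / cached data we expect to be cleaned.
    var payload: AnyRef = new Array[Byte](1 << 20)
    val ref = new WeakReference[AnyRef](payload)
    payload = null // drop the only strong reference
    println(s"collected = ${becomesUnreachable(ref)}") // expected: collected = true
  }
}
```

A check like this should pass or fail for the same reason on every vendor,
whereas a size comparison bakes in object-layout assumptions.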
With IBM Java our second cache size (after repartitioning, which goes through
the ContextCleaner) is much smaller, whereas with OpenJDK it grows. Either we
have a bigger memory footprint or the cached size is being calculated
incorrectly (the calculation looks fine to me, and our object sizes are
actually smaller). The problem on Z was due to using repl/pom.xml instead of
the pom.xml in the Spark home directory (we get the same result with the
correct pom.xml), so it can be discarded from this discussion.
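For reference, this is roughly how I'm comparing the cached sizes mentioned
above in a spark-shell session. It's only a sketch: the explicit `unpersist`
here stands in for the ContextCleaner-driven cleanup in the real test, and
`cachedBytes` is just a helper name I've picked:

```scala
// Total bytes cached across all RDDs, according to the driver's storage info.
def cachedBytes(sc: org.apache.spark.SparkContext): Long =
  sc.getRDDStorageInfo.map(_.memSize).sum

val data = sc.parallelize(1 to 1000000).map(i => (i, i.toString))
data.cache().count()
val cacheSize1 = cachedBytes(sc)

val repartitioned = data.repartition(8)
repartitioned.cache().count()
data.unpersist(blocking = true) // free the original copy so only the repartitioned blocks remain
val cacheSize2 = cachedBytes(sc)

println(s"cacheSize1=$cacheSize1, cacheSize2=$cacheSize2, " +
  f"change=${(cacheSize2 - cacheSize1) * 100.0 / cacheSize1}%.1f%%")
```

On IBM Java the second number comes out much smaller for me; on OpenJDK it
comes out slightly larger, which is the discrepancy described above.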
I'm going to figure out what's in the Scala REPL line objects for each vendor.
I think the intention of this commit is to test that the REPL line object is
being cleaned, but the assertion currently in place doesn't look correct: the
size is bigger after the cleaning, and cacheSize2 is the result of the
cleaning if I'm understanding the code correctly. Have I missed a trick?
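For the line-object investigation, this is the rough shape of the inspection I
have in mind (a standalone sketch; `ClosureInspector` and its helper are names
I've invented, not anything in the test suite). It reflectively lists a
closure's fields and follows any `$outer` chain, so the captured contents can
be diffed between JDK vendors:

```scala
object ClosureInspector {
  // Print the declared fields of an object and recurse into enclosing scopes
  // via $outer, which is where REPL line objects tend to hide.
  def describe(obj: AnyRef, indent: String = "", depth: Int = 3): Unit = {
    if (obj == null || depth == 0) return
    val cls = obj.getClass
    println(s"$indent${cls.getName}")
    for (field <- cls.getDeclaredFields) {
      field.setAccessible(true)
      val value = field.get(obj)
      println(s"$indent  ${field.getName}: ${field.getType.getName}")
      if (field.getName == "$outer" && value != null) {
        describe(value, indent + "    ", depth - 1)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val captured = "captured value"
    val closure = () => captured.length // captures the enclosing scope
    describe(closure)
  }
}
```

Running something like this against the closures in the failing test should
show whether the IBM and OpenJDK REPLs are capturing different things, or
whether the difference is purely in object layout.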