GitHub user patmcdonough commented on the pull request:
https://github.com/apache/spark/pull/377#issuecomment-43108393
@pwendell - you're too kind, it was my pleasure.
@shivaram - to add a bit more detail on the toolchain I mentioned in my
previous comment: I used the data and commands outlined in the JIRA, created a
remote debug configuration in IntelliJ, added the debug options to
conf/java-opts in Spark, set a breakpoint somewhere in the size estimation
code path, and then inspected the heap with VisualVM (and the VisualGC
plugin). It's probably only about 10 minutes of setup, but it is obviously not
automated at all.
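For anyone reproducing this, a typical remote-debug line for conf/java-opts
looks like the sketch below. It is not necessarily the exact line used here:
the port (5005) and suspend=n are assumptions, and the port must match
whatever the IntelliJ remote debug configuration is set to. The file's
contents are passed to the JVM as options, so it should contain only the
options themselves, with no comments.

```
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
```

With suspend=n the JVM starts without waiting for the debugger; switching to
suspend=y makes it wait for IntelliJ to attach, which may help if the
breakpoint is hit early in startup.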
Unfortunately, I didn't take notes on this, but IIRC, the internal size
estimate did not line up with what was actually on the heap as reported by the
JVM.