Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-71553196
I'm just not sure how useful it would be overall. For RDD data, the number
might be slightly misleading because of things like in-memory serialization.
For broadcast objects, it would only work in the Scala shell because of the
way serialization works in Python.
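
To make that first caveat concrete, here is a minimal sketch (assuming a local master and access to `getRDDStorageInfo`, which is a DeveloperApi; the RDD names and sizes are illustrative) that caches the same data in deserialized and serialized form and prints the in-memory footprint each copy reports. The two numbers usually differ by a large factor, which is why a single size figure can mislead:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object CachedSizeComparison {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "cached-size-comparison")

    // The same records, cached two different ways.
    val data = sc.parallelize(1 to 1000000).map(i => ("key-" + i, i))

    // Deserialized Java objects (the default MEMORY_ONLY level).
    val deser = data.persist(StorageLevel.MEMORY_ONLY).setName("deserialized")
    deser.count() // force materialization into the cache

    // The same data cached in serialized form.
    val ser = data.map(identity)
      .persist(StorageLevel.MEMORY_ONLY_SER)
      .setName("serialized")
    ser.count()

    // Reports the in-memory footprint the web UI shows for each cached RDD.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"${info.name}: ${info.memSize} bytes in memory")
    }
    sc.stop()
  }
}
```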
I'm also not totally sure how accurate our memory estimation is overall, and
it may get less accurate if we add smarter caching for SchemaRDDs. Anyway, it
would help to see an example: could you walk through one with a case class or
something and show how accurate the estimate is? That would make this easier
to evaluate.
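
As a rough illustration of such a walk-through (a sketch, not from this PR): `org.apache.spark.util.SizeEstimator` is the utility Spark's caching code uses internally, so calling it directly as below assumes a Spark version where it is visible (later releases expose it as a DeveloperApi). Comparing its output for one case-class instance versus a large collection gives a feel for how the estimate scales:

```scala
import org.apache.spark.util.SizeEstimator

case class Person(name: String, age: Int, email: String)

object SizeEstimatorDemo {
  def main(args: Array[String]): Unit = {
    // Deep size of a single instance, including object headers and the
    // strings it references.
    val one = Person("alice", 30, "[email protected]")
    println(s"one instance:   ${SizeEstimator.estimate(one)} bytes")

    // Deep size of a collection of 100k instances.
    val people = (1 to 100000).map(i =>
      Person(s"name-$i", i % 100, s"user-$i@example.com"))
    println(s"100k instances: ${SizeEstimator.estimate(people)} bytes")
  }
}
```

Because the estimate accounts for JVM object headers, pointers, and boxing overhead, it will generally be much larger than the sum of the raw field sizes.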
One thing we could do that would be more isolated is to have a function in
SparkContext called `estimateSizeOf(object: Any)`, so that at least we don't
expose the estimator's class location and name as APIs.
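
For example, a hypothetical sketch of that helper, assuming it simply delegates to the internal estimator (nothing here is existing Spark API):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.util.SizeEstimator

object SizeEstimationSyntax {
  // Hypothetical enrichment: SparkContext has no estimateSizeOf method,
  // and delegating to SizeEstimator is an assumption about how it could work.
  implicit class SizeEstimatingContext(val sc: SparkContext) extends AnyVal {
    def estimateSizeOf(obj: Any): Long =
      SizeEstimator.estimate(obj.asInstanceOf[AnyRef])
  }
}

// Usage (hypothetical):
//   import SizeEstimationSyntax._
//   val bytes = sc.estimateSizeOf(myCachedCollection)
```

Routing the call through SparkContext like this keeps the estimator itself out of the public namespace, so its package and class name could still change later without breaking callers.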