Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/2087#issuecomment-57244182
Yeah so I just prefer keeping the TaskMetrics/InputMetrics as simple as
possible rather than having callback registration and other state in them. The
simplest possible interface is that they are just structs and people update
their values. This keeps all of the logic around this thread-based Hadoop
instrumentation local to the HadoopRDD itself, so the interface between the
components is much narrower.
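To make the contrast concrete, here is a minimal, hypothetical sketch of the "plain struct" style being argued for. The names (`InputMetrics`, `readAll`) are illustrative only, not Spark's actual API: the metrics object is just mutable fields, and the reading code updates them directly, with no callback registration or listener state.

```java
public class MetricsSketch {
    // A plain struct-like holder; illustrative names, not Spark's real classes.
    static final class InputMetrics {
        long bytesRead;
        long recordsRead;
    }

    // A stand-in for the record-reading loop (e.g. inside HadoopRDD's
    // iterator): it updates the struct's fields in place as it consumes
    // records, keeping all instrumentation logic local to the reader.
    static InputMetrics readAll(byte[][] records) {
        InputMetrics metrics = new InputMetrics();
        for (byte[] record : records) {
            metrics.bytesRead += record.length;
            metrics.recordsRead++;
        }
        return metrics;
    }

    public static void main(String[] args) {
        byte[][] records = { new byte[10], new byte[20], new byte[5] };
        InputMetrics m = readAll(records);
        System.out.println(m.bytesRead + " bytes, " + m.recordsRead + " records");
    }
}
```

The point of the sketch is the narrow interface: the only contract between components is "here is a struct, update its fields," rather than a registration/notification protocol whose state lives inside the metrics objects themselves.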
Overall, I'm gauging the complexity based on how complicated the interfaces
are, not on the complexity of the internal implementations.
If we have a single large record this might be an issue. But we already
make other assumptions that record sizes are fairly small; for instance,
they must fit comfortably in memory, so they can't be very large.
Keeping the interactions between the components simpler will also make this
easier to test. Right now there are no unit tests for this, and because the
interfaces are complex it might be difficult to test as-is.