Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2087#issuecomment-57244182
  
    Yeah, so I just prefer keeping TaskMetrics/InputMetrics as simple as 
possible rather than having callback registration and other state in them. The 
simplest possible interface is that they are just structs and callers update 
their values. This keeps all of the logic around this thread-based Hadoop 
instrumentation local to the HadoopRDD itself, so the interface between the 
components is much narrower.
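
    The struct-style interface being argued for could be sketched roughly like 
this (hypothetical classes for illustration only, not Spark's actual 
TaskMetrics/InputMetrics definitions):

    ```java
    // Hypothetical sketch: metrics kept as plain mutable "structs" whose
    // fields the reader updates directly. No callback registration lives on
    // the metrics objects; any Hadoop-specific, thread-based counter polling
    // would stay inside the RDD's own read loop.
    class InputMetrics {
        long bytesRead;
        long recordsRead;
    }

    class TaskMetrics {
        final InputMetrics inputMetrics = new InputMetrics();
    }

    public class ReaderSketch {
        // Stand-in for the RDD's record iterator: the reader owns all of
        // the update logic and simply writes new values into the struct
        // as records are consumed.
        static void readSplit(TaskMetrics metrics, byte[][] records) {
            for (byte[] rec : records) {
                metrics.inputMetrics.recordsRead += 1;
                metrics.inputMetrics.bytesRead += rec.length;
            }
        }

        public static void main(String[] args) {
            TaskMetrics tm = new TaskMetrics();
            readSplit(tm, new byte[][] { new byte[10], new byte[5] });
            System.out.println(tm.inputMetrics.recordsRead + " records, "
                    + tm.inputMetrics.bytesRead + " bytes");
        }
    }
    ```

    Because the metrics objects carry no behavior of their own, a unit test 
can construct them directly and assert on field values, with no callback 
wiring needed.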
    
    Overall, I'm gauging the complexity based on how complicated the interfaces 
are, not on the complexity of the internal implementations.
    
    If we have a single large record this might be an issue. But we already 
make other assumptions that record sizes are fairly small; for instance, they 
must fit easily in memory, so they can't be large.
    
    By keeping the interactions between the components simpler, this will also 
be easier to test. Right now there are no unit tests for this, and because the 
interfaces are complex it might be difficult to test as-is.

