[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

sryza Tue, 08 Sep 2015 13:19:36 -0700

Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/7918#issuecomment-138687768
  
    Sorry for the delay here; I have been on PTO.
    
    IIUC, the change here makes Spark work with some exotic InputFormats that 
it previously did not work with because of thread-safety issues, at the possible 
expense of performance.  Users can revert to the old behavior with a config.
    
    There's no associated JIRA, but a6eeb5ffd54956667ec4e793149fdab90041ad6c is 
the hash of the change that appears to have introduced the input format cache.  
Unfortunately, I don't see any performance numbers there justifying its 
addition.
    
    It seems like the main overhead we're trying to avoid is the reflective 
calls.  What about caching the constructor so that we don't need to look it up 
for each task?
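
    A minimal sketch of the constructor-caching idea, in plain Java. The class 
name ConstructorCache and its newInstance method are hypothetical and are not 
part of Spark or Hadoop; the sketch also assumes the target class has a public 
no-argument constructor. The reflective lookup happens once per class, while 
each call still builds a fresh instance, so no instance is shared across 
threads.

```java
import java.lang.reflect.Constructor;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper (not a Spark or Hadoop API): caches the no-arg
// constructor per class so the lookup is not repeated for every task.
final class ConstructorCache {
    private static final ConcurrentMap<Class<?>, Constructor<?>> CACHE =
        new ConcurrentHashMap<>();

    private ConstructorCache() {}

    // Returns a NEW instance on every call; only the Constructor is cached.
    static <T> T newInstance(Class<T> cls) {
        Constructor<?> ctor = CACHE.computeIfAbsent(cls, c -> {
            try {
                // Assumption: a public no-argument constructor exists.
                return c.getConstructor();
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(
                    "No public no-arg constructor for " + c.getName(), e);
            }
        });
        try {
            return cls.cast(ctor.newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Could not instantiate " + cls.getName(), e);
        }
    }

    // Number of distinct classes whose constructor has been cached.
    static int cachedClasses() {
        return CACHE.size();
    }
}
```

    With something like this, each task would call 
ConstructorCache.newInstance(inputFormatClass) and get its own instance, which 
keeps the thread-safety benefit while avoiding the repeated lookup. Whether the 
lookup is actually the dominant cost would still need to be measured.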
    
    Lastly, is an equivalent change needed for NewHadoopRDD?


