GitHub user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/4246#issuecomment-91941342
Thanks for addressing the comments. As I look at this more closely, I wonder if
we aren't approaching this the wrong way. If this is a Hadoop option that
FileInputFormat already supports, why aren't users just setting it in the Hadoop
configuration? (i.e. `sc.hadoopConfiguration.set("mapreduce.input.pathFilter.class",
...)`). As it stands, the patch sprinkles new logic throughout the codebase,
with each copy reading the same Hadoop config key on its own. That seems very
likely to lead to errors in the future. If we aren't properly propagating the
Hadoop configuration, that is a separate problem that should be fixed on its own.
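For concreteness, here is a minimal sketch of that approach (assumptions: the new mapreduce API, where `mapreduce.input.pathFilter.class` is the key that `FileInputFormat.setInputPathFilter` writes, and a hypothetical user-defined `MyPathFilter`):
```scala
import org.apache.hadoop.fs.{Path, PathFilter}

// Hypothetical user-defined filter: skip files whose names start with "_".
class MyPathFilter extends PathFilter {
  override def accept(path: Path): Boolean = !path.getName.startsWith("_")
}

// Register the filter once on the shared Hadoop configuration; every
// FileInputFormat-based read should then pick it up without new Spark code.
sc.hadoopConfiguration.setClass(
  "mapreduce.input.pathFilter.class", classOf[MyPathFilter], classOf[PathFilter])
```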
Unless there is something that I am missing, I suggest that we close this
issue and take the Hadoop configuration approach.
If you do think this is the right approach, we can discuss further, but I
have some feedback on the PR in general:
- The indentation is still off in several places. Please look at the
coding style guide.
- There is a lot of duplicated logic (some of which is wrong). It would
be better to isolate it in one place, so that if we need to fix it in the
future we can fix it once. For example, I see this pattern a lot:
```scala
val filterClassName =
  sqlContext.getConf(NewFileInputFormat.PATHFILTER_CLASS, "")
if (filterClassName.nonEmpty) {
  val filterClazz =
    Class.forName(filterClassName).asInstanceOf[Class[_ <: PathFilter]]
```
However `Class.forName` is probably not using a classloader that can see
user code.
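One safer alternative (a sketch, not necessarily what the final patch should do; Spark has an internal `Utils.getContextOrSparkClassLoader` helper for exactly this purpose) is to resolve the class through the thread context classloader:
```scala
// Prefer the context classloader, which can see user jars added via
// --jars / spark.jars; fall back to the loader of the enclosing class.
val loader = Option(Thread.currentThread().getContextClassLoader)
  .getOrElse(getClass.getClassLoader)
val filterClazz = Class.forName(filterClassName, true, loader)
  .asInstanceOf[Class[_ <: PathFilter]]
```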