[GitHub] spark pull request: [SPARK-14993] [SQL] Fix Partition Discovery In...

tdas Mon, 02 May 2016 19:58:22 -0700

Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/12828#issuecomment-216424273
  
    @yhuai and I discussed this, and the substring-match solution seems very 
hacky. 
    
    The real problem is that basePaths should never contain files, since a 
basePath that is not a directory does not make sense. So our strategy in 
HDFSFileCatalog of using the set of input files as the default basePath is 
incorrect. The correct fix is to set the default base path to the [dirs in 
input paths] UNION [parent dirs of files in input paths]. 
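
    The rule above can be sketched roughly as follows. This is only an 
illustration, not the code in the linked commit: `InputPath`, 
`DefaultBasePaths`, `parentOf` and `defaultBasePaths` are hypothetical names, 
and paths are assumed to be absolute, slash-separated strings rather than 
Hadoop `Path` objects.

```scala
// Hypothetical stand-in for a file-system listing entry. The real code works
// with Hadoop's Path/FileStatus, which this sketch deliberately avoids.
final case class InputPath(path: String, isDirectory: Boolean)

object DefaultBasePaths {
  // Drop a trailing slash, except for the root path itself.
  private def normalize(path: String): String =
    if (path.length > 1) path.stripSuffix("/") else path

  // Parent directory of an absolute, slash-separated path.
  def parentOf(path: String): String = {
    val p = normalize(path)
    val idx = p.lastIndexOf('/')
    if (idx <= 0) "/" else p.substring(0, idx)
  }

  // [dirs in input paths] UNION [parent dirs of files in input paths]
  def defaultBasePaths(inputs: Seq[InputPath]): Set[String] = {
    val dirs = inputs.filter(_.isDirectory).map(p => normalize(p.path)).toSet
    val parentsOfFiles = inputs.filterNot(_.isDirectory).map(p => parentOf(p.path)).toSet
    dirs ++ parentsOfFiles
  }
}
```

    With this rule a file passed as an input path never becomes a base path 
itself; only its parent directory does.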
    
    Here is the fix - 
https://github.com/apache/spark/commit/fbef90f47db7c0a81ec29db27e83d0daf56673bd
    Please update your PR with this. You don't have to change `parsePartition` 
in that case. 
    
    Consider updating the Scaladoc to make this implicit assumption about 
`basePath` clear in the code.



