Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/12828#issuecomment-216424273
@yhuai and I discussed this, and the substring-match solution seems very
hacky.
The real problem is that basePaths should never contain files, since a
basePath that is not a directory does not make sense. So our strategy in
HDFSFileCatalog of using the set of input files as the default basePath is
incorrect. The correct fix is to set the default base path to [dirs in
input paths] UNION [parent dirs of files in input paths].
Here is the fix:
https://github.com/apache/spark/commit/fbef90f47db7c0a81ec29db27e83d0daf56673bd
Please update your PR with this. You don't have to change `parsePartition`
in that case.
Consider updating the Scaladoc to make this implicit assumption about
`basePath` (that it is always a directory) clear in the code.
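
For illustration only, the rule "[dirs in input paths] UNION [parent dirs of
files in input paths]" could be sketched roughly as below. This is not the
code from the linked commit. `DefaultBasePaths`, `defaultBasePaths` and the
`isDir` predicate are hypothetical names, and `java.nio.file.Path` stands in
for the Hadoop `Path` and file-system lookup that the real catalog would use.

```scala
import java.nio.file.{Path, Paths}

object DefaultBasePaths {
  // Hypothetical helper, not Spark's API. `isDir` is an assumed stand-in
  // for asking the file system whether a path is a directory.
  def defaultBasePaths(inputPaths: Seq[Path], isDir: Path => Boolean): Set[Path] = {
    // Split the input into directories and everything else (treated as files).
    val (dirs, files) = inputPaths.partition(isDir)
    // A file contributes its parent directory; a file without a parent
    // contributes nothing, so a file itself never becomes a base path.
    val parentDirs = files.flatMap(f => Option(f.getParent))
    dirs.toSet ++ parentDirs
  }

  def main(args: Array[String]): Unit = {
    // Assumed example layout: the two entries in `knownDirs` are directories.
    val knownDirs = Set(Paths.get("/data/table"), Paths.get("/data/table/a=1"))
    val inputs = Seq(
      Paths.get("/data/table"),
      Paths.get("/data/table/a=1/part-0.parquet"))
    println(defaultBasePaths(inputs, knownDirs.contains))
  }
}
```

With these assumed inputs, the directory `/data/table` is kept as is and the
file is replaced by its parent `/data/table/a=1`, so no file ends up in the
set of base paths.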