zsxwing commented on pull request #31638: URL: https://github.com/apache/spark/pull/31638#issuecomment-808413955
> When we provide `/output/*` as a glob path on path, what would we expect?

Currently, Spark's behavior is `Don't leverage metadata in /output/b and read both directories via listing.` Ideally, we should respect the metadata in each directory, but I cannot find a simple way to do that. Ideally, we would also provide different APIs for normal paths and glob paths, but since we have already mixed these two concepts into one parameter, we cannot take it back.

I'd suggest limiting the goal to fixing the regression from 3.0, that is, `when a glob path is valid but we cannot call getFileStatus with it, how to allow users to access batch output.`, and avoiding changes to any other existing behavior.
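
To make the "respect metadata in each directory" idea concrete, here is a minimal sketch in plain Python. It is not Spark code and not what this PR implements. The helper names `has_glob_chars` and `plan_reads` are hypothetical, the glob check is deliberately cruder than Hadoop's glob syntax, and the only detail taken from Spark is the `_spark_metadata` directory name written by the file stream sink.

```python
import glob
import os
import tempfile

# Directory name written by Spark's file stream sink.
METADATA_DIR = "_spark_metadata"


def has_glob_chars(path: str) -> bool:
    """Rough check for glob metacharacters (assumption: Hadoop's syntax is richer)."""
    return any(ch in path for ch in "*?[")


def plan_reads(path: str) -> dict:
    """Hypothetical helper: choose a read strategy per matched directory.

    A directory containing a metadata log is read through that log;
    any other directory falls back to plain file listing.
    """
    candidates = glob.glob(path) if has_glob_chars(path) else [path]
    plan = {}
    for directory in sorted(candidates):
        if not os.path.isdir(directory):
            continue  # skip plain files and paths that do not exist
        if os.path.isdir(os.path.join(directory, METADATA_DIR)):
            plan[directory] = "metadata log"
        else:
            plan[directory] = "file listing"
    return plan


def demo() -> dict:
    """Build /output/a (batch output) and /output/b (streaming sink output)."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "output", "a"))
        os.makedirs(os.path.join(root, "output", "b", METADATA_DIR))
        # Escape the temp root so only the trailing "*" acts as a glob.
        pattern = os.path.join(glob.escape(root), "output", "*")
        plan = plan_reads(pattern)
        return {os.path.basename(d): strategy for d, strategy in plan.items()}


if __name__ == "__main__":
    print(demo())
```

The sketch also shows why a single parameter is ambiguous: the same string has to be classified as either a literal path or a pattern before any per-directory decision can be made.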
