GitHub user ericl commented on the issue:
https://github.com/apache/spark/pull/14690
> For one thing, a ListingFileCatalog performs a file tree traversal right
> off the bat. However, the external catalog returns the locations of partitions
> as part of the listPartitionsByFilter call. I believe that should suffice for
> the purpose of building a query plan for metastore-backed tables and executing
> it.
You'd have to re-implement a large portion of the parallel traversal logic
here, right? I think we should keep this PR minimal and leave that for future
work. I am also thinking of adding a per-directory file listing cache as a
followup to avoid performance regressions, which would likely involve
refactoring this path anyway.
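To make the per-directory listing cache idea concrete, here is a rough sketch. It is only an illustration, not the design proposed for Spark: the class and method names are hypothetical, and it lists the local filesystem with `os.scandir` rather than going through Hadoop's FileSystem API.

```python
import os


class DirectoryListingCache:
    """Hypothetical per-directory cache of file listings (illustration only)."""

    def __init__(self):
        self._entries = {}  # absolute directory path -> sorted list of file paths
        self.misses = 0     # number of times the filesystem was actually listed

    def list_files(self, directory):
        key = os.path.abspath(directory)
        if key not in self._entries:
            # Cache miss: list the directory once and remember the result.
            self.misses += 1
            self._entries[key] = sorted(
                entry.path for entry in os.scandir(key) if entry.is_file()
            )
        return self._entries[key]

    def invalidate(self, directory):
        # Drop a single directory so the next lookup lists it again.
        self._entries.pop(os.path.abspath(directory), None)
```

Keying the cache by directory means that only the partitions a query actually touches get listed, and a stale partition can be invalidated without discarding the rest.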
> I would be wary of amending our data sources to support case-insensitive
> field resolution. For one thing, strictly speaking it can lead to ambiguity in
> schema resolution. In the potential, but unlikely, event that a
> (case-sensitive) data source schema has two distinct fields x1 and x2 such that
> x1.toLowerCase == x2.toLowerCase we're going to get undefined behavior.
> For another, for case-sensitive data sources this adds code complexity in
> their implementation.
I do agree this might be an issue with other datasources. For parquet
though, I talked with @liancheng and we don't think there are any issues with
supporting case-insensitive field resolution. Given that, I think we can also
leave this for future work when we add datasource table support. It might also
be that we need to add back something like
https://github.com/apache/spark/pull/14750
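For reference, the ambiguity raised in the quoted comment (two distinct fields whose lowercased names are equal) can be sketched as below. This is a standalone illustration, not Spark's or Parquet's resolution code; the function names are hypothetical, and raising an error on ambiguity is just one possible policy.

```python
from collections import defaultdict


def find_case_collisions(field_names):
    """Group field names that become identical once lowercased."""
    groups = defaultdict(list)
    for name in field_names:
        groups[name.lower()].append(name)
    # Keep only the lowercased keys that more than one field maps to.
    return {key: names for key, names in groups.items() if len(names) > 1}


def resolve_case_insensitive(field_names, requested):
    """Resolve `requested` ignoring case, failing loudly if it is ambiguous."""
    matches = [name for name in field_names if name.lower() == requested.lower()]
    if len(matches) > 1:
        raise ValueError(f"ambiguous field {requested!r}: matches {matches}")
    # Return the physical field name, or None when nothing matches.
    return matches[0] if matches else None
```

Checking a schema with `find_case_collisions` up front would let a reader reject or special-case the ambiguous fields instead of running into undefined behavior at resolution time.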
> Finally, this would require us to read the schema files. That's something
> I'm trying to avoid in this patch.
Not sure what you mean here, but the parquet change should only affect
execution time. I'll submit a PR here for that.