Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/18849
I think there is still a lot of confusion around here about what this is
fixing. I see a bunch of comments related to testing the schema for
compatibility.
That does not work. Schema compatibility is not the issue here; the issue
is whether the table was *initially* created as Hive-compatible or not. This is
the Hive metastore, not Spark, complaining, so the Spark-side schema for
non-compatible tables is pretty irrelevant.
The schema by itself does not provide enough information to detect whether
a table is compatible or not. Even if the schema is Hive compatible, the data
source may not have a Hive counterpart, or the table might have been initially
created in a case-sensitive session and have conflicting column names when case
is ignored, or a few other things, all of which are checked at table creation
time.
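To illustrate just one of those creation-time checks, here is a minimal sketch in plain Python (not Spark's actual code; the function name and the sample columns are made up for illustration). It shows a set of column names that is fine in a case-sensitive session but conflicts once case is ignored:

```python
from collections import Counter


def conflicting_columns_ignoring_case(column_names):
    """Return the lowercased names that occur more than once when case is ignored."""
    counts = Counter(name.lower() for name in column_names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical columns: distinct when case matters, ambiguous when it does not.
columns = ["id", "ID", "value"]
conflicts = conflicting_columns_ignoring_case(columns)
```

Even a check like this only covers one condition; it says nothing about whether the data source has a Hive counterpart, which is why the schema alone is not enough.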
The same checks *cannot* be done later, and should not be done. If the
table was non-compatible it should remain non-compatible, and vice-versa. The
only thing that is needed is a way to detect that single property of the table.
You cannot do that just from the schema as has been proposed a few times here.
There are two options:
- use an explicit option, which is the approach I took
- use some combination of metadata written by old Spark versions that tells
you whether the table is compatible or not.
The only thing that exists for the second one is the serde field in the
storage descriptor. Spark sets it to either `None` or some placeholder that
does not match the datasource serde. I use that fact as a fallback for when the
property does not exist, but I think it's safer to have an explicit property
for that instead of relying on these artifacts.
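Roughly, the detection order described above looks like the sketch below. It is plain Python rather than the actual Spark code, and the property key, function name, and parameter names are all hypothetical. It also assumes that a serde matching the datasource serde means the table was created as Hive-compatible, which is my reading of the fallback rather than something spelled out here:

```python
from typing import Mapping, Optional

# Hypothetical property key, for illustration only.
HIVE_COMPAT_KEY = "example.hive.compatible"


def is_hive_compatible(
    table_properties: Mapping[str, str],
    storage_serde: Optional[str],
    datasource_serde: Optional[str],
) -> bool:
    """Decide compatibility from the explicit property, else from the serde."""
    explicit = table_properties.get(HIVE_COMPAT_KEY)
    if explicit is not None:
        # The explicit property wins whenever it is present.
        return explicit.strip().lower() == "true"
    # Fallback for tables written by older versions: a missing serde or a
    # placeholder that does not match the datasource serde is treated as
    # non-compatible (assumption).
    return storage_serde is not None and storage_serde == datasource_serde
```

The explicit property is consulted first so that the serde comparison is only ever used for tables that predate the property.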
Hope that clarifies things.