ryan-johnson-databricks commented on PR #40300:
URL: https://github.com/apache/spark/pull/40300#issuecomment-1463005688
> It's a good idea to provide an API that allows people to unambiguously
> reference metadata columns, and I like the new `Dataset.metadataColumn`
> function. However, I think the prepending underscore approach is a bit hacky.
> It's too implicit and I'd prefer a more explicit syntax like `SELECT
> metadata(_metadata) FROM t`. We can discuss this more and invite more SQL
> experts. Shall we exclude it from this PR for now?

@cloud-fan The prepended underscore is _NOT_ primarily intended as a user
surface. Rather, it's a reliable way to get a unique column name that's still at
least somewhat readable if you look at the query plan (unlike e.g. a uuid). The
new `Dataset.metadataColumn` method does not even _look_ at a renamed
attribute's name, for example.
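
To make the renaming idea concrete, here is a minimal Python sketch. It is not the Spark implementation: the function name `dedup_metadata_name` and the exact-match comparison are assumptions made only for illustration. It keeps prepending underscores until the candidate name no longer collides with an existing column name.

```python
def dedup_metadata_name(name, existing_columns):
    """Return `name`, prefixed with as many underscores as needed
    so that it does not collide with any name in `existing_columns`.

    Hypothetical helper for illustration; the comparison is a plain
    exact match, which is an assumption of this sketch.
    """
    taken = set(existing_columns)
    candidate = name
    # Each extra underscore yields a new, still readable candidate.
    while candidate in taken:
        candidate = "_" + candidate
    return candidate


# A data column already named `_metadata` forces one extra underscore.
renamed = dedup_metadata_name("_metadata", ["id", "_metadata"])
```

The result stays recognizable in a query plan, which is the readability advantage over something like a uuid.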
At this point, the only references in the code to prepended underscores are
the two unit tests ("metadata name conflict resolved with leading underscores")
that try to validate that the renaming works as intended. If you don't think
the test coverage is important, we could remove even that?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.