[GitHub] [spark] ryan-johnson-databricks commented on pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

via GitHub Thu, 09 Mar 2023 15:52:14 -0800


ryan-johnson-databricks commented on PR #40300:
URL: https://github.com/apache/spark/pull/40300#issuecomment-1463005688


   > It's a good idea to provide an API that allows people to unambiguously 
reference metadata columns, and I like the new `Dataset.metadataColumn` 
function. However, I think the prepending underscore approach is a bit hacky. 
It's too implicit and I'd prefer a more explicit syntax like `SELECT 
metadata(_metadata) FROM t`. We can discuss this more and invite more SQL 
experts. Shall we exclude it from this PR for now?
   
   @cloud-fan The prepended underscore is _NOT_ primarily intended as a user 
surface. Rather, it's a reliable way to get a unique column name that's still at 
least somewhat readable if you look at the query plan (unlike e.g. a UUID). The 
new `Dataset.metadataColumn` method does not even _look_ at a renamed 
attribute's name, for example.
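   
   As a rough illustration of the renaming idea only (a hypothetical Python 
sketch, not the code in this PR; the helper name `unique_metadata_name` and 
the exact-match comparison are assumptions made for the example), prepending 
underscores until the name stops colliding could look like this:

```python
from typing import Iterable


def unique_metadata_name(base: str, existing: Iterable[str]) -> str:
    """Prepend underscores to `base` until it no longer collides.

    Hypothetical helper for illustration. Names are compared exactly here;
    a real resolver may need to follow the engine's case-sensitivity rules.
    """
    taken = set(existing)
    name = base
    while name in taken:
        # Each extra underscore keeps the name readable in a query plan,
        # unlike an opaque random identifier.
        name = "_" + name
    return name


# Example: the data already has a user column called "_metadata".
renamed = unique_metadata_name("_metadata", ["id", "_metadata"])
print(renamed)
```

   The loop always terminates because the set of existing names is finite 
and each iteration produces a strictly longer candidate.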
   
   At this point, the only references in the code to prepended underscores are 
the two unit tests ("metadata name conflict resolved with leading underscores") 
that try to validate that the renaming works as intended. If you don't think 
the test coverage is important, we could remove even those?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

