HeartSaVioR edited a comment on pull request #28326: URL: https://github.com/apache/spark/pull/28326#issuecomment-620310834
Sorry if this is off-topic, but I suspect column metadata has deeper issues, because its usage and handling have never been properly defined. I'm worried that Spark and end users share the same metadata and can overwrite or hide each other's entries. I thought metadata was only used internally by Spark, and wasn't aware that Spark exposes a public API to modify it.

I'm not sure this really needs to be offered to end users (or even third parties), because they have no way to retrieve metadata using only public API. From an end user's perspective, retrieving metadata requires either pattern matching `col.expr` against `NamedExpression` and calling `metadata`, which is already in catalyst territory (not a public API), or a package hack to call the `named` method (also not a public API). That means they may just blindly overwrite metadata and hide the metadata of the underlying attribute. Do we have actual usage of it?

Also, as @cloud-fan commented earlier (https://github.com/apache/spark/pull/28326#issuecomment-619769222), metadata propagation doesn't seem to be clearly defined. Alias propagates the metadata, but I'm not sure which other operations take metadata propagation into account when determining their output.
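
To make the concern concrete, here is a rough sketch of the retrieval path described in the comment and of how a second alias hides earlier metadata. It assumes a Spark 3.x build where `Column.expr` is reachable from user code. The `owner` key, the helper `metadataOf`, and the object name are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{Metadata, MetadataBuilder}

object ColumnMetadataSketch {
  // Hypothetical helper: reads metadata by reaching into catalyst,
  // since Column itself has no public accessor for it.
  def metadataOf(c: Column): Option[Metadata] = c.expr match {
    case ne: NamedExpression => Some(ne.metadata)
    case _                   => None // e.g. arithmetic expressions carry no name
  }

  def main(args: Array[String]): Unit = {
    // "owner" is an illustrative key, not one Spark defines.
    val first  = new MetadataBuilder().putString("owner", "spark").build()
    val second = new MetadataBuilder().putString("owner", "user").build()

    // Public API: attach metadata while aliasing.
    val tagged = col("id").as("id", first)
    // Aliasing again with new metadata replaces what the column reports.
    val retagged = tagged.as("id", second)

    println(metadataOf(tagged))
    println(metadataOf(retagged))
  }
}
```

In this sketch `retagged` reports only the metadata passed to the second `as` call, so a caller who cannot read the existing metadata has no chance to merge it before overwriting.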
